Alexander Bass

On Republishing a Book for The Web

I recently republished Geodesy For The Layman for the web. It’s a book originating from the US government which covers the basics of geodesy—the study of the figure of the Earth. The last printing was in the mid ’80s, and all that’s available online are crappy scans.

In this post I will describe generally how to convert a book to web format, and some of the challenges I faced in doing so.

Where to start if you know nothing at all

Skip this if you know a thing or two about the web.

If you want to publish books for the web, there’s a bit of a learning curve.

Web pages are documents written in a language called HTML. HTML code describes how text, images, headings, etc. are formatted and ordered with relation to each other. You will have to translate the text and formatting of your book to HTML code.

Here’s an HTML snippet with a paragraph that has italicized text:

<p>
    Hello! this is an HTML example with <i>Italicized</i> text!
</p>

If you don’t know anything about web-dev (everyone starts at the beginning), here’s a four-step plan for making things for the web.

  1. Learn how to write and edit HTML. MDN has some tutorials that look helpful. HTMLDog seems more beginner friendly.
  2. Learn how to put your things up publicly on the web. There’s a million ways to do this but GitHub Pages is good for beginners and skips a lot of the drudgery.
  3. Learn CSS to change the look of your webpages. Again MDN has some tutorials and so does HTMLDog.
  4. Learn the basics of typography and understand how to make text look nice. Practical Typography is great and can be finished in an hour or two.

It takes a while to learn all this stuff so take it slow and do what you can, when you can.

Content

A few ways to get the text content of the book include:

For Geodesy for the Layman, I found that the NOAA had transcribed the document into a ’90s-era webpage.

Of course, when copying off of someone else’s work, make sure there are no mistakes. No matter where you get the content from, you’ll have to do some cleanup on it yourself by properly tagging formatting and such.

Structure

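Two major methods for partitioning content in web books exist: a single page holding the whole book, or multiple pages each holding one chunk (such as a chapter).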

Single pages have the advantage that the reader can continue reading without having to change pages at each chapter. However, they are disadvantaged because all of the book must be downloaded before reading—slowing down page loads.

Multiple pages are quick to load as the content is delivered in chunks. Another advantage is that it is easier to link to individual sections, because each is a unique page.

For Geodesy for the Layman I chose to put it all in a single page because that was easier to edit. The next book I put on the web will probably be in multiple pages though.

The general layout for the markup I used is:

<!DOCTYPE html>
<html>
<head>
    <title>Document</title>
</head>
<body>
    <section id="intro">
        <h1>Book Title</h1>
    </section>
    <section id="chapter1">
        <h2>Chapter 1 ...</h2>
    </section>
    <section id="chapter2"> ... </section>
    ...
</body>
</html>

Each section has a bunch of paragraphs, some figures, and some headings. I chose to write all the HTML for my project by hand which allowed for a lot of flexibility with layout.

What should be automated?

I think three categories exist for what should be automated: always, probably, and maybe.

Always

Automate anything that is highly sequential.

In Geodesy for the Layman I automated the following steps to build the book:

Probably

Automate and abstract parts of the writing process.

The next book I publish online will probably be more modular instead of being one large HTML glob. My general plan is to create markdown files for each chapter, and use a hacked-together Python script to assemble them.
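A minimal sketch of what such an assembly script might look like. It assumes each chapter has already been converted from Markdown to an HTML fragment (that conversion is not shown), and the function name `assemble_book` is made up for illustration. It wraps each fragment in a section, following the general layout shown earlier.

```python
from html import escape


def assemble_book(title, chapters):
    """Join per-chapter HTML fragments into one page.

    `chapters` is a list of (section_id, html_fragment) pairs. Each fragment
    is assumed to be HTML already produced from a chapter's Markdown file.
    """
    sections = [
        f'    <section id="{escape(section_id)}">\n{fragment}\n    </section>'
        for section_id, fragment in chapters
    ]
    return "\n".join([
        "<!DOCTYPE html>",
        "<html>",
        "<head>",
        f"    <title>{escape(title)}</title>",
        "</head>",
        "<body>",
        *sections,
        "</body>",
        "</html>",
    ])


# Hypothetical chapters; in practice these would be read from files.
page = assemble_book(
    "Book Title",
    [
        ("intro", "        <h1>Book Title</h1>"),
        ("chapter1", "        <h2>Chapter 1</h2>\n        <p>Text.</p>"),
    ],
)
print(page)
```

Keeping the chapters as a plain list makes reordering them a one-line change.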

Markdown has a few limitations which should be considered before using it however:

In general, I recommend using a markup language which is extensible and can have new features added to it. Some flavors of markdown are extensible, and I’ve heard of other extensible formats which are similar. Pollen seems very capable.

Maybe

Automate smaller details.

Some things which might benefit from automation:

All of these things are a bit fiddly though. Depending on your book, not all equations need to be numbered and some chapters might need to be excluded from the table.

I find small details tend to have more edge cases which makes automating them time consuming. Unless you have scores of figures, equations, or chapters, I wouldn’t bother automating these small details. One exception is if your book is still in a draft state. Automating the small details could be convenient if you plan to reorder things.
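As one illustration of automating a small detail, here is a rough sketch that builds a table of contents from the chapter headings. It assumes the layout used in this post (each chapter is a section with an id and an h2 heading). The names `ChapterCollector`, `build_toc`, and the `skip` parameter are hypothetical; `skip` covers the case where some chapters should be excluded from the table.

```python
from html import escape
from html.parser import HTMLParser


class ChapterCollector(HTMLParser):
    """Collect [section id, heading text] for each <h2> inside a <section>."""

    def __init__(self):
        super().__init__()
        self.chapters = []
        self._section_id = None
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            # Remember the id of the most recently opened section.
            self._section_id = dict(attrs).get("id")
        elif tag == "h2":
            self._in_h2 = True
            self.chapters.append([self._section_id, ""])

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if self._in_h2:
            self.chapters[-1][1] += data


def build_toc(html, skip=()):
    """Return an ordered list linking to every chapter not listed in `skip`."""
    collector = ChapterCollector()
    collector.feed(html)
    items = [
        f'    <li><a href="#{escape(sid)}">{escape(title.strip())}</a></li>'
        for sid, title in collector.chapters
        if sid and sid not in skip
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"


book = """
<section id="intro"><h1>Book Title</h1></section>
<section id="chapter1"><h2>Chapter 1</h2></section>
<section id="appendix"><h2>Appendix</h2></section>
"""
toc = build_toc(book, skip={"appendix"})
print(toc)
```

Even this small sketch needs a special case for excluded chapters, which is the kind of edge case that makes these details time consuming to automate.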

Text Styling

There’s a million choices to make with text, so I’ll just go over a few.

Generally, I tried to make Geodesy for the Layman look as much like a book as reasonable. Books have essentially perfected readability at this point. Books are good sources for typographical inspiration.

Font

Two primary categories of fonts exist: serif and sans-serif. Serif fonts have small details called serifs adorning each letter whereas sans-serifs do not.

Example of serifs (source CC-BY-SA 3.0 by user Stannered)

Some notes:

For Geodesy for the Layman, I chose the Alegreya font family because I prefer serif fonts and because this one is nice on the eyes.

Line Length

I think short line lengths (the width of the actual text) are easier to read. Generally I’ve found a length of 55–67 characters is comfortable. Very long line lengths are commonly seen on the web (Wikipedia is ~104 characters), so there’s a bit of peer pressure to make your site similar. I advise that you don’t. Some bold fellows go the other direction and make websites with very short line lengths (~55 characters), but I feel like you have to be careful with that extreme too.

One problem shorter line lengths bring is fitting figures and images in. Wikipedia’s line lengths allow fairly large figures to be interspersed without choking the text down much. Textbooks also often have long line lengths—I imagine for the same reason.

The most elegant solution I’ve found for fitting figures in slim documents is to simply put the figures in the margins. Computer screens are wider than they are tall and the text in a book is taller than it is wide. The result is that webpage margins are left empty. Tufte CSS is a great example of using marginal space for figures.

For Geodesy for the Layman, I have three types of figure positioning:

I think this style works pretty well with short line lengths.

Font Size

Too many websites have small text. Choose a font size somewhere between 17px and 21px and call it a day.

Personally, I like to size everything on websites using rems like so:

/* Set root font size */
html {font-size: 20px;}
/* set the paragraph size to 1x root size */
p {font-size:1rem;}
/* set the 1st heading size to 1.7x root size */
h1 {font-size:1.7rem;}
/* etc. */

This avoids the issue with ems where, if you have a list styled to be 2× as big as its parent

ul {font-size: 2em;}

then for each level of nesting of the list, the font size will double:

<ul> <!-- 2× size -->
    <li>
        <ul> <!-- 4× size -->
            <li>
                <ul></ul> <!-- 8× size; etc. -->
            </li>
        </ul>
    </li>
</ul>
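The compounding can be checked with a little arithmetic. This is only a sketch of the math, not of how a browser computes sizes internally; the function names are made up, and the 20px root size is taken from the CSS above.

```python
def em_size(root_px, factor, depth):
    """Pixel size of a list nested `depth` levels deep, each sized `factor` em."""
    # em is relative to the parent, so the factor is applied once per level.
    return root_px * factor ** depth


def rem_size(root_px, factor):
    """Pixel size of a list sized `factor` rem, at any nesting depth."""
    # rem is relative to the root, so nesting does not matter.
    return root_px * factor


for depth in (1, 2, 3):
    print(depth, em_size(20, 2, depth), rem_size(20, 2))
# 1 40 40
# 2 80 40
# 3 160 40
```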

Figures

Figures and images take time.

Before putting a book on the web, consider the time cost of figures. If you’re using scanned figures, it will take around ten minutes to touch up each figure. If you’re recreating the figures, it will take half an hour to a whole hour to create each figure. Someone familiar with digital graphics could create figures faster, but it still costs time.

Do the math before starting. Forty figures at a rate of 10 minutes per figure totals about 6.7 hours of work. I first started with the figures in Geodesy for the Layman by recreating them in Inkscape, but quickly found that was too time consuming and resorted to scanning the rest.
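The estimate is simple enough to script. A small sketch using the per-figure rates mentioned above (10 minutes to touch up a scan, 30 to 60 minutes to recreate a figure); the function name is made up, and the rates are rough figures rather than measurements.

```python
def figure_hours(count, minutes_each):
    """Total hours of work for `count` figures at `minutes_each` minutes apiece."""
    return count * minutes_each / 60


scanned = figure_hours(40, 10)
recreated_low = figure_hours(40, 30)
recreated_high = figure_hours(40, 60)

print(f"scanned: {scanned:.1f} h")
print(f"recreated: {recreated_low:.0f} to {recreated_high:.0f} h")
# scanned: 6.7 h
# recreated: 20 to 40 h
```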

Removing Halftoning

It is difficult to remove halftoning, especially CMYK halftoning. If you look very closely at printed images, you will see small cyan, magenta, yellow, and black dots which combine to make colors. These can be annoying when scanned because they tend to create moiré patterns.

Left: close-up of halftoning; Center: moiré pattern; Right: Cleaned image

The best way I found to remove these patterns is using GIMP with the G’MIC plugin:

  1. Open a high quality scan in GIMP.
  2. Decompose the colored image into CMYK layers (skip if the image is grayscale).
  3. Run G’MIC’s Descreen filter on the layers (you may have to do this twice).
  4. Recompose the CMYK channels into an image (skip if the image is grayscale).
  5. Do standard image touchups.

You may be able to get a better result by manually tweaking the Frequency Domain for the image using ImageJ, but I think that’s a waste of time. I spent a while trying to make it work but gave up.

A lot of people online suggest blurring, then sharpening the image to remove halftoning. This works but I think it produces lousy results for anything other than portraits.

Optimizing Figures

Rule of thumb: don’t make figures too much bigger than they will be shown. Reducing image size reduces file size which in turn increases the page load speed.
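A rough sketch of how one might pick the dimensions to resize a scan to. The function name is hypothetical, and the default `density` of 2 (leaving headroom for high-density screens) is my own assumption rather than a rule. The sketch never scales an image up.

```python
def target_size(src_w, src_h, shown_w, density=2):
    """Return (width, height) in pixels to resize a figure to.

    `shown_w` is the width the figure is displayed at on the page.
    `density` is an assumed allowance for high-density screens.
    """
    # Never upscale: cap the width at the source width.
    width = min(src_w, round(shown_w * density))
    # Preserve the aspect ratio.
    height = round(src_h * width / src_w)
    return width, height


print(target_size(4800, 3600, 600))  # (1200, 900)
print(target_size(800, 600, 600))    # (800, 600), the source is already small
```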

Some tips:

Math

A few options for rendering math on the web exist. Of course, you could just take a scan of math you wrote by hand or take a screenshot of some word processor math, but some better ways are available.

For the most part, math notation is written in a language called LaTeX (more specifically LaTeX’s math mode). Here are two examples:

$$
\sum_{n=0}^{\infty} \frac{x^n}{n^2}
$$
$$
\sin(x)^2 + \cos(x)^2 = 1
$$

Which display as:

Σ (n = 0 to ∞) xⁿ / n²
sin(x)² + cos(x)² = 1

Web browsers can’t directly display LaTeX: a library must be used to render it. Some good ones are MathJax, KaTeX, and Temml.

MathJax is the well-established solution and should look good anywhere. KaTeX is also well-established, but I haven’t used it. Temml is fairly new. It converts LaTeX to MathML which the web browser can directly display. If you haven’t heard of it, I recommend checking out Temml.

For Geodesy for the Layman I used MathJax because I didn’t want to tinker, and I knew it would mostly just work. The math on my blog however is rendered with Temml.

But how can I make something like another website did?

Steal it

Really, press F12 and tinker with the Devtools. Figure out how that site implemented it and copy it.

If you’re publishing anything online, you will want to be certain you have the right to do so. Copyright can be annoying. A certain work can be in the public domain, but someone’s scan of it might not. Be careful.

In my case, Geodesy for the Layman was written and published by the U.S. Federal government which generally means it’s in the public domain. Additionally, there’s a note inside the document disclaiming any copyright, which is nice.

If the work you’re publishing is under copyright, get the rights to republish it.

When you do publish, I recommend licensing your work under the Creative Commons (if you’re able). Specifically, I like the Creative Commons Attribution-ShareAlike license because under it:

Disclaimer

I’m just a guy who likes the pursuit of making text look good. I’m certain there’s better advice than what I have given, but this is the best advice I have at this point.

If you see anything wrong in this post, or notice anything I left out, shoot me an email at contact@alexanderbass.com. I always like to learn.