On Republishing a Book for The Web

June 29th 2025

I recently republished Geodesy For The Layman for the web. It’s a book originating from the US government which covers the basics of geodesy—the study of the figure of the Earth. The last printing was in the mid ’80s, and all that’s available online are crappy scans.

In this post I will describe, in general terms, how to convert a book to web format and some of the challenges I faced in doing so.

Where to start if you know nothing at all

Skip this if you know a thing or two about the web.

If you want to publish books for the web, there’s a bit of a learning curve.

Web pages are documents written in a language called HTML. HTML code describes how text, images, headings, etc. are formatted and ordered with relation to each other. You will have to translate the text and formatting of your book to HTML code.

Here’s an HTML snippet with a paragraph that has italicized text:

<p>
    Hello! this is an HTML example with <i>Italicized</i> text!
</p>

If you don’t know anything about web-dev (everyone starts at the beginning), here’s a four-step plan for making things for the web.

Learn how to write and edit HTML. MDN has some tutorials that look helpful. HTMLDog seems more beginner friendly.
Learn how to put your things up publicly on the web. There’s a million ways to do this but GitHub Pages is good for beginners and skips a lot of the drudgery.
Learn CSS to change the look of your webpages. Again MDN has some tutorials and so does HTMLDog.
Learn the basics of typography and understand how to make text look nice. Practical Typography is great and can be finished in an hour or two.

It takes a while to learn all this stuff so take it slow and do what you can, when you can.

Content

A few ways to get the text content of the book include:

OCR a scan of the text
Manually type the text
Find someone else who already transcribed it.

For Geodesy for the Layman, I found that the NOAA had transcribed the document into a ’90s-era webpage.

Of course, when copying off of someone else’s work, make sure there are no mistakes. No matter where you get the content from, you’ll have to do some cleanup on it yourself by properly tagging formatting and such.

Structure

Two major methods for partitioning content in web books exist:

Single page. Put everything in one loooong webpage.
Multiple page. Put sections or chapters in different pages.

Single pages have the advantage that the reader can continue reading without having to change pages at each chapter. However, they are disadvantaged because all of the book must be downloaded before reading—slowing down page loads.

Multiple pages are quick to load as the content is delivered in chunks. Another advantage is that it is easier to link to individual sections, because each is a unique page.

For Geodesy for the Layman I chose to put it all in a single page because that was easier to edit. The next book I put on the web will probably be in multiple pages though.

The general layout for the markup I used is:

<!DOCTYPE html>
<html>
<head>
    <title>Document</title>
</head>
<body>
    <section id="intro">
        <h1>Book Title</h1>
    </section>
    <section id="chapter1">
        <h2>Chapter 1 ...</h2>
    </section>
    <section id="chapter2"> ... </section>
    ...
</body>
</html>

Each section has a bunch of paragraphs, some figures, and some headings. I chose to write all the HTML for my project by hand which allowed for a lot of flexibility with layout.

What should be automated?

I think three categories exist for what should be automated:

Always

Automate anything that is highly sequential.

In Geodesy for the Layman I automated the following steps to build the book:

Use Scour to reduce figure filesize
Minify HTML and CSS to reduce overall filesize
Clean any temporary files
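
Steps like these could be scripted roughly as follows. This is a minimal sketch, not the actual build script for the book: the function names, the directory layout, and the temp-file suffixes are all assumptions for illustration, and it assumes the `scour` command-line tool is installed. Minifying HTML and CSS is left out because it depends on which minifier you pick.

```python
import subprocess
from pathlib import Path


def scour_command(src: Path, dst: Path) -> list[str]:
    """Build the command line asking Scour to write an optimized copy of src to dst."""
    return ["scour", "-i", str(src), "-o", str(dst)]


def optimize_figures(figure_dir: Path, out_dir: Path) -> None:
    """Run Scour on every SVG in figure_dir (requires scour to be installed)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for svg in sorted(figure_dir.glob("*.svg")):
        subprocess.run(scour_command(svg, out_dir / svg.name), check=True)


def clean_temp_files(build_dir: Path,
                     suffixes: tuple[str, ...] = (".tmp", ".bak")) -> list[Path]:
    """Delete files with temporary-looking suffixes and return what was removed."""
    removed = []
    # sorted() materializes the listing before anything is deleted.
    for path in sorted(build_dir.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            path.unlink()
            removed.append(path)
    return removed
```

Running the steps in a fixed order from one script is the whole point: the sequence never changes, so there is no reason to do it by hand.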

Probably

Automate and abstract parts of the writing process.

The next book I publish online will probably be more modular instead of being one large HTML glob. My general plan is to create Markdown files for each chapter, and use a hacked-together Python script to assemble them.
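
A rough sketch of what such an assembly script might look like. Everything here is hypothetical: the `assemble_book` name, the `chapterN` ids, and the pluggable `convert` function are made up for illustration. The Markdown-to-HTML step is passed in as a function because the choice of converter is still open.

```python
from html import escape
from typing import Callable


def assemble_book(title: str, chapters: list[str],
                  convert: Callable[[str], str] = lambda text: text) -> str:
    """Wrap each converted chapter in a <section> and join them into one page."""
    sections = []
    for number, source in enumerate(chapters, start=1):
        body = convert(source)  # e.g. a Markdown-to-HTML function
        sections.append(f'<section id="chapter{number}">\n{body}\n</section>')
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"    <title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )
```

In practice each chapter string would be read from its own file, in a sorted order so that the chapters come out in sequence.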

However, Markdown has a few limitations that should be considered before using it:

Lacking table support.
No support for description lists.
No differentiation between idiomatic <i> text and emphasized <em> text.
No support for figures with captions.
No support for setting class on elements (Suppose you wanted two list styles, one with roman numerals and one with arabic—can’t do it)

In general, I recommend using a markup language which is extensible and can have new features added to it. Some flavors of markdown are extensible, and I’ve heard of other extensible formats which are similar. Pollen seems very capable.

Maybe

Automate smaller details.

Some things which might benefit from automation:

Sequential figure numbering
Sequential equation numbering
Table of contents generation

All of these things are a bit fiddly though. Depending on your book, not all equations need to be numbered and some chapters might need to be excluded from the table.

I find small details tend to have more edge cases which makes automating them time consuming. Unless you have scores of figures, equations, or chapters, I wouldn’t bother automating these small details. One exception is if your book is still in a draft state. Automating the small details could be convenient if you plan to reorder things.
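
For a sense of scale, here is a minimal sketch of table of contents generation, including the "exclude some chapters" edge case. It assumes un-nested `<section id="...">` elements each holding one `<h2>`, like the layout shown earlier; the `TocBuilder` and `build_toc` names and the `skip_ids` parameter are invented for this example.

```python
from html import escape
from html.parser import HTMLParser


class TocBuilder(HTMLParser):
    """Collect (section id, heading text) pairs from <section id=...><h2>...</h2>."""

    def __init__(self, skip_ids=()):
        super().__init__()
        self.skip_ids = set(skip_ids)
        self.entries = []         # list of (section id, heading text)
        self._section_id = None   # id of the section currently being read
        self._in_heading = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "section":
            self._section_id = dict(attrs).get("id")
        elif tag == "h2" and self._section_id not in self.skip_ids:
            self._in_heading = True
            self._text = []

    def handle_data(self, data):
        if self._in_heading:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_heading:
            self._in_heading = False
            if self._section_id:  # headings outside an id'd section are ignored
                self.entries.append((self._section_id, "".join(self._text).strip()))
        elif tag == "section":
            self._section_id = None


def build_toc(html: str, skip_ids=()) -> str:
    """Return an <ol> linking to each section heading, minus excluded sections."""
    parser = TocBuilder(skip_ids)
    parser.feed(html)
    items = [f'    <li><a href="#{sid}">{escape(text)}</a></li>'
             for sid, text in parser.entries]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```

Even this small version needs special handling for skipped sections and stray headings, which is the kind of edge case that makes these details fiddly to automate.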

Text Styling

There’s a million choices to make with text, so I’ll just go over a few.

Generally, I tried to make Geodesy for the Layman look as much like a book as reasonable. Books have essentially perfected readability at this point. Books are good sources for typographical inspiration.

Font

Two primary categories of fonts exist: serif and sans-serif. Serif fonts have small details called serifs adorning each letter whereas sans-serifs do not.

Figure: Example of serifs (source CC-BY-SA 3.0 by user Stannered)

Some notes:

Body text in books is almost always set in a serif font.
Website text is set in either a serif or a sans-serif font, depending on the preference of the author.
Sometimes different font styles are used for different parts of the document. Often serif fonts are used for body text and sans-serifs are used for headings. This is seen in both books and webpages.
The body text of this website is set in a serif font, and the headings are set in a sans-serif font.

For Geodesy for the Layman, I chose the Alegreya font family because I prefer serif fonts and because this one is nice on the eyes.

Line Length

I think short line lengths (the width of the actual text) are easier to read. Generally I’ve found a length of 55–67 characters comfortable. Very long line lengths are commonly seen on the web (Wikipedia is ~104 characters), so there’s a bit of peer pressure to make your site similar. I advise that you don’t. Some bold fellows go the other direction and make websites with very short line lengths (~55 characters), but I feel like you have to be careful with that extreme too.

The CSS ch unit can be used to easily limit the width to a certain number of characters.

body {
    max-width: 58ch;
}

One problem shorter line lengths bring is fitting figures and images in. Wikipedia’s line lengths allow fairly large figures to be interspersed without choking the text down much. Textbooks also often have long line lengths—I imagine for the same reason.

The most elegant solution I’ve found for fitting figures in slim documents is to simply put the figures in the margins. Computer screens are wider than they are tall, and the text in a book is taller than it is wide. The result is that webpage margins are left empty. Tufte CSS is a great example of using marginal space for figures.

For Geodesy for the Layman, I have three types of figure positioning:

Full width figure in body (example)
Floating figure entirely in margin (example)
Full width figure in body which expands into margin (example)

I think this style works pretty well with short line lengths.

Figures

Figures and images take time.

Before putting a book on the web, consider the time cost of figures. If you’re using scanned figures, it will take around ten minutes to touch up each figure. If you’re recreating the figures, it will take half an hour to a whole hour to create each figure. Someone familiar with digital graphics could create figures faster, but it still costs time.

Do the math before starting. Forty figures at a rate of 10 minutes per figure totals about 6.7 hours of work. I first started with the figures in Geodesy for the Layman by recreating them in Inkscape, but quickly found that was too time consuming and resorted to scanning the rest.
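
The same estimate as a tiny calculation, using the per-figure times quoted above (ten minutes to touch up a scan, half an hour to an hour to redraw). The figure count of forty is only an example, and `figure_hours` is a name made up for this sketch.

```python
def figure_hours(figure_count: int, minutes_per_figure: float) -> float:
    """Total figure work in hours."""
    return figure_count * minutes_per_figure / 60


scanned = figure_hours(40, 10)          # touching up scans
redrawn_low = figure_hours(40, 30)      # redrawing, optimistic
redrawn_high = figure_hours(40, 60)     # redrawing, pessimistic
print(f"scanned: {scanned:.1f} h, redrawn: {redrawn_low:.0f}-{redrawn_high:.0f} h")
# prints "scanned: 6.7 h, redrawn: 20-40 h"
```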

Removing Halftoning

It is difficult to remove halftoning, especially CMYK halftoning. If you look very closely at printed images, you will see small cyan, magenta, yellow, and black dots which combine to make colors. These can be annoying when scanned because they tend to create moiré patterns.

Left: close-up of halftoning; Center: moiré pattern; Right: Cleaned image

The best way I found to remove these patterns is using GIMP with the G’MIC plugin:

Open a high quality scan in GIMP.
Decompose the colored image into CMYK layers (skip if the image is grayscale)
Run G’MIC’s Descreen filter on the layers (You may have to do this twice)
Recompose the CMYK channels into an image (skip if the image is grayscale)
Do standard image touchups.

You may be able to get a better result by manually tweaking the image in the frequency domain using ImageJ, but I think that’s a waste of time. I spent a while trying to make it work but gave up.

A lot of people online suggest blurring, then sharpening the image to remove halftoning. This works but I think it produces lousy results for anything other than portraits.

Optimizing Figures

Rule of thumb: don’t make figures much bigger than they will be shown. Reducing image size reduces filesize, which in turn increases the page load speed.

Some tips:

SVG vector images can be shrunk greatly using Scour
GIFs are surprisingly efficient for grayscale images with few colors. Just make sure you reduce the color palette in them to something like 4 or 8 colors. Beyond ~20 colors, PNG seems to be more efficient than GIF. GIMP can reduce the color palette easily.
Make images lazy loaded. Lazily loaded images won’t be downloaded until they’re near view.
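
Lazy loading is switched on per image with the standard `loading="lazy"` attribute. If the HTML is already written, a small script can add the attribute after the fact. This is a sketch only: `add_lazy_loading` is a made-up helper, and the regex approach assumes plain `<img>` tags with no `>` characters inside attribute values.

```python
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
HAS_LOADING = re.compile(r"\sloading\s*=", re.IGNORECASE)


def add_lazy_loading(html: str) -> str:
    """Add loading="lazy" to every <img> tag that has no loading attribute yet."""
    def patch(match: re.Match) -> str:
        tag = match.group(0)
        if HAS_LOADING.search(tag):
            return tag  # respect an explicit choice such as loading="eager"
        # Insert the attribute right after the "<img" prefix.
        return tag[:4] + ' loading="lazy"' + tag[4:]
    return IMG_TAG.sub(patch, html)
```

Tags that already set `loading` are left alone, so running the script twice does no harm.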

Math

A few options for rendering math on the web exist. Of course, you could just take a scan of math you wrote by hand or take a screenshot of some word processor math, but some better ways are available.

For the most part, math notation is written in a language called LaTeX (more specifically LaTeX’s math mode). Here are two examples:

$$
\sum_{n=0}^{\infty} \frac{x^n}{n^2}
$$
$$
\sin(x)^2 + \cos(x)^2 = 1
$$

Which display as:

\sum_{n = 0}^{\infty} \frac{x^{n}}{n^{2}}

\sin (x)^{2} + \cos (x)^{2} = 1

Web browsers can’t directly display LaTeX: a library must be used to render it. Here are some good ones:

MathJax is the well-established solution and should look good anywhere.
KaTeX is also well-established, but I haven’t used it.
Temml is fairly new. It converts LaTeX to MathML which the web browser can directly display. If you haven’t heard of it, I recommend checking out Temml.

For Geodesy for the Layman I used MathJax because I didn’t want to tinker, and I knew it would mostly just work. The math on my blog however is rendered with Temml.

But how can I make something like another website did?

Steal it

Really, press F12 and tinker with the Devtools. Figure out how that site implemented it and copy it.

Copyright

If you’re publishing anything online, you will want to be certain you have the right to do so. Copyright can be annoying. A certain work can be in the public domain, but someone’s scan of it might not. Be careful.

In my case, Geodesy for the Layman was written and published by the U.S. federal government, which generally means it’s in the public domain. Additionally, there’s a note inside the document disclaiming any copyright, which is nice.

If the work you’re publishing is under copyright, get the rights to republish it.

When you do publish, I recommend licensing your work under a Creative Commons license (if you’re able). Specifically, I like the Creative Commons Attribution-ShareAlike license because under it:

Anyone can use or republish the work (as long as they properly provide credit)
If anyone publishes modifications to the work, they must also license those modifications under the same license (ensuring the work always stays free and available, even in derivative forms)

Disclaimer

I’m just a guy who likes the pursuit of making text look good. I’m certain there’s better advice than what I have given, but this is the best advice I have at this point.

If you see anything wrong in this post, or notice anything I left out, shoot me an email at contact@alexanderbass.com. I always like to learn.
