Know your word boundaries in HTML

Coding in HTML

Today I noticed something peculiar about how my blog renders. In the previous version of the website, the markup for the tags (e.g., at the end of each entry) was, in simplified form, as follows:

<div><span class="span1">Mathematics</span><span class="span1">Computer Science</span><span class="span1">Programming</span></div>

The div element is a container for a list of tags, and each span element is a tag. Instead of using space characters, I decided to use CSS to insert whitespace between the tags. This makes it convenient to adjust the spacing in @media queries. The CSS looks roughly like this:

.span1 { margin-right: 20px; }

This creates a problem for the line-breaking algorithm. To demonstrate it, I added more styling and manually set the width of the container div:

I also added a second div element, which captures the essence of the new version of the website, to show the expected behaviour. As you can see, in the first red box a line-break is inserted inside the ‘Computer Science’ tag, even though the width is sufficient to render the first two tags. (This is reproducible in EdgeHTML 18.18363 and Chrome 80.0.3987.100.)

Why? The reason* is that those consecutive span elements have display: inline. For consecutive inline elements, the browser concatenates their text and treats only breakable white-space characters as word boundaries.

Since there are no space characters between those span elements, the browser thinks it is rendering MathematicsComputer ScienceProgramming. It sees that the width is insufficient for the whole string, so it happily breaks the line at the only word boundary. (Deceptive as it may look, there are no word boundaries between the tags.)

Why is this the desired behaviour? Consider ‘CS means Computer Science’ with some of the letters in bold. You would expect the browser not to break after a bold-faced C or S. This explains why the position between two adjacent inline elements should not be a break opportunity by default.
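The theory above can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the browser's real line-breaking algorithm. The model makes four simplifications:

- The text of adjacent inline elements is concatenated.
- Only space characters are break opportunities.
- Widths are counted in characters.
- CSS margins are ignored.

The helper names concat_inline and greedy_break are hypothetical. A width of 30 characters is enough for ‘Mathematics’ and ‘Computer Science’ side by side (27 characters). Even so, the only place the model can break is inside ‘Computer Science’.

```python
# Toy model (an assumption, not a real browser algorithm): the text of
# consecutive inline elements is concatenated, and only space characters
# count as break opportunities. Widths are in characters; margins are ignored.


def concat_inline(spans: list[str]) -> str:
    """Join the text of adjacent inline elements, as the line breaker sees it."""
    return "".join(spans)


def greedy_break(text: str, width: int) -> list[str]:
    """Fill lines greedily, breaking only at spaces; returns the rendered lines."""
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate  # fits, or an overlong word on an empty line
        else:
            lines.append(current)  # break at the preceding space
            current = word
    if current:
        lines.append(current)
    return lines


tags = ["Mathematics", "Computer Science", "Programming"]

# Tags as adjacent spans with no whitespace between them.
broken = greedy_break(concat_inline(tags), 30)
print(broken)  # ['MathematicsComputer', 'ScienceProgramming']

# 'CS means Computer Science' with the C and S in separate (bold) spans:
# the span boundaries add no break opportunity inside 'Computer'.
words = concat_inline(["C", "S means ", "C", "omputer ", "S", "cience"]).split(" ")
print(words)  # ['CS', 'means', 'Computer', 'Science']
```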

How to fix it? There are many ways to fix the problem. For example, you could give the ::after pseudo-element of each tag display: inline-block and the desired width. An inline-block element creates a word boundary. Alternatively, you could use ::after to insert a space character, which also creates a word boundary.

Pasting the example into OneNote and Word

My solution is to insert span elements that contain space characters and are styled to provide the correct amount of whitespace. This has the extra advantage of being clipboard-friendly. If you copy the example above into Word or OneNote, the first box might not give you the inserted space, and in Word the three blue boxes are merged into one. The second box, in contrast, always gives you a space character between the tags. The solution is roughly as follows:

<div><span class="span2">Mathematics</span><span class="span3"> </span><span class="span2">Computer Science</span><span class="span3"> </span><span class="span2">Programming</span></div>

.span3 { margin-left: 8px; margin-right: 8px; }

Due to the space characters in the inserted span elements, the browser thinks it is rendering Mathematics Computer Science Programming, and chooses to break the line between ‘Science’ and ‘Programming’ at that width.
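Under the same toy assumptions (concatenated inline text, breaks only at spaces, widths in characters, margins ignored), the effect of the spacer spans can be sketched as follows. Joining the tags with a single space stands in for the span3 elements. The function greedy_break is a hypothetical helper, not a browser API.

```python
# Toy model (an assumption, not a real browser algorithm): inline text is
# concatenated and only spaces are break opportunities. Widths are in
# characters; the spacers' CSS margins are ignored.


def greedy_break(text: str, width: int) -> list[str]:
    """Fill lines greedily, breaking only at spaces; returns the rendered lines."""
    lines: list[str] = []
    current = ""
    for word in text.split(" "):
        candidate = word if not current else current + " " + word
        if len(candidate) <= width or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


tags = ["Mathematics", "Computer Science", "Programming"]

# Each spacer span contributes one real space between adjacent tags.
text = " ".join(tags)
print(text)  # Mathematics Computer Science Programming

fixed = greedy_break(text, 30)
print(fixed)  # ['Mathematics Computer Science', 'Programming']
```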

* I did not look up the documentation. This is just my (un)educated guess. Yet it is a good theory that explains everything I have observed.
Nit-picker’s corner: …whose contents are in Latin script and that have no other styling controlling word-wrap behaviour.
Note that here a line-break is allowed in the middle of a tag. This is a matter of choice and outside the technical scope of this article. Setting display: inline-block on the tag span elements would disallow line-breaks within a tag and create word boundaries between the tags, but it would not produce a real space character when copied. The current version of the website goes one step further, using vertical bars (instead of added whitespace) to separate the tags, which is clearer.
