2 The basics of TeX4ht

2.1 TeX4ht in general

This section describes TeX4ht in general terms. For reference, I strongly recommend “The LaTeX Web Companion” [1], whose documentation of TeX4ht is very informative; the book is handy on other matters as well. See also the official TeX4ht web site [2]. Still, we will cover the important basics here, including everything needed to get a working knowledge of the system.

There are several alternatives to TeX4ht for creating HTML from LaTeX, but TeX4ht is definitely the most elaborate, and after some work it gives the best results. Its main drawback is exactly this amount of work. The benefits, however, matter much more: because TeX4ht works at a low level (TeX), with LaTeX running on top of it, complex structures are handled easily and can be tailored to give the results one wants. In comparison, LaTeX2HTML is merely a Perl script that parses LaTeX code, and it stumbles when the author plays “tricks,” as happens in any project larger than your average “hello world!” project.¹

How does TeX4ht work? Basically, it is a combination of a LaTeX package and a set of post-processing utilities. The package modifies the dvi output² of the compilation, adding information for the post-processing utilities to digest. These utilities then generate the HTML code and the images for content such as included figures and mathematical formulae. We will have more to say on this later.

2.2 How to invoke TeX4ht

To invoke TeX4ht from a LaTeX document, simply put a \usepackage{tex4ht} command in the preamble, immediately after the \documentclass statement. (The package takes several optional arguments, but we won’t worry about them now.) This changes the dvi output so much that it is probably no longer usable with your favorite dvi driver (e.g. dvips): the dvi file is now tailored for conversion to HTML, which makes it device dependent. The conversion itself is done by the external post-processing commands tex4ht and t4ht.

The tex4ht command takes the dvi file, together with some other files written by the TeX4ht package, and creates the HTML files and an idv file. Try viewing the idv file with, e.g., xdvi, and you’ll discover that it really is a dvi file: each page holds one item that TeX4ht cannot convert into HTML directly. The program t4ht is then used to create a bitmap version of each such item. This step relies on external utilities, as explained below, so t4ht may seem rather fragile; on the other hand, it is highly customizable. Figure 1 illustrates the concept.


Figure 1: Flow of files and information when using TeX4ht

Note:
If you want to create an ordinary (non-HTML) dvi file after having used TeX4ht (that is, after removing the \usepackage{tex4ht} command), you must first delete the aux file created during the TeX4ht runs.

The actual commands to use when creating your HTML version are:

latex document  
latex document  
latex document  
tex4ht document  
t4ht document

Note the repeated invocations of latex: TeX4ht needs several passes to get HTML tables, cross references and the like right, and sometimes even more runs are needed. The whole process above can be run with the single command ht latex document, which is usually fine for a start. If your document is very complex and the HTML version turns out to lack some references, for instance, you may need one or two more latex runs before post-processing with tex4ht. And if you are using, for example, BibTeX, you almost certainly need to write your own sequence of commands.
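For the BibTeX case, such a sequence might look like the shell function below. It is only a minimal sketch, assuming a plain latex-plus-BibTeX workflow in which three latex runs suffice; the function name build_html and the variables LATEX, BIBTEX, TEX4HT and T4HT are hypothetical conveniences of this example, not part of TeX4ht. BibTeX runs after the first latex pass because it reads the citations that latex writes to the aux file.

```shell
# Hypothetical helper: build an HTML version of a LaTeX document that
# uses BibTeX. Usage: build_html document   (base name, without .tex)
#
# The tool names can be overridden through the (hypothetical) variables
# LATEX, BIBTEX, TEX4HT and T4HT, e.g. for a dry run that only prints
# the commands:
#   LATEX="echo latex" BIBTEX="echo bibtex" \
#   TEX4HT="echo tex4ht" T4HT="echo t4ht" build_html document
build_html() {
    doc=$1
    latex_cmd=${LATEX:-latex}
    bibtex_cmd=${BIBTEX:-bibtex}
    tex4ht_cmd=${TEX4HT:-tex4ht}
    t4ht_cmd=${T4HT:-t4ht}

    # The *_cmd variables are deliberately left unquoted so that a value
    # such as "echo latex" is split into a command and its argument.

    # First pass: writes the citation keys to the aux file.
    $latex_cmd "$doc" || return 1
    # BibTeX reads the aux file and writes the bibliography (.bbl).
    # Its exit status is ignored so that mere warnings do not stop the
    # build; inspect the .blg log if the bibliography looks wrong.
    $bibtex_cmd "$doc"
    # Two more passes: pull in the bibliography and settle cross
    # references, tables and the like.
    $latex_cmd "$doc" || return 1
    $latex_cmd "$doc" || return 1
    # Post-processing: tex4ht turns the dvi into HTML files plus an idv
    # file, and t4ht turns the items in the idv file into bitmaps.
    $tex4ht_cmd "$doc" || return 1
    $t4ht_cmd "$doc"
}
```

Calling build_html document (without the .tex extension) then replaces the five-command sequence above. Since tex4ht post-processes a dvi file, LATEX must name a program that produces dvi output (so not pdflatex in its default PDF mode).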

2.3 Hello, world!

Let us look at a small example using TeX4ht. It shows the default behaviour of the system, which is not particularly convincing. Let this be the input LaTeX document (hello.tex):

\documentclass{article}  
\usepackage{tex4ht}  
 
\begin{document}  
\section{Hello, world!}  
This is a simple \LaTeX{} document.  
\section{Ars magna}  
This is easy:  
\[ a^2 + b^2 = c^2 \]  
But what about  
\[ \alpha^3 + \beta^3 = \gamma^3 \, ? \]  
\end{document}

Run ht latex hello and watch. Note how slow the generation of bitmaps from the mathematical formulae is, and how bad the maths look when hello.html is viewed in, for example, Mozilla. (The generated HTML code is rather complicated, so we won’t display it here.) Apart from this, the document is quite nice, and with some simple adjustments to the configuration we will be able to produce results that are both quicker and better. This is the topic of section 3.1.

2.4 Parameters to the tex4ht package

Once our recipe for creating web documents is complete, we won’t need to worry about the parameters of the tex4ht package. For completeness, let us nevertheless mention some of them:

If any parameters are supplied, the first one must be either the name of a configuration file (see [1]) or html. We will always use the latter. A common mistake is to forget that the first parameter to tex4ht is special.³

Other parameters include: