There are several alternatives to TeX4ht for creating HTML from LaTeX, but TeX4ht is definitely the most elaborate, and after some work it gives the best results. Its main drawback is the amount of work required. The benefits, however, outweigh it: because TeX4ht works at a low level (TeX), on top of which LaTeX is built, complex structures are handled easily and can be tailored to give the results one wants. By comparison, LaTeX2HTML is merely a Perl script that parses LaTeX code, and it stumbles when the author uses “tricks,” as is the case in every project bigger than the average “hello world!” project.1
How does TeX4ht work? It is basically a combination of a LaTeX package and a set of post-processing utilities. The LaTeX package modifies the dvi output2 from the compilation process, adding information for the post-processing utilities to digest. These utilities generate the HTML code and the pictorial representation of content such as included figures and mathematical formulae. We will have more to say on this later.
To invoke TeX4ht from within a LaTeX document, one simply issues a \usepackage{tex4ht} command in the preamble, immediately after the \documentclass statement. (There are several optional arguments, but we won’t worry about them now.) This changes the dvi output substantially, most likely making it incompatible with your favorite dvi driver (e.g., dvips). The reason is that the dvi file is now tailored for producing HTML, which makes it device dependent. The HTML files themselves are then generated by the external post-processing commands tex4ht and t4ht.
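Because the package has to come immediately after \documentclass, a quick check of the preamble can catch a misplaced \usepackage{tex4ht}. The sketch below is a hypothetical helper (tex4ht_loaded_first is not part of TeX4ht); it assumes each command sits on its own line and skips blank and comment lines between the two.

```shell
# Hypothetical lint helper (not part of TeX4ht).
# Succeeds only if the first non-blank, non-comment line after
# \documentclass loads tex4ht, e.g. \usepackage{tex4ht} or
# \usepackage[html]{tex4ht}. Assumes one command per line.
tex4ht_loaded_first() {
    awk '
        found_class && /^[[:space:]]*$/ { next }   # skip blank lines
        found_class && /^[[:space:]]*%/ { next }   # skip comment lines
        found_class {
            ok = ($0 ~ /^[[:space:]]*\\usepackage(\[[^]]*\])?[{]tex4ht[}]/)
            exit
        }
        /^[[:space:]]*\\documentclass/ { found_class = 1 }
        END { exit (ok ? 0 : 1) }
    ' "$1"
}
```

For example, `tex4ht_loaded_first hello.tex || echo "load tex4ht right after \\documentclass"` would flag a preamble where another package was loaded first.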
The tex4ht command takes the dvi file, together with some other input files created by the TeX4ht package, and produces the HTML files and an idv file. Try viewing the idv file with, e.g., xdvi, and you’ll discover that it really is a dvi file: each page holds one item that TeX4ht cannot convert into HTML directly. The program t4ht is then used to create a bitmap version of each such item. This process relies on external utilities, as explained below, so t4ht might seem rather fragile; on the other hand, it is highly customizable. Take a look at figure 1, which illustrates the concept.
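To make the hand-off between the two programs concrete, here is a small sketch of a sanity check one might run after tex4ht and before t4ht. The helper name check_tex4ht_output is hypothetical, and the file names <document>.html and <document>.idv are an assumption about tex4ht's default output naming; adjust them if your setup differs.

```shell
# Hypothetical check (not part of TeX4ht). ASSUMES tex4ht names its
# main outputs <document>.html and <document>.idv.
# Returns 0 if both exist, 1 otherwise (listing what is missing).
check_tex4ht_output() {
    doc="$1"
    missing=0
    for f in "$doc.html" "$doc.idv"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

A typical use is `check_tex4ht_output hello && xdvi hello.idv`, which lets you inspect the pages that t4ht will turn into bitmaps.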
The actual commands to use when creating your HTML version are:
latex document
latex document
latex document
tex4ht document
t4ht document
Let us look at a small example using TeX4ht. It shows the default behaviour of the system, which is not particularly convincing. Let this be the input LaTeX document (hello.tex):
\documentclass{article}
\usepackage{tex4ht}
\begin{document}
\section{Hello, world!}
This is a simple \LaTeX{} document.
\section{Ars magna}
This is easy:
\[ a^2 + b^2 = c^2 \]
But what about
\[ \alpha^3 + \beta^3 = \gamma^3 \, ? \]
\end{document}
When our recipe for creating web documents is finished, we won’t need to worry about the parameters for the tex4ht package. Let us mention some of the parameters anyway for completeness:
If parameters are supplied at all, the first one must be either the name of a configuration file (see [1]) or html. We will always use the latter. A common mistake is to forget that the first parameter to tex4ht is special.3
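Since the first parameter is special, it can help to see which value will land in that position. The hypothetical helper below (tex4ht_first_option, not part of TeX4ht) prints the first option of a one-line \usepackage[...]{tex4ht} command, so you can confirm it is html (or the configuration file you intended); it assumes the options and the package name appear on the same line.

```shell
# Hypothetical helper (not part of TeX4ht): print the first option of
# \usepackage[opt1,opt2,...]{tex4ht}, or nothing if tex4ht is loaded
# without options. Only the simple one-line form is recognized.
tex4ht_first_option() {
    sed -n 's/^[[:space:]]*\\usepackage\[\([^],]*\)[],].*{tex4ht}.*/\1/p' "$1" | head -n 1
}
```

For instance, `tex4ht_first_option hello.tex` prints nothing for the example above, since hello.tex loads tex4ht without options.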
Other parameters include: