
On Formally Unconvertible Document Formats (Toward a Formalism for Communication On the Web)


My experience with document interchange led me to classify document formats by one essential distinction: some are "programmable" and some are not. [..]

This distinction is essential for document interchange because a document in a "programmable" format is itself a program, and its content is whatever that program produces when it runs. Extracting information from such documents is therefore, in general, equivalent to the halting problem. That is, it cannot be automated in a general fashion, and any particular case may be arbitrarily difficult.

For example, I conjecture that it is impossible to write a program that reliably extracts the third word from an arbitrary TeX document.
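To make the shape of the problem concrete, here is a minimal sketch in Python of a hypothetical toy "programmable" format: a body of tokens plus macro definitions, where a token starting with a backslash expands to its definition. The names `nth_word`, the token format, and the `budget` parameter are all assumptions for illustration, not TeX's actual semantics. The point it shows is that the third word cannot be read off the source text; the extractor has to expand (run) the document, and the only general fallback is a step budget that sometimes answers "unknown". Note that this toy language has no conditionals, so its termination could actually be decided by cycle detection; TeX's macro language, which has conditionals and counters, does not allow that shortcut.

```python
from typing import Dict, List, Optional, Tuple

def nth_word(body: List[str], macros: Dict[str, List[str]], n: int,
             budget: int = 10_000) -> Tuple[str, Optional[str]]:
    """Return ("found", word) for the n-th (1-based) rendered word,
    ("absent", None) if the document renders fewer than n words, or
    ("unknown", None) if expansion did not finish within `budget` steps.
    Toy format (hypothetical): tokens starting with a backslash are macro
    calls; everything else is a plain word."""
    stack = list(reversed(body))  # pending tokens; next token is on top
    words_seen = 0
    steps = 0
    while stack:
        steps += 1
        if steps > budget:
            # Give up: the expansion may or may not terminate later.
            return ("unknown", None)
        tok = stack.pop()
        if tok.startswith("\\"):
            # Macro call: replace it by its definition (undefined macros vanish).
            stack.extend(reversed(macros.get(tok, [])))
        else:
            words_seen += 1
            if words_seen == n:
                return ("found", tok)
    return ("absent", None)

macros = {"\\greet": ["Hello", "dear"], "\\loop": ["\\loop"], "\\empty": []}
print(nth_word(["\\greet", "world"], macros, 3))                # ('found', 'world')
print(nth_word(["Hi", "\\loop", "there"], macros, 2, budget=100))  # ('unknown', None)
```

In the first call the third word only exists after `\greet` is expanded; in the second, a self-referential macro means the extractor can never reach "there", so a bounded extractor must report that it does not know.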




  • @schmidt2
