On Formally Unconvertable Document Formats (Toward a Formalism for Communication On the Web)


Description

My experience with document interchange led me to classify document formats using the essential distinction that some are "programmable" and some are not. [..]

The reason that this distinction is essential with respect to document interchange is that extracting information from documents in "programmable" document formats is equivalent to the halting problem. That is, it is arbitrarily difficult and cannot be automated in a general fashion.

For example, I conjecture that it is impossible to write a program that will extract the third word from a TeX document.
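
The conjecture can be illustrated with a toy model. The sketch below is not TeX and makes no claim about how TeX is implemented: it assumes a hypothetical macro format in which any token starting with a backslash may expand to further tokens, and the names `expand`, `third_word`, `BudgetExceeded` and `max_steps` are invented for this illustration. It shows why a "programmable" format forces an extractor to run the document, and why the extractor can only give up after a step budget rather than decide in general whether a third word exists.

```python
from collections import deque


class BudgetExceeded(Exception):
    """Raised when expansion does not finish within the step budget."""


def expand(tokens, macros, max_steps):
    """Yield plain words, expanding macro tokens from left to right.

    `macros` maps a macro name to the list of tokens it expands to.
    Each expansion counts as one step; going over `max_steps` raises
    BudgetExceeded, because the extractor cannot tell a slow document
    from one that never finishes.
    """
    pending = deque(tokens)
    steps = 0
    while pending:
        token = pending.popleft()
        if token in macros:
            steps += 1
            if steps > max_steps:
                raise BudgetExceeded(f"gave up after {max_steps} expansions")
            # Push the macro body back onto the front, keeping its order.
            pending.extendleft(reversed(macros[token]))
        else:
            yield token


def third_word(source, macros, max_steps=1000):
    """Return the third word of the expanded document, or None if it has fewer."""
    count = 0
    for word in expand(source.split(), macros, max_steps):
        count += 1
        if count == 3:
            return word
    return None


# A non-programmable document: the answer can be read off directly.
print(third_word("one two three", {}))

# A programmable document: the third word only appears after expansion.
macros = {"\\greet": ["hello", "\\name"], "\\name": ["world"]}
print(third_word("\\greet is here", macros))

# A document whose macro expands to itself: the extractor can only give up.
try:
    third_word("one \\loop two three", {"\\loop": ["\\loop"]}, max_steps=50)
except BudgetExceeded as error:
    print("undetermined:", error)
```

The step budget is a workaround rather than a solution: when it is exhausted, the extractor has learned nothing about whether a larger budget would have produced a third word, which is the halting-problem difficulty the description refers to.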
