My experience with document interchange led me to classify document formats using the essential distinction that some are "programmable" and some are not. [..]
The reason this distinction is essential for document interchange is that extracting information from documents in "programmable" document formats is, in the general case, equivalent to solving the halting problem. That is, it can be arbitrarily difficult and cannot be automated in a general fashion.
For example, I conjecture that it is impossible to write a program that will extract the third word from a TeX document.
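To make the argument concrete, here is a minimal sketch in Python. It uses a hypothetical toy "programmable" format, not TeX. A document is a list of counter-machine instructions, and its text is whatever the `emit` instructions produce when the document is run. The only general way to find the third word is to execute the document, so the extractor below runs it under an assumed step budget (`max_steps`). When the budget runs out, it reports `"unknown"`. Counter machines with as few as two registers are already Turing-complete (Minsky), so in general no choice of budget makes the `"unknown"` case go away.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy "programmable" document format (not TeX). Instructions:
#   ("emit", word)          append a word to the rendered text
#   ("inc", reg)            increment counter register `reg`
#   ("decjz", reg, target)  if `reg` is 0 jump to `target`, else decrement it
#   ("jmp", target)         unconditional jump
#   ("halt",)               stop
Instr = Tuple


def extract_third_word(program: List[Instr],
                       max_steps: int = 10_000) -> Tuple[str, Optional[str]]:
    """Run the document; return ("found", w), ("absent", None) or ("unknown", None)."""
    regs: Dict[str, int] = {}
    words: List[str] = []
    pc = 0
    for _ in range(max_steps):
        if pc >= len(program):
            return ("absent", None)  # ran off the end with fewer than 3 words
        op = program[pc]
        if op[0] == "emit":
            words.append(op[1])
            if len(words) == 3:
                return ("found", op[1])
            pc += 1
        elif op[0] == "inc":
            regs[op[1]] = regs.get(op[1], 0) + 1
            pc += 1
        elif op[0] == "decjz":
            if regs.get(op[1], 0) == 0:
                pc = op[2]
            else:
                regs[op[1]] -= 1
                pc += 1
        elif op[0] == "jmp":
            pc = op[1]
        elif op[0] == "halt":
            return ("absent", None)  # halted with fewer than 3 words
        else:
            raise ValueError(f"unknown instruction: {op!r}")
    # Budget exhausted: the document might still emit a third word later,
    # or it might loop forever. The extractor cannot tell which.
    return ("unknown", None)


plain = [("emit", "The"), ("emit", "quick"), ("emit", "fox"), ("halt",)]
short = [("emit", "only"), ("halt",)]
looping = [("emit", "a"), ("emit", "b"), ("jmp", 2)]  # spins forever at index 2
# Counts a register down from 3 before emitting anything (13 steps in total).
late = [("inc", "a"), ("inc", "a"), ("inc", "a"),
        ("decjz", "a", 5), ("jmp", 3),
        ("emit", "x"), ("emit", "y"), ("emit", "z")]

print(extract_third_word(plain))                # ('found', 'fox')
print(extract_third_word(short))                # ('absent', None)
print(extract_third_word(looping))              # ('unknown', None)
print(extract_third_word(late, max_steps=12))   # ('unknown', None)
print(extract_third_word(late, max_steps=13))   # ('found', 'z')
```

The `late` document shows why the budget is only a heuristic. The same document yields "unknown" or "found" depending on how long the extractor is willing to run it. From the outside, `looping` looks identical to a document whose third word simply has not appeared yet.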
Computer Science in the 1960s to 80s spent a lot of effort making languages which were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful. The reason for this is that the less powerful the language, the more you can do with the data stored in that language. If you write it in a simple declarative form, anyone can write a program to analyze it in many ways. The Semantic Web is an attempt, largely, to map large quantities of existing data onto a common language so that the data can be analyzed in ways never dreamed of by its creators.
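By contrast, when a document is stored in a declarative form, extraction is plain data access that always terminates, and the same data supports any number of independent analyses. The following is a minimal sketch. It assumes a hypothetical JSON layout with `title` and `paragraphs` fields and uses naive whitespace tokenization.

```python
import json
from collections import Counter
from typing import List, Optional

# Hypothetical declarative document: pure data, nothing to execute.
DOC = json.loads("""
{
  "title": "Principle of Least Power",
  "paragraphs": [
    "The less powerful the language, the more you can do with the data.",
    "Anyone can write a program to analyze it."
  ]
}
""")


def words(doc: dict) -> List[str]:
    """Body words in reading order (naive whitespace tokenization)."""
    return [w for p in doc["paragraphs"] for w in p.split()]


def third_word(doc: dict) -> Optional[str]:
    """Always terminates: the text is simply there to be read."""
    ws = words(doc)
    return ws[2] if len(ws) >= 3 else None


def word_frequencies(doc: dict) -> Counter:
    """A second, independent analysis over the same data."""
    return Counter(w.lower().strip(".,") for w in words(doc))


print(third_word(DOC))               # powerful
print(len(words(DOC)))               # 21
print(word_frequencies(DOC)["the"])  # 4
```

Each analysis is an ordinary function over the data. None of them has to run anything the document's author wrote, so none of them can be sent into an endless loop by the document.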