OCR4all converts historical prints into machine-readable text. It is reliable, user-friendly, and open source, and it was developed by researchers at the University of Würzburg.
We use Text Mining, Deep Learning and Big Data Analytics to unleash the potential of unstructured data and to integrate unused assets into decision-making processes.
In this post, I show how I use NLTK for preprocessing and tokenization and then apply machine learning techniques with scikit-learn (e.g., training a linear SVM with stochastic gradient descent).
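As a rough illustration of that workflow, the sketch below tokenizes and stems text with NLTK, then feeds TF-IDF features into scikit-learn's `SGDClassifier` with hinge loss, which amounts to a linear SVM fitted by stochastic gradient descent. The toy corpus, the labels, and the helper name `tokenize_and_stem` are assumptions made up for this example and do not come from the post itself. The choice of `wordpunct_tokenize` and `PorterStemmer` is also an assumption, picked because neither needs an extra NLTK data download.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

stemmer = PorterStemmer()


def tokenize_and_stem(text):
    """Hypothetical helper: split with NLTK, keep alphabetic tokens, stem them."""
    return [stemmer.stem(tok) for tok in wordpunct_tokenize(text) if tok.isalpha()]


# Toy corpus invented for illustration only.
train_texts = [
    "A wonderful and moving film",
    "Great acting, wonderful story",
    "I loved this great movie",
    "A boring and dull film",
    "Terrible acting, dull story",
    "I hated this boring movie",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# loss="hinge" makes SGDClassifier a linear SVM trained with SGD.
# token_pattern=None tells the vectorizer that the custom tokenizer is used instead.
model = make_pipeline(
    TfidfVectorizer(tokenizer=tokenize_and_stem, token_pattern=None),
    SGDClassifier(loss="hinge", random_state=0),
)
model.fit(train_texts, train_labels)

predictions = model.predict(["a wonderful, moving story", "boring and dull"])
print(predictions)
```

Wrapping both steps in a single pipeline keeps the NLTK preprocessing attached to the classifier, so new text passes through the same tokenization and stemming at prediction time as it did during training.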
@startuml
participant User
User -> A: DoWork
activate A #FFBBBB
A -> A: Internal call
activate A #DarkSalmon
A -> B: << createRequest >>
activate B
B --> A: RequestCreated
deactivate B
deactivate A
A -> User: Done
deactivate A
@enduml
K. Guhlemann and C. Best. Arbeit und Altern: Eine Bilanz nach 20 Jahren Forschung und Praxis, Nomos Verlagsgesellschaft, Baden-Baden, (Mikrozensus). (2021)
A. Putzier. Slawische Sprachen unterrichten: Sprachübergreifend, grenzüberschreitend, interkulturell, Peter Lang GmbH, Internationaler Verlag der Wissenschaften, Berlin, (Eurobarometer). (2021)