Abstract
•Purpose
This research explores whether higher-level semantic features can help build a better self-organizing map (SOM) representation, as measured from a human-centered perspective. We also explore an automatic evaluation method that uses the human expert knowledge encapsulated in the structure of traditional textbooks to determine map representation quality.
•Design/methodology/approach
Two types of document representation involving semantic features were explored: 1) using a single semantic feature on its own, and 2) combining a semantic feature with keywords. A set of experiments was conducted to investigate the impact of semantic representation quality on the map. The experiments were performed on data collections that included a single-book corpus and a multiple-book corpus.
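As an illustration of the second representation type, the sketch below shows one plausible way to combine a keyword vector with a semantic-feature vector before training a SOM. It is a minimal sketch under stated assumptions, not the procedure used in the study: the function names, the L2 normalization step, and the `keyword_weight` mixing parameter are all hypothetical choices made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0 for _ in vec]
    return [x / norm for x in vec]


def combine_features(keyword_vec, semantic_vec, keyword_weight=0.5):
    """Concatenate two feature vectors for one document.

    Assumption (hypothetical): each part is L2-normalized, then the keyword
    part is scaled by `keyword_weight` and the semantic part by
    (1 - keyword_weight), so the weight acts as the mixing ratio.
    """
    if not 0.0 <= keyword_weight <= 1.0:
        raise ValueError("keyword_weight must lie in [0, 1]")
    kw = l2_normalize(keyword_vec)
    sem = l2_normalize(semantic_vec)
    weighted_kw = [keyword_weight * x for x in kw]
    weighted_sem = [(1.0 - keyword_weight) * x for x in sem]
    return weighted_kw + weighted_sem


# Example: a document with two keyword dimensions and two semantic dimensions,
# mixed so that keywords carry three quarters of the weight.
combined = combine_features([3.0, 4.0], [0.0, 2.0], keyword_weight=0.75)
```

Setting `keyword_weight` to 1.0 zeroes out the semantic part, so the combined vector carries the same information as the keywords-only representation. Values between 0 and 1 give a mixture of the two feature types.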
•Findings
Combining keywords with certain semantic features significantly improves representation quality over the keywords-only approach in a relatively homogeneous single-book corpus. Changing the ratio in which the different features are combined also affects performance.
While semantic mixtures can work well in a single-book corpus, they lose their advantage over keywords in the multiple-book corpus. This raises a concern about whether the semantic representations in the multiple-book corpus are homogeneous and coherent enough for semantic features to be applied. Terminology issues across textbooks negatively affect the ability of the SOM to generate a high-quality map for heterogeneous collections.
•Originality/value
We explored the use of higher-level document representation features for developing better-quality SOMs. In addition, we piloted a specific method for evaluating SOM quality based on the organization of information content in the map.