Abstract
In recent years, Open Educational Resources (OERs) have been identified as
critical to meeting the growing global need for education. OERs have high
potential to serve learners in many different circumstances, as they are
available in a wide range of contexts. However, the generally low quality of
OER metadata is one of the main reasons behind the lack of personalised
services such as search and recommendation. As a result, the applicability of
OERs remains limited. Yet metadata about the topics (subjects) an OER covers is
essential for learners to build effective learning pathways towards their
individual learning objectives. Therefore, in this paper, we report on a
work-in-progress project proposing an OER topic extraction approach that
applies text mining techniques to generate high-quality OER metadata about
topic distribution. This is done by: 1) collecting 123 lectures from Coursera
and Khan Academy in the area of data science-related skills, 2) applying Latent
Dirichlet Allocation (LDA) to the collected resources in order to extract
existing topics related to these skills, and 3) defining the topic
distribution covered by a particular OER. To evaluate our model, we used a
dataset of educational resources from YouTube and compared our topic
distribution results with their target topics, which were manually defined
with the help of 3 experts in the area of data science. Our model extracted
topics with an F1-score of 79%.
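Steps 2) and 3) and the F1-based evaluation can be illustrated with a toy, dependency-free sketch. It assumes LDA fitted by collapsed Gibbs sampling, hypothetical hyperparameters (`alpha`, `beta`, `iterations`), hypothetical human-assigned topic labels, and a hypothetical rule that an OER "covers" a topic when that topic's share is at least 0.2. The abstract does not state which LDA implementation, settings, or matching rule the authors used. A real pipeline would more likely rely on a library implementation such as scikit-learn's `LatentDirichletAllocation`.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (toy sketch, not the paper's code).

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] is the estimated share of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K
    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    # Point estimates of the per-OER topic distribution (theta) and topic-word (phi).
    doc_topic = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


def top_words(topic_word, vocab, n=3):
    """Most probable words per topic, e.g. to help a human label each topic."""
    return [
        [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
        for row in topic_word
    ]


def predicted_topics(distribution, labels, threshold=0.2):
    """Hypothetical rule: an OER covers every topic whose share >= threshold."""
    return {labels[k] for k, share in enumerate(distribution) if share >= threshold}


def f1(predicted, target):
    """F1-score between predicted topic labels and expert-defined target labels."""
    predicted, target = set(predicted), set(target)
    tp = len(predicted & target)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(target)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    corpus = [  # tiny made-up corpus, not the paper's data
        "regression loss gradient regression model".split(),
        "cluster kmeans centroid cluster distance".split(),
        "gradient model loss training regression".split(),
    ]
    theta, phi, vocab = lda_gibbs(corpus, num_topics=2)
    print(top_words(phi, vocab))
    labels = ["topic A", "topic B"]  # hypothetical labels chosen from top words
    pred = predicted_topics(theta[0], labels)
    print(pred, f1(pred, {"topic A"}))
```

In this sketch the per-OER topic distribution is simply the row `theta[d]`; turning it into a set of covered topics requires some thresholding or ranking rule, which is where a comparison against expert-defined target topics (and hence an F1-score) becomes possible.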