Techreport,

Evaluation of Algorithm Performance on Identifying OA

(2005)

Abstract

This is a second signal-detection analysis of the accuracy of a robot in detecting open access (OA) articles, carried out by checking by hand how many of the articles the robot tagged as OA were really OA, and vice versa. A first analysis, on a smaller sample (Biology: 100 OA, 100 non-OA), had found a detectability (d') of 2.45 and a bias of 0.52 (hits: 93%; false positives: 16%; Biology %OA: 14%; OA citation advantage: 50%). The present analysis, on a larger sample (Biology: 272 OA, 272 non-OA), found a detectability of 0.98 and a bias of 0.78 (hits: 77%; false positives: 41%; Biology %OA: 16%; OA citation advantage: 64%). An analysis in Sociology (177 OA, 177 non-OA) found near-chance detectability (d' = 0.11) and an OA bias of 0.99 (hits: 54%; false positives: 49%; prior robot estimate of Sociology %OA: 23%; present estimate: 15%). It was not possible to estimate the Sociology OA citation advantage from these data. CONCLUSIONS: The robot significantly overcodes for OA. In Biology 2002, 40% of identified OA was in fact OA. In Sociology 2000, only 18% of identified OA was in fact OA. Missed OA was lower: 12% in Biology 2002 and 14% in Sociology 2000. The sources of the error cannot be determined from the present data, because the algorithm did not capture URLs for the documents it identified as OA. The robot is therefore not yet performing at a desirable level, and future work may be needed to determine the causes and improve the algorithm.
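The abstract does not say how detectability and bias were computed. As a rough illustration only, the sketch below assumes the standard equal-variance Gaussian signal-detection model, in which d' is the difference between the z-transformed hit and false-positive rates and the bias is the likelihood ratio (beta) at the decision criterion. The function names `d_prime` and `beta` are hypothetical, and the inputs are the rounded percentages quoted in the abstract, so the results only land close to the reported figures rather than reproducing them exactly.

```python
from math import exp
from statistics import NormalDist

# Inverse CDF of the standard normal distribution (the "z-transform").
_z = NormalDist().inv_cdf


def d_prime(hit_rate, false_positive_rate):
    """Detectability under the assumed equal-variance Gaussian model."""
    return _z(hit_rate) - _z(false_positive_rate)


def beta(hit_rate, false_positive_rate):
    """Likelihood-ratio bias; values below 1 indicate a bias toward saying 'OA'."""
    z_hit = _z(hit_rate)
    z_fp = _z(false_positive_rate)
    return exp((z_fp ** 2 - z_hit ** 2) / 2)


# Rounded hit and false-positive rates as quoted in the abstract.
samples = {
    "Biology, first analysis (100/100)": (0.93, 0.16),
    "Biology, present analysis (272/272)": (0.77, 0.41),
    "Sociology (177/177)": (0.54, 0.49),
}

for name, (hits, false_positives) in samples.items():
    print(f"{name}: d' = {d_prime(hits, false_positives):.2f}, "
          f"bias = {beta(hits, false_positives):.2f}")
```

Under this assumed model, equal hit and false-positive rates give d' = 0 (chance performance), which is why the Sociology rates of 54% and 49% correspond to near-chance detectability.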
