Abstract
The idea of using the Web as a corpus for linguistic research is becoming increasingly popular. Most often this means using Web search engine page hit counts as estimates for n-gram frequencies. While the results so far have been very encouraging, some researchers worry about what appears to be the instability of these estimates. Using a particular NLP task, we compare the variability in the n-gram counts across different search engines as well as for the same search engine across time, finding that although there are measurable differences, they are not statistically significant for the task examined.