Nature 26 Oct 2021 -- Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.
In a project that could unlock the world’s research papers for easier computerized analysis, an American technologist [Carl Malamud] has released online a gigantic index of the words and short phrases contained in more than 100 million journal articles, including many paywalled papers.
The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers, says its creator, Carl Malamud. He released the files under the auspices of Public Resource, a non-profit corporation in Sebastopol, California that he founded.
Malamud says that because his index doesn’t contain the full text of articles, but only sentence snippets up to five words long, releasing it does not breach publishers' copyright restrictions on the re-use of paywalled articles. However, one legal expert says that publishers might question the legality of how Malamud created the index in the first place.
Nature, July 2019. -- A giant data store quietly being built in India could free vast swathes of science for computer analysis, but is it legal?
Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.
A new article was published in The Chronicle about me running Sci-Hub, a project dedicated to providing free access to academic journals all over the world. Their goal is to present Sci-Hub and its author Alexandra Elbakyan as some kind of malign project.
Rick Anderson, Scholarly Kitchen, 3 July 2019
"Librarians are not exactly presenting a unified voice in discouraging the use of pirated content. Some librarians do, of course, both try very hard to help their patrons understand the law and strongly encourage them to respect the legal rights of copyright holders, and in so doing actively discourage the use of pirate portals like Sci-Hub. Others, however — including prominent and influential figures in the profession — explicitly avoid taking such a stance. For example, Kevin Smith (Dean of Libraries at the University of Kansas) has publicly suggested that copyright infringement isn’t really morally problematic at all; Jeff Mackie-Mason (University Librarian and Professor of Information and Economics at the University of California) maintains that it’s “not for (the library) to say” how faculty and students should “behave vis à vis copyright laws.” To the extent that this attitude is widely shared among librarians, it suggests a major departure from past professional attitudes and practice. (My own institution, for example, is one of many that continue to employ a Scholarly Communication and Copyright Librarian, one of whose primary duties is to help students and faculty members understand copyright law and encourage them not only to assert their own rights as readers and researchers, but also to respect the rights of copyright holders.)"
By Joseph J. Esposito [an independent management consultant providing strategic advice, operating analysis, and interim management in the area of digital media to both publishing and software companies]. "Having grown up in New Jersey, I have some qualms about what it means for anyone to form an alliance with unsavory characters. What do you do when they ask for a favor in return?
So it’s about time to consider what happens if the libraries win. By “win” I mean they refuse deals with publishers and turn their constituencies over to unauthorized sites. This will save them huge amounts of money, of course, money that they would surely like to put to other uses. Publishing is an ecosystem, however, and a significant change in one element can ripple across the entire field. If Sci-Hub becomes the default place to go for full-text content, what else will change?
"These musings were prompted by a tweet I saw a couple weeks ago:
What is Sci-Hub’s preservation policy?
Twitter being Twitter, I have no way of knowing the context of that remark. Was it sarcastic? “Now that Sci-Hub is becoming the go-to place to access content, are you going to tell me that those crooks give a damn about the preservation of the scholarly record?” Or was it doe-eyed and innocent? “I would be interested to learn more about Sci-Hub’s preservation policies now that we use it for access.” On the other hand, if we were to be told about Sci-Hub’s preservation policy (Twitter being Twitter), it would be fake news."
In the long discussion Sandy Thatcher said: "One conundrum that libraries who tacitly rely on Sci-Hub create for faculty is to implicate them in associating with an ethically unsavory organization."
By John Bohannon, 2016
An exclusive look at data from the controversial web site Sci-Hub reveals that the whole world, both poor and rich, is reading pirated research papers.