This handbook aims to support higher education institutions with the integration of FAIR-related content in their curricula and teaching. It was written and edited by a group of about 40 collaborators in a series of six book sprint events that took place between 1 and 10 June 2021. The document provides practical material, such as competence profiles, learning outcomes and lesson plans, and supporting information. It incorporates community feedback received during the public consultation which ran from 27 July to 12 September 2021.
A guide to principles and methods for the management, archiving, sharing, and citing of linguistic research data, especially digital data.
December 16, 2021
Professor Lesley Gourlay, University College London; Dr Carlo Perrotta, Monash University; Professor Paul Prinsloo, University of South Africa. Chair: Dr Ibrar Bhatt, Queen's University Belfast
The RDM Maturity Assessment Model in Canada (MAMIC) is based on the RISE and SPARC assessment models and has been adapted to fit the Canadian institutional context. This tool is designed to help evaluate the current state of institutional RDM services and supports as part of an institutional RDM strategy development process. It focuses on four areas of service and support - Institutional Policies and Processes, IT Infrastructure, Support Services, and Financial Support - and allows users to assess the maturity and scale of these services.
This template is intended to assist research institutions in developing an institutional research data management (RDM) strategy, both to fulfil the first requirement of the Tri-Agency Research Data Management Policy and to articulate their commitment to RDM at the institutional level. It consists of suggested activities and processes in five stages to inform and shape the creation of an RDM strategy that meets local needs and resource capacities. Crucially, it is intended as a process, rather than a product template -- it provides steps for how to develop an institutional strategy, not a template outlining what an institutional strategy document itself looks like. In fact, these processes should be seen as ongoing, informing strategy updates over time and helping align institutional RDM efforts with broader institutional goals, objectives, policies, and services. While it is recommended that institutions employ each of the strategy development activities included in this template, each institution may choose to engage in each activity at a level of depth and detail appropriate to its size, research intensity, and existing RDM capacity. The institutions which will be required to create RDM strategies are postsecondary institutions and research hospitals eligible to administer Tri-Agency funds. See both the Tri-Agency RDM Policy and Statement of Principles on Digital Data Management, which outline expectations and responsibilities for RDM in the academic community. For definitions of RDM terms in this document, please refer to the CASRAI Research Data Management Glossary.
As part of the Heritage Connector project we've built a knowledge graph from the Science Museum Group and V&A collections using machine learning techniques.
This is an experimental interface designed to let you explore the connections in this knowledge graph, in a way that feels familiar.
This guide provides step-by-step instructions for curating new datasets deposited in Dataverse. The guide is framed around the acronym CURATION to provide an easy reminder for curators, especially those starting out, of the main steps in the curation process. This framework is adapted from the Data Curation Network’s CURATED steps for use in a bilingual context.
One of the most useful features of the Dataverse repository software is the large number of metadata fields it provides for describing research data. This guide is intended to support both the novice and experienced user in creating metadata for datasets in a Dataverse repository. It provides official definitions of metadata fields with clarifications and tips, distinguishes between required, recommended, and optional fields, and illustrates the use of fields with examples. This version of the guide has been updated to include coverage of all available metadata fields - citation, geospatial, social science and humanities, astronomy and astrophysics, life sciences, and journal metadata. The guide was created with permission from Harvard for the use of definitions and the Texas Digital Library for basic design. This guide is also available in French.
The Institut Pasteur is advancing Open Science by adopting, in May 2021, two founding texts: a charter for open access to publications and a policy on the management and sharing of research data and software code.
In the arts and humanities, digital data production is still expensive, challenging and time-consuming. We all know this, and yet the results of these processes often can't be reused by other researchers, meaning that we reinvent (or redigitise) the wheel far too often. This resource is aimed at giving practical advice for arts and humanities scholars who are willing to take their first steps in research data management but don't know where to begin. Our approach to data management views it as a reflective process that exposes and tweaks existing behaviours, rather than one that introduces specific tools. It is intended to encourage awareness of one's own processes, mindfulness about how they could be more open, and an appreciation of how small changes across three points in your research workflow can make big differences.
Nature, 26 Oct 2021 -- Catalogue of billions of phrases from 107 million papers could ease computerized searching of the literature.
In a project that could unlock the world’s research papers for easier computerized analysis, an American technologist, Carl Malamud, has released online a gigantic index of the words and short phrases contained in more than 100 million journal articles — including many paywalled papers.
The catalogue, which was released on 7 October and is free to use, holds tables of more than 355 billion words and sentence fragments listed next to the articles in which they appear. It is an effort to help scientists use software to glean insights from published work even if they have no legal access to the underlying papers, says its creator, Carl Malamud. He released the files under the auspices of Public Resource, a non-profit corporation in Sebastopol, California that he founded.
Malamud says that because his index doesn’t contain the full text of articles, but only sentence snippets up to five words long, releasing it does not breach publishers' copyright restrictions on the re-use of paywalled articles. However, one legal expert says that publishers might question the legality of how Malamud created the index in the first place.
Nature, July 2019 -- A giant data store quietly being built in India could free vast swathes of science for computer analysis — but is it legal?
Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.
Certain words are like sparks in a puddle of gasoline. “Bias” is definitely one of those words—and for good reason. If there is something that we are doing, that we are unaware of, that is causing harm to others, then we definitely should be taking it seriously.
This framework supports both the development and review of Institutional Strategies for Research Data Management (RDM). It can be used by administrators, service providers, strategic analysts, and researchers themselves to explore the spectrum of RDM engagement, support, and resources offered by their institution.
Unable to piece together all the different indicators, colleges and their instructors struggle to glean real wisdom, let alone adjust to a student’s needs, write Cathy O’Bryan and Bhavin Shah. September 8, 2021
This document provides an overview of the Qualitative Data Repository's (QDR) internal curation process. The process includes standardized steps running from depositor contact and file processing procedures, through Dataverse repository operations, to publication of the data project and beyond.
The goal of this workshop is to provide participants with the opportunity to develop their understanding of the Canadian Research Data Management landscape. This Workshop in a Box plan was developed by the NDRIO Portage Network (‘Portage’) in collaboration with Fanshawe College.
There are videos and case studies associated with the book.
Focused on both primary and secondary data and packed with checklists and templates, it contains everything readers need to know for managing all types of data before, during and after the research process.
- Minimum requirements for data management plans
- Selection criteria for trustworthy repositories
- Guidance for reviewers on evaluating DMPs
An online tool that helps researchers and data managers assess how much they know about the requirements for making datasets findable, accessible, interoperable, and reusable (FAIR) before uploading them to a data repository.
Use `Shift+Z` to **add** shared folders/files to your Google Drive instead of creating a shortcut. `rclone` cannot download shortcuts at the moment, though the current beta appears to support them.
Snorkel is a system for programmatically building and managing training datasets without manual labeling. In Snorkel, users can develop large training datasets in hours or days rather than hand-labeling them over weeks or months.
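The core idea Snorkel implements is weak supervision: users write small heuristic "labeling functions" whose noisy, overlapping votes are aggregated into training labels. The sketch below illustrates that idea using only the Python standard library; the data, labels, and function names are illustrative assumptions, not Snorkel's actual API (Snorkel also aggregates with a learned label model rather than a simple majority vote).

```python
# A minimal, library-free sketch of Snorkel-style programmatic labeling:
# write heuristic labeling functions, then aggregate their noisy votes.
# All names and data here are illustrative, not Snorkel's API.
from collections import Counter

ABSTAIN, SPAM, HAM = -1, 1, 0

def lf_contains_link(text):
    # Heuristic: messages with URLs are often spam.
    return SPAM if "http://" in text or "https://" in text else ABSTAIN

def lf_contains_offer(text):
    # Heuristic: promotional wording suggests spam.
    return SPAM if "free" in text.lower() or "winner" in text.lower() else ABSTAIN

def lf_short_greeting(text):
    # Heuristic: short greetings are usually legitimate.
    return HAM if len(text.split()) <= 6 and "hi" in text.lower() else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_contains_offer, lf_short_greeting]

def label(text):
    """Majority vote over non-abstaining labeling functions."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS if lf(text) != ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]

examples = [
    "Hi there, see you at noon",
    "You are a WINNER! Claim your free prize at https://example.com",
    "Quarterly report attached",
]
print([label(t) for t in examples])  # → [0, 1, -1]
```

Adding a new heuristic is just writing another small function, which is why large training sets can be produced in hours rather than weeks of hand-labeling.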
As part of the BMBF-funded meta-project "Digitalisierung im Bildungsbereich" (Digitalisation in Education), the Deutsches Institut für Erwachsenenbildung (German Institute for Adult Education) is responsible for organising a regular dialogue between research and practice in adult education.
- What would open data for a typical synthetic organic chemistry paper look like?
- What would open data for a typical molecular dynamics based paper look like?
- How raw should the deposited data be? Do funders have a view on, for example, whether I should deposit an NMR spectrum or the actual FID, which can then be processed to give the spectrum?
- What about iterative experiments? If I quote a yield of 80% for a synthesis, should I deposit data only for that synthesis or also for all the iterated syntheses that led to the final one?
Open Babel is a chemical toolbox designed to speak the many languages of chemical data. It's an open, collaborative project allowing anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas.
Gives good examples of what to write in a DMP, examples of bad or incomplete answers, and a rubric for how a DMP is evaluated.
- To assist researchers in developing transparency-related materials for a project
- To assist researchers in determining which materials are appropriate for internal documentation, and which would be useful or necessary to outsiders seeking to understand the project
- To serve as a project "table of contents"
S. Farhat, L. Tubati, M. Osiemo, and R. Dave. JCDL '22: The ACM/IEEE Joint Conference on Digital Libraries in 2022, Cologne, Germany, June 20-24, 2022, p. 34. ACM, (2024)
O. Hassan, O. Aderibigbe, O. Efijemue, and T. Onasanya. Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 906--908. New York, NY, USA, ACM, (2024)
M. Bechny, F. Sobieczky, J. Zeindl, and L. Ehrlinger. Proceedings of the 33rd International Conference on Scientific and Statistical Database Management, pp. 214--219. New York, NY, USA, Association for Computing Machinery, (11.08.2021)
J. Choi, A. Khlif, and E. Epure. Proceedings of the 1st Workshop on NLP for Music and Audio (NLP4MusA), pp. 23--27. Online, Association for Computational Linguistics, (2020)
A. Harth. Web Semantics: Science, Services and Agents on the World Wide Web, 8 (4): 348--354 (2010). Semantic Web Challenge 2009; User Interaction in Semantic Web research.