What Is Open Data?
"Open data"
is data that can be freely used, reused and redistributed by anyone -
subject only, at most, to the requirement to attribute and sharealike.
(From the Open Data Handbook)
The Open Data movement began with efforts to improve access to government data (see data.gov). This page focuses on open scientific data: the primary research data published within or alongside research papers.
Why Share Research Data?
- Allows data to be audited
Innovation
- Data can be reused or recombined in unexpected ways
- Data can be used to plan new studies
- New markets for services related to curation, preservation, analysis, visualization
Efficiency
- Focus resources and efforts appropriately
- Avoid repeating studies
- Avoid repeating mistakes
- Work smarter and more quickly
- Prevent loss of data - A Canadian study reported that 80% of scientific data are lost within two decades: The Availability of Research Data Declines Rapidly with Article Age (Current Biology, 19 December 2013)
- Studies show that papers with publicly available datasets receive a higher number of citations: Data reuse and the open data citation advantage (PeerJ, 1 October 2013)
- From the NCBI Insights blog, 16 September 2013: "NCBI’s Open Data – A Source of Experimental Data for Important Discoveries". This post summarizes three recent cases where researchers used data from GEO, PubChem, and dbGaP to make significant discoveries.
- "Sharing of Data Leads to Progress on Alzheimer's", by Gina Kolata, New York Times, 12 August 2010
Open Data in eScholarship@UMMS
eScholarship@UMMS can serve as a home for research data files that
support scholarly publications, including dissertations, theses, and
journal articles that must meet requirements for the preservation and
dissemination of data.
Benefits of sharing your data through eScholarship@UMMS:
- it is free to you
- it can accommodate large file sizes and a variety of file formats
- it can apply embargoes for access controls
- it can provide a persistent identifier, such as a DOI
- it includes sufficient metadata to enable discovery and reuse
- it is a redundant copy of your data in the event that something should happen to your original
- it is managed by the library
Open Data Resources
Creative Commons and Data
Open
data is facilitated by sharing under public terms to manage copyright
restrictions that might otherwise limit dissemination or reuse of data,
e.g. CC licenses or the CC0 public domain dedication.
DataCite
An international membership organization that aims to establish easier
access to research data, increase acceptance of research data as
legitimate contributions to the scholarly record, and support data
archiving. DataCite assigns DOI persistent identifiers to datasets.
Is It Open Data?
An Open Knowledge Foundation service to help people ask data holders about the openness of their data.
NIH Data Sharing Policies
Lists
data sharing policies in effect at NIH. It includes policies at the
NIH, IC, division, and program levels that apply to broad sets of
investigators and data.
Open Data - from SPARC
Open data resources from SPARC, the Scholarly Publishing and Academic Resources Coalition.
Open Data Handbook
This
handbook from the Open Knowledge Foundation is an introduction to the
legal, social and technical aspects of open data, with a focus on
government data.
Panton Principles for Open Data in Science
Launched in 2010 to raise awareness and call for open data in science.
PLOS Data Availability Policy
In
2014 PLOS updated their data policy to include a new requirement that
authors submit a data availability statement with their manuscripts. A
data availability statement indicates the location of the data set(s)
that underpin the findings being reported.
The Tao of Open Science for Ecology
This
article in Ecosphere includes a chart showing different open science
workflows, and a table listing a wide range of tools available to
support open science. The workflows and tools are applicable to many
disciplines.
Data Repositories for the Biological Sciences
There are many options for publicly
sharing data sets as a condition of publication, including
government-sponsored repositories, disciplinary repositories,
third-party repositories, and the UMMS institutional repository, eScholarship@UMMS.
For large datasets, here are selected data repositories for the biological sciences:
1000 Genomes
Sequence and alignment data generated by the 1000 Genomes Project.
dbGaP
Developed to archive and distribute the results of studies that have investigated the interaction of genotype and phenotype.
Dryad
Dryad
is an international repository of data underlying peer-reviewed
scientific and medical literature, particularly data for which no
specialized repository exists. All material in Dryad is associated with a
scholarly publication. For information about depositing to Dryad,
including general recommendations for data management and describing
data, see http://datadryad.org/pages/depositing.
figshare
figshare
is a commercial service that provides a repository for researchers to
publish all of their research outputs in a citable, sharable and
discoverable manner. figshare is supported by Digital Science, a
Macmillan Publishers company.
GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences.
GEO (Gene Expression Omnibus)
An
international public repository from NCBI that archives and freely
distributes microarray, next-generation sequencing, and other forms of
high-throughput functional genomics data submitted by the research
community. Some submitted data is curated into GEO DataSets
(http://www.ncbi.nlm.nih.gov/gds).
GigaScience
GigaScience is an integrated database and journal, affiliated with
BioMed Central, that publishes biological and biomedical research
datasets in collaboration with BGI, DataCite, BioSharing, and ISA-Tab.
Mouse Genome Informatics
International
database resource from The Jackson Laboratory for the laboratory mouse,
providing integrated genetic, genomic, and biological data. The
projects contributing to this resource are: Mouse Genome Database (MGD)
Project, Gene Expression Database (GXD) Project, Mouse Tumor Biology
(MTB) Database Project, Gene Ontology (GO) Project at MGI, and MouseCyc
Project at MGI.
NITRC Image Repository
Neuroimaging Informatics Tools and Resources Clearinghouse (NITRC) repository of MRI scans.
Protein Data Bank
A
worldwide repository of information about the 3D structures of large
biological molecules, including proteins and nucleic acids.
PubChem
Search BioAssay, Chemical Structure and Chemical Substances.
Data Journals
Data journals are publications whose purpose is to expose and share research data, and to promote its re-use. Examples of data journals:
BMC Research Notes (BioMed Central)
An
open access, peer-reviewed journal that publishes short publications,
cases, updates, software, databases, and data sets in the fields of
biology and medicine.
Dataset Papers in Science (Hindawi)
An open access, peer-reviewed journal, Dataset Papers in Science publishes data papers on science and medicine subject areas.
GigaScience (BioMed Central)
Publishes 'big-data' studies from the entire spectrum of life and biomedical sciences.
Journal of Open Psychology Data (Ubiquity Press)
Makes psychology research data available for replication studies and additional studies.
Open Health Data (Ubiquity Press)
Publishes peer-reviewed data papers describing health datasets with high reuse potential.
Scientific Data (Nature Publishing Group)
New publication which launched in May 2014 for descriptions of scientifically valuable datasets.
Registries and Lists
Data.gov
Home
of the U.S. Government's open data. You can find federal, state and
local data, tools, and resources to conduct research, build apps, design
data visualizations, and more. Includes health, science, and public
safety data.
Genomic Data Repositories
Repositories, databases, and database collaborations for NIH-funded genomics data.
HSRIC: Data, Tools, and Statistics
A selected listing of data tools, data repositories, health statistics, and surveys for the health services research community.
NIH Data Sharing Repositories
Annotated
listing of NIH-supported data repositories that make data accessible
for reuse, including submission and access information.
Open Access Directory -- Data Repositories
A
list of repositories and databases for open data. Organized by subject,
including Biology, Chemistry, Medicine, and Multidisciplinary.
PLOS Recommended Repositories
PLOS, which has a data availability requirement, has compiled a list of
recommended repositories, organized by content type, in which authors
may deposit data if a repository is appropriate to their field.
re3data.org
A
global registry of data repositories that provide permanent storage and
access to data sets. Subject classifications include animal genetics,
biochemistry, biomedical technology, cognitive science, and
microbiology, among others.
Publicly Available Health and Social Science Data Collections
Databases for Aging Related Secondary Analysis in the Behavioral and Social Sciences
A
compilation of publicly available, national and international data sets
supported in whole or in part by the National Institute on Aging
Division of Behavioral and Social Research.
Gallup Polls
Polling data on a host of topics.
Global Health Observatory Data Repository
Access
to over 50 data sets on important public health issues such as
mortality, diseases, health systems, violence and injuries, and equity.
Healthdata.gov
Managed
by the Department of Health and Human Services, this site makes HHS
data easily accessible to the public and to health innovators, including
data on clinical care provider quality, health service provider
directories, consumer product data, and community health performance
information.
ICPSR
The
Inter-university Consortium for Political and Social Research is a
long-standing archive of over 500,000 social science data sets. Topic
classifications include Healthcare and Facilities.
National Archive of Computerized Data on Aging
Funded
by the National Institute on Aging and hosted at the ICPSR, NACDA makes
available the largest library of data on aging in the United States.
National Center for Health Statistics
Part
of the Centers for Disease Control and Prevention, NCHS is the
principal health statistics agency for the United States. Their website
includes data sets on a wide range of issues related to public health,
from birth control to whooping cough.
Pew Research Center
A "nonpartisan fact tank," Pew conducts public polling, media analysis,
and demographic research on many topics of interest, including health
and health care.
Project Tycho
A
database of publicly available US and international data on public
health, including the entire history of weekly National Notifiable
Disease Surveillance System reports for the United States (1888-2013).
Qualitative Data Repository
Housed
at Syracuse University, QDR selects, ingests, curates, archives,
manages, durably preserves, and provides access to digital data used in
qualitative and multi-method social inquiry.
Other Sources for Data
Get the Data
An online community that offers a place to share questions and get answers related to available datasets.
Resources for datasets; particularly good for those wanting to create infographics.
Google's Public Data Directory
Google's
Public Data Directory provides validated datasets in graph and text
formats that can be easily searched and narrowed down.
Number Of
Looking for the specific number of a specific thing, e.g. the number of fast food restaurants in America? This is your resource!
RealClimate Data
A collection of selected sources of code and data related to climate science.
Reddit.com
An open source, open sharing source for data on an enormously wide range of topics.
UN Data
Factual, authoritative, and official international data from the United Nations.
Open Data & Data Sharing - Open Access - LibGuides at University of Massachusetts Medical School