
Replication, Metadata, and Open Science

Ted Laderas

2019-05-15

1 / 31

Learning Objectives

  • Explain why the replicability crisis is important
  • Describe what metadata is and why it matters to the replicability crisis
  • Name 3 types of experimental metadata
  • Describe existing open science efforts
  • Identify barriers that prevent open science adoption
2 / 31

What is Replicability?

Reproducibility Matrix

3 / 31

Reproducibility versus Replicability

  • Reproducibility: Same Analysis, Same Data
  • Replicability: Same Analysis, Different Data

https://the-turing-way.netlify.com/reproducibility/03/definitions

4 / 31

What is the Replication Crisis?

  • Some key scientific findings cannot be replicated by other labs
  • Survey of 1500 scientists:
    • 70% of those surveyed couldn't replicate a colleague's study
    • 50% couldn't replicate their own study
  • Reproducibility Project: findings for only 39 out of 100 psychology studies could be replicated

https://www.apa.org/monitor/2015/10/share-reproducibility

5 / 31

Marshmallow Test Study Findings

  • One marshmallow now, but two marshmallows if you wait
  • Walter Mischel: Self-control/delayed gratification leads to better outcomes in life
    • Followed up with participants 18 years later
    • Found increased "cognitive and academic competence" among the two marshmallow kids
  • Findings were used to sell "grit" as a fix in education
7 / 31

Replicating the Marshmallow Test

  • Study has been difficult to replicate
    • Is it really just measuring socioeconomic status?
    • Well-off children tend to get better education
  • "Conceptual Replication Study": self-control is not associated with better outcomes

A new approach to the marshmallow test leads to complicated findings

8 / 31

Why do we have the replication crisis?

Cultural:

  • Pressure to publish in prestigious journals such as Nature
  • Requires 'impactful' results to get published
    • Bias against negative results

Statistical:

  • Ioannidis: Studies rely on questionable research practices
    • Sample sizes are too small to justify conclusions
    • p-hacking: running too many statistical tests and only reporting the good ones

Experimental:

  • Not enough detail to rerun the experiment and analysis
10 / 31

Roger Peng on the Replicability Crisis

The replication crisis in science is concentrated in areas where (1) there is a tradition of controlled experimentation and (2) there is relatively little basic theory underpinning the field.

https://simplystatistics.org/2016/08/24/replication-crisis/

11 / 31

Spurious Correlations and False Findings

You can't trust what you read about nutrition

12 / 31

What are some solutions to the crisis?

  • Better statistical practices
    • Pre-registration of analyses
    • Larger sample sizes
    • More stringent cutoffs for statistics
  • Better education about reproducibility practices
  • Metadata and Open Science Practices
    • Provide enough details about the experiment
    • More transparency about how experiment was conducted

Scientific method: Statistical errors

13 / 31

What is Metadata?

  • "Data about data"
  • “Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” (NISO)
  • Working definition: Information that lets us utilize a dataset effectively
14 / 31

Metadata Example

Without the label metadata, we don't know what's in the can!

15 / 31

What Kinds of Experimental Metadata Are There?

  • Date experiment was done
  • Time a measurement was made
  • Number of repeated measurements
  • Who conducted the experiment
  • Dosage of treatment
  • How many subjects were in study
  • How many subjects dropped out of study
  • Experimental design
16 / 31
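The kinds of experimental metadata listed above are often captured as a small structured record stored alongside the data itself. A minimal sketch in Python (the field names and values are illustrative, not taken from any formal minimum-information standard):

```python
# Hypothetical experiment-level metadata record; field names and values
# are illustrative only, not drawn from any formal standard.
experiment_metadata = {
    "experiment_date": "2019-04-02",
    "measurement_time": "09:30",
    "n_repeated_measurements": 3,
    "experimenter": "J. Smith",
    "treatment_dosage_mg": 50,
    "n_subjects_enrolled": 120,
    "n_subjects_dropped_out": 8,
    "design": "randomized, double-blind, placebo-controlled",
}

# Metadata lets another lab sanity-check the dataset before reusing it,
# e.g. how many subjects actually completed the study:
n_completed = (experiment_metadata["n_subjects_enrolled"]
               - experiment_metadata["n_subjects_dropped_out"])
print(n_completed)  # 112
```

Storing metadata in a machine-readable form like this (rather than in a lab notebook) is what makes it usable by repositories and by other labs attempting a replication.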

Metadata and studies (Activity)

We want to know if we can combine data from two different sites that conducted the same experiment. The two sites are: 1) a local university and 2) a nursing home.

The study consists of testing whether a weight-loss drug works compared to a placebo on patients from both sites. Patients are weighed before treatment and after treatment.

What metadata would you want to know about these two experiments? Think about the details you'd want at the experiment level, not the patient level.

One example: What was the dosage of drug given to the patients at the two sites? Was it the same?

17 / 31
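One way to frame the activity: before pooling data from the two sites, compare their experiment-level metadata field by field and flag any mismatches. A hedged sketch (the site records and values below are invented for illustration):

```python
# Invented experiment-level metadata for the two sites in the activity;
# the fields and values are illustrative only.
site_university = {"dosage_mg": 50, "weigh_in": "before/after", "duration_weeks": 12}
site_nursing_home = {"dosage_mg": 25, "weigh_in": "before/after", "duration_weeks": 12}

def metadata_mismatches(a, b):
    """Return the metadata fields whose values differ between two sites."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])

# A mismatch (here, dosage) warns us the two datasets may not be combinable.
print(metadata_mismatches(site_university, site_nursing_home))  # ['dosage_mg']
```

Without recorded metadata this check is impossible, which is exactly why replication and data pooling depend on it.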

Minimum Information Standards for Experiments

18 / 31

What drives minimum information standards?

  • Public Databases/Repositories of data
    • Some require these minimum information standards
  • Government Mandate (NIH agencies)
    • Data must be submitted to public repositories
  • Open Science and Need for transparency
19 / 31

What is Open Science?

  • Openly sharing your research
    • Open Publications
    • Open Experimental protocols
    • Open Software
    • Open Data
  • Communities of practice surrounding each of these
20 / 31

Benefits to Open Science

  • Transparency
  • Accessibility
  • Efficiency
  • Many minds on the same dataset
  • Community of Practice

https://www.fosteropenscience.eu/content/what-are-benefits-open-science

22 / 31

Open Access Publications

  • Preprint Servers (bioRxiv)
    • The cutting edge of Science
  • Open Access Publications
    • Public Library of Science (PLOS)
23 / 31

Open protocols: Protocols.io

24 / 31

Open Data Repositories

  • Data should be FAIR (Findable, Accessible, Interoperable, and Reusable)
  • Recommended Data Repositories
  • Repositories make data available for everyone to download
    • Animal studies/Cell Line studies are easy to share
    • Patient Datasets are difficult (HIPAA)
25 / 31

Open Software: rOpenSci

  • Community that develops software for scientific research
  • Review process, community discussion
26 / 31

Barriers to Open Science

  • Researchers "don't have enough time"
  • Open Studies expose researchers to more detailed scrutiny
  • Perception of "Research Parasites"
    • Secondary Use of data
    • Studies cost money; sharing can be detrimental to a career
  • Open Contributions are not recognized
    • Promotion and Tenure guidelines
  • Funding incentives
    • "Not in my job description"
  • Culture of shame/impostor syndrome
28 / 31

Research Parasites

Cons of secondary use:

  • Generating Data costs Grant Money/Time
  • Secondary use of data is not beneficial to generators of data

Pros:

  • "The act of rigorous secondary data analysis is critical for maintaining the accuracy and efficiency of scientific discovery." (Greene et al)
  • New discoveries from other people reusing data for analysis

Celebrating Parasites

29 / 31

Leslie Chan: Openness can be exploited

But if data is indeed the new oil, who owns the oil, and how should it be governed, and for whose benefit? What about issues of privacy, security, consent, misuse, and other important ethical issues and historical injustice? - Leslie Chan

  • Openness is the new frontier
    • Exploitation/Manifest Destiny
  • Open Access requires money
    • Diversity/Inclusion: Volunteer work penalizes those who need to work
    • Research institutions that have money are overrepresented
  • Companies may exploit openness and communities
30 / 31

Take Home Points

  • Replication by another lab requires metadata
  • Metadata standards are driven by public databases
  • Open Science encourages transparency in process
  • Barriers to Open Science are cultural and technical
31 / 31
