
To take on reproducibility crisis, researchers develop data-sharing platform

The scientific community has been struggling with a reproducibility problem.

Data reproducibility is a key step in the process that guides how most scientists create new knowledge in their field. They begin by developing a hypothesis, then test that hypothesis through calculations or experiments. Ultimately, the data and conclusions of the research are published in a peer-reviewed journal, with the idea that the data provided should be reproducible.

But according to a recent Nature survey, more than 70% of researchers have tried and failed to reproduce another scientist’s published experiments. More than half even failed to reproduce their own investigations successfully.

That raises the question: If an experiment or a simulation cannot be reproduced, was it successful in the first place?

Researchers with the Institute for Molecular Engineering at the University of Chicago and Argonne National Laboratory’s Midwest Integrated Center for Computational Materials (MICCoM) aim to help solve this problem with a new software platform that lets scientists share the data behind each of their publications in a searchable form. Qresp, a tool for curating, discovering, and exploring reproducible scientific papers, was developed over the last two years and is now available for public use.

“Our goal is to speed up the scientific process and reduce the time needed to share knowledge among researchers,” said Giulia Galli, Liew Family Professor of Molecular Engineering. “By making data available and searchable, we are hoping to make it easier for researchers to reproduce results.”

The process of reproducing scientific results remains a complex issue. Published papers, which are available online as PDFs, often don’t include enough information about the resulting data and processes for others to reproduce the results, and data are often not made available to the scientific community.

“Many papers do not include sufficient details to be able to reproduce the data,” said Marco Govoni, assistant scientist at Argonne and a visiting scientist at the University of Chicago. “And oftentimes the majority of data obtained and used in the paper are not available at all. To get that data, sometimes you need to write to the authors of the paper. It should not be this difficult, and it should not be an ad-hoc process.”

With Qresp, researchers hope to help relieve some of the current difficulties in making data open and reproducible. The software guides users through the process of organizing and sharing their data, including datasets and charts. All fields are customizable, letting researchers curate their data in the best way according to the paper they have written. The platform is also available for anyone who wants to explore data shared by other researchers.

Though there have been several efforts to manage large sets of data, most of those efforts rely on a central repository, while Qresp relies on a distributed model. Within the platform, researchers do not upload data but rather host their own curated data and decide what they want to share. That way, Qresp provides a scalable solution to sharing data, Govoni said.

Researchers who might not want to take this extra step to share their data right after publication should consider the benefits of organizing and sharing data for their own group, Galli said. Students in her group now automatically curate their data in Qresp as one more step in their scientific research. Qresp facilitates the transfer of information and knowledge between projects carried out by different students, and between researchers who stay and those who leave the group.

“The whole data sharing process in the group has become much more efficient,” said Galli, who is also a professor of chemistry and a senior scientist at Argonne National Laboratory.

With the use and adoption of Qresp by a broad community of researchers, published papers may become much more interactive—a living interface where, by clicking on an image, a researcher can see the dataset behind the results.

“We want to raise the bar for reproducing scientific results, and we want to move beyond publishing PDFs and into sharing research that is much more interactive and useful,” Govoni said.

Other authors on the paper include Juan de Pablo, Liew Family Professor in Molecular Engineering at the University of Chicago; postdoctoral research scholar Federico Giberti; software developer Aditya Tanikanti; and Hakizumwami Runesha, Milson Munakami, and Jonathan Skone of the University of Chicago Research Computing Center.

Citation: “Qresp, a tool for curating, discovering and exploring reproducible scientific papers.” Govoni et al., Nature Scientific Data, Jan. 29, 2019. doi: 10.1038/sdata.2019.2

Funding: Midwest Integrated Center for Computational Materials (Department of Energy)