Data analysis workflow at SporeData

Researchers often ask us about our data analysis workflow. Most of our projects fall into two main categories: first, projects that start after a proposal is awarded; second, projects where we provide Data Science support for research teams, departments, health systems, companies, governments, or other organizations. Here are our main principles:

  1. Continuous delivery. The overarching principle behind our workflow is continuous delivery. In other words, we deliver a new set of results every week, usually on a Friday. The continuous delivery of results allows our collaborators to provide us with feedback on our analyses so that we can make changes to align our work with their expectations. The items below provide a typical sequence of steps.
  2. Mock results. Our very first iteration is often to provide researchers with mock “results.” Mock results will usually involve an abstract where we describe the study rationale, our planned methods, and a group of fictitious results. While the numbers in our fictitious results are not real, we ask researchers to check whether the tables and plots align with what they envision regarding the final message of the manuscript. In other words, mock results will place our data scientists and our collaborators on the same wavelength in terms of what the study is trying to accomplish.
  3. Clarifications regarding variables. A typical second wave of the study clarifies questions regarding the dataset. These will include questions about individual variables as well as the dataset as a whole. Examples of the former include the correspondence between variable names and questions in the case report form, variables with high missing rates, unexpected distributions, and near-zero variation, among other factors. Questions about the dataset as a whole will usually attempt to clarify the location and time of data collection, data reliability, and other issues covered in the STROBE guideline (Von Elm et al. 2007).
  4. Methods section. In this phase, we release a preliminary version of the Methods section. While this section will often contain several questions to the investigators, it is drafted in a format aligned with international reporting guidelines for the study design. In other words, we send documents in a format that will be ready for submission to a peer-reviewed journal.
  5. Tables, plots, and interpretation. Our next step is to send a set of tables and plots accompanied by a brief description of their main findings. While this description is not complete yet, our goal here is to make sure that the tables and graphics align with the main questions and messages our collaborators had in mind when they reviewed our mock results.
  6. Results section. At this stage, the Results section is delivered in its final format, compliant with international reporting guidelines, and ready to be submitted for review in a peer-reviewed journal.
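
The variable checks mentioned in step 3 (high missing rates, near-zero variation) could be screened along the following lines. This is only a minimal sketch in plain Python, not a description of SporeData's actual tooling: the function name `screen_variables`, both thresholds, and the sample records are hypothetical and chosen for illustration.

```python
from collections import Counter


def screen_variables(rows, max_missing=0.2, max_dominant=0.95):
    """Flag variables with high missing rates or near-zero variation.

    rows: list of dicts, one per record; None marks a missing value.
    The two thresholds are illustrative assumptions, not recommended cut-offs.
    """
    names = sorted({key for row in rows for key in row})
    report = {}
    for name in names:
        values = [row.get(name) for row in rows]
        observed = [v for v in values if v is not None]
        missing_rate = 1 - len(observed) / len(values)
        # Share of observed values taken by the single most common value.
        if observed:
            top_count = Counter(observed).most_common(1)[0][1]
            dominant_share = top_count / len(observed)
        else:
            dominant_share = 1.0
        flags = []
        if missing_rate > max_missing:
            flags.append("high missing rate")
        if dominant_share >= max_dominant:
            flags.append("near-zero variation")
        report[name] = {
            "missing_rate": round(missing_rate, 2),
            "dominant_share": round(dominant_share, 2),
            "flags": flags,
        }
    return report


# Hypothetical records standing in for a real dataset.
records = [
    {"age": 34, "sex": "F", "site": "A", "bmi": None},
    {"age": 51, "sex": "M", "site": "A", "bmi": 27.1},
    {"age": 47, "sex": "F", "site": "A", "bmi": None},
    {"age": 62, "sex": "M", "site": "A", "bmi": None},
]

for name, summary in screen_variables(records).items():
    print(name, summary)
```

A report like this would only produce candidate questions for the investigators; whether a flagged variable is a problem still depends on the study design and the case report form.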

References

Von Elm, Erik, Douglas G Altman, Matthias Egger, Stuart J Pocock, Peter C Gøtzsche, and Jan P Vandenbroucke. 2007. “The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Guidelines for Reporting Observational Studies.” Annals of Internal Medicine 147 (8): 573–77.