Semantic Publishing

Towards Explicit, Executable, Reusable, and Automatically-Disseminated Scientific Publications

Projections suggest that the delay between a scientific discovery and the dissemination and implementation of the knowledge embodied in that discovery will soon vanish. At that point, all knowledge resulting from an investigation will be instantly interpreted and disseminated, influencing other researchers' experiments, and their results, immediately and transparently. This clearly requires that research results be of extremely high quality and reliability, and that research processes, from hypothesis to publication, become tightly integrated into the Web. Though the technologies necessary to achieve this kind of “Web Science” do not yet exist, our recently published studies of automated in silico investigation demonstrate that we are enticingly close, and a path toward next-generation Web Science is now clear.

We propose to dramatically alter the way high-throughput in silico research is done. We will synthesize and evaluate Web Science frameworks, investigating the technologies and knowledge infrastructures necessary in this novel environment to ensure a rigorous scientific process, including debate, accuracy, transparency, reproducibility, and peer review. Web Science simplifies in silico research for bench scientists by providing an ecosystem of expert knowledge and analytical strategies that can be accurately and automatically assembled. More broadly, it facilitates scientific discourse by enabling researchers to easily see their data through another's eyes, explicitly compare disparate hypotheses to precisely identify differences in opinion, automatically evaluate those hypotheses over novel datasets to investigate their validity, and integrate the resulting knowledge directly into the community knowledge pool in the form of “executable publications”. Finally, it enhances scientific rigor, particularly for high-throughput experiments, by helping to eliminate bias, and by improving the documentation and reproducibility of published results.

We undertake our Web Science research in the context of two novel metagenomic datasets currently being analysed using conventional approaches. The first, from the clinical sciences, involves the blood microbiota of patients suffering from sepsis, where it is believed that differences in the nature of the septic infection affect morbidity and mortality as much as, or more than, the patient's own genotype. Second, with the goal of achieving zero-impact, green-energy biofuels, we will analyse data from the soil and systemic metagenome of Brazilian sugarcane plants that differ in their ability to grow without added fertilizer. In both cases, we design, construct, and evaluate Web Science infrastructures that automatically resolve, and publish the results of, questions pertaining to pathogenesis and nitrogen metabolism, respectively. In addition, we examine our ability to model and compare conflicting hypotheses, with the goal of enhancing scientific discourse through formalizing, modifying, and (re-)interpreting scientific queries. Finally, we assess the ability of non-experts to construct these formal hypothetical models given appropriate tooling.

To date, the Web has changed the research process only cosmetically. Web Science redefines scientific methodology by fully integrating it with a global network of knowledge and expertise.