Analysis of barcoded amplicon data for SIP

Arguably, the most significant advancement in the field of DNA- and RNA-SIP  in recent years came from the introduction of high-throughput sequencing techniques and their adoption to the study of microbial communities using barcoded amplicon sequencing \cite{Xia_2011,Lee_2011,Pilloni_2011}.  The ability to sequence dozens of samples simultaneously to a very high depth meant that it was now possible to identify rare taxa that were labelled but also taxa that are only partially labelled. Before the adoption of high-throughput sequencing (HT-sequencing), successful labelling of DNA or RNA was done visually, either by detecting a second band of nucleic acids under UV light following ethidium bromide staining or fractionating the gradient into multiple fractions, amplifying the nucleic acids using PCR or qPCR and evaluating the intensity of the bands or copy numbers. The use of fingerprinting techniques such as DGGE and TRFLP enabled not only a more sensitive comparison between fractions but also a direct, albeit qualitative, insight into how many phylotypes were labelled. However, it still suffered from low resolution and a high degree of noise that are inherent to these methods. Moreover, the unequivocal identification of the labelled microbes was still low-throughput, laborious and costly since it required the construction of clone libraries followed by Sanger sequencing. Barcoded amplicon sequencing allows for robust, semi quantitative comparison of different fractions along a density gradient, as well as an identification of the identity of which microbes became labelled and which did not. Moreover, the ability to obtain thousands of sequences per sample meant that even labelling of minor members of the community could be detected — something that could not be achieved with standard molecular fingerprinting techniques or Sanger sequencing. The adoption of HT-sequencing technologies also called for new analytical methods that could take advantage of this increase in sensitivity through statistical modelling and enable robust detection of either minor or partially labelled members of the active guild \cite{Zemb_2012,Zumsteg_2013}. However, alongside with added sensitivity barcoded amplicon sequencing also presents some challenges for comparing samples because it is difficult to control the number of sequences per sample, also known as the library depth. The problem is not unique to analysing SIP experiments and poses a major analytical challenge in the field of microbiome studies and comparative transcriptomics (RNA-Seq). In essence, most statistical methods used for comparison assume that across different samples, templates with identical relative abundance should have equal chances of being sequenced and thus any observed differences are an indication that the true abundance of the given sequence differs between the samples. In ecology, the issue is known as "sampling effort". Traditionally, the most common way to alleviate the problem of unequal sequencing depths was to randomly sub-sample sequences from each sample down to the smallest sample size so that all samples become equal (a process sometimes called "rarefaction"). This practice, however, came under scrutiny in recent years and sparked some heated polemic papers on how to best handle microbiome data \cite{McMurdie_2014}. While the severity of the bias caused by random sub-sampling is debated, it is generally accepted that this is a sub-optimal way to deal with the problem. Another common approach is to convert all abundances to relative abundances and compare the different sequences on a fraction (or percentage) basis. This, however, leads to other problems since it maintains the correlation between sequencing depth and the number of unique sequences (or OTUs) while at the same time drastically reducing the number of degrees of freedom by coercing the sum of abundance in each sample to 100% \cite{Faust_2012}. More recent methods try to "eat the cake and leave it whole" by attempting to equalise the variance between samples through a scaling factor while not discarding any data (covered in \cite{Weiss_2017}). Whichever method is chosen, it is important to remember that no statistical trick can solve the inherent problems that stem from large differences in library sizes and these should be handled at the level of sample preparation or sequencing and not data analysis.

Differential abundance analysis and quantitative analysis

The most common methods for comparing fractions in SIP experiments were developed for analysing RNA-Seq datasets. The parallels are apparent; typical RNA-Seq experiments are designed as a case-control study and the analytical challenge is to identify which sequences are differentially expressed (either up-regulated or down-regulated) compared to the control, while overcoming the natural variance and differences in library sizes. Similarly, in SIP experiments one would like to identify which sequences are "differentially abundant" in the fractions where labelled nucleic acids are expected to be present compared to those where unlabelled nucleic acids are present. An important difference to RNA-Seq experiments is, however, that only enriched sequences in the 'heavy' fractions are of interest, while depleted sequences should only occur when labelling is strong enough to displace unlabelled sequences from the 'light' fractions to a noticeable degree. Nearly all existing data analysis methods should apply to both DNA- and RNA-SIP, albeit with some differences. This book offers two recent and very robust ways to analyse SIP datasets: quantitative SIP (qSIP; Chapter 11) and High-Resolution SIP (HR-SIP; Chapter 9). Both yield similar results, but they nevertheless differ in some details (discussed in \cite{Youngblut_2018a}). While High-Resolution SIP, like all other differential abundance methods, aims only at detecting labelled phylotypes, qSIP also attempts to quantify the level of enrichment per phylotype, but requires additional quantitative data from qPCR and also a matching unlabelled control sample for every labelled sample, to reliably detect growth. 

Data analysis for RNA-SIP experiments

Since both HR-SIP and qSIP are carefully detailed in this book, repeating the steps here would be redundant. However, because the methods were published for DNA-SIP, some differences to RNA-SIP should be noted. In principle, both methods rely on a comparison of the gradient fractions from labelled samples to those from unlabelled control samples (between-gradient comparison). Moreover, both assume and make use of the fact that while DNA and RNA will concentrate around their theoretical BD, they diffuse throughout the gradient in a Gaussian shape so that amplifiable amounts of nucleic acids are present in every fraction in the gradient \cite{Angel_2017,Youngblut_2018}. However, because the course of development of a microbial community is controlled by stochastic processes in addition to deterministic ones, parallel incubations from the same parent community often lead to different communities after a while, even if conditions are kept as similar as possible. Consequently, it was demonstrated that these stochastic variations reduce the detection accuracy and it was recommended that the Bray-Curtis dissimilarity between communities of labelled and unlabelled samples that are being compared should ideally be >0.2 \cite{Youngblut_2018}. Between-gradient comparisons are crucial for DNA-SIP because as mentioned above, the DNA of different taxa will migrate in the gradient also based on their G+C content. Moreover, the migration based on G+C content is not constant per phylotype. Instead, it will vary based on the size of the DNA fragment surrounding the gene of target, which varies stochastically in most DNA extraction methods \cite{Youngblut_2014}. In RNA-SIP however, the buoyant density of RNA is less affected by G+C content, and one can assume that in a gradient from an unlabelled sample the relative abundance of each taxon should remain relatively constant throughout the different fractions. In contrast, in a gradient from a labelled sample, some taxa will be more abundant in the heavy fractions compared to the lighter ones, while the relative abundance of unlabelled taxa will remain constant throughout the gradient or decline in the heavy fractions if the labelled taxa make up a significant proportion of the entire community. In any case, since in RNA-SIP differential migration of taxa is only expected as a response of labelling, detection of labelled taxa can also be done in a within-gradient fashion by comparing the relative abundances of taxa in the heavy fractions (i.e., ca\(.~\)1.72--1.76 g ml-1 for DNA-SIP or 1.80--1.84 g ml-1 for RNA-SIP) with those in the light fractions (i.e., ca\(.~\)1.68--1.72 g ml-1 for DNA-SIP or 1.77--1.80 g ml-1 for RNA-SIP). However, some label-free controls should nevertheless be set up (e.g., paralleling the beginning and end time points or the highest and lowest treatment extremes) and analysed because they can help to fine-tune the statistical cutoff parameters so that false positives can be avoided \cite{Angel_2017}.

Network analysis using SIP data

Network analysis -- the prediction of microbial associations from presence-absence or abundance data is gaining popularity in ecological studies in general and microbiome studies in particular \cite{Faust_2012}. This type of analysis has also been used in concert with SIP to detect, for example, positive and negative correlation between phylotypes of ammonia-oxidising archaea, nitrite oxidising bacteria and methanotrophs \cite{Daebeler_2014}, clusters of anaerobic and aerobic bacteria in rewetted biological soil crusts \cite{Angel_2013}, or to identify community members that interact with methane-oxidizing bacteria \cite{Ho_2016}. However, in contrast to a standard network analysis on microbiome data, the interpretation of the results from a SIP experiment might not be so straightforward. First, most probably only the "heavy"  fractions from the labelled gradients should be analysed because changes in the "light" fractions are either already reflected in the "heavy" fractions (i.e., phylotypes becoming labelled and hence depleted in the "light" fractions), or not directly related to substrate incorporation (e.g., growth and death of phylotypes in the general community). Secondly, while the interpretation of positive correlations in the heavy fractions are relatively easy to interpret (i.e., two phylotypes acquire label under similar conditions), it is not entirely clear what negative interactions mean if anything at all. Thirdly, it is important to bear in mind that network analysis does not reveal the mode of the interaction between two interacting phylotypes and a positive correlating could mean that both use the same substrate, that there is cross feeding occurring (and thus the interaction is positive-positive or at least positive-neutral), or that one phylotype is praying on another (positive-negative interaction). Lastly, it should be noted that many replicates are required for a network to be stable (at least 25) and that communities should be reasonably similar in all samples \cite{Berry_2014a}. For SIP studies this probably translates into an analysis of at least 25 "heavy" gradient fractions, coming from both labelled and no-label control incubations. However, when analysing data from DNA-SIP experiments care should be taken when analysing multiple fractions from the same gradient since this could simply be a result of similar G+C contents.