The results showed clusters with patterns expected by sample types


The inverse of this measure is frequently used because it has the more intuitive interpretation that higher numerical values mean higher diversity. Mathematically, the inverse Simpson’s index is always greater than 0 and increases with increasing diversity. The third measure, “observed”, is simply the number of OTUs in a sample. Figure 9 depicts the correlation between read counts and these three diversity indices for spiked liquids and cultures, colored by sample type in panel a and by spike volume in panel b. In all cases, Pearson’s correlation coefficient is negative. However, the coefficient is statistically significant only for Shannon’s index, with a p-value of 0.004. For all diversity indices, the correlation is weak, and the plots look mostly scattered. The relationship is weakly linear at most, if linear at all, though the slightly negative trend is consistent enough across all three indices to merit further consideration. What is interesting is the sharp divide in Figure 9a between the culture and liquid samples, along with the lack of such a divide in Figure 9b, where the data are plotted by spike volume. The divide in Figure 9a speaks to an unequivocal diversity difference between liquids and cultures, whereas Figure 9b shows that diversity did not seem to correlate with the volume of spike-ins. Much of the diversity difference between liquids and cultures can be explained by the dominance of a few OTUs in cultures.
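To make the three indices concrete, the following is a minimal Python sketch that assumes the standard textbook definitions (Shannon's index with the natural logarithm, inverse Simpson's index as the reciprocal of the sum of squared proportions, and observed richness as the count of non-zero OTUs). The function names and the two toy count vectors are illustrative assumptions and are not taken from the analysis pipeline used in this work.

```python
import math

def shannon(counts):
    """Shannon's index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def inverse_simpson(counts):
    """Inverse Simpson's index 1 / sum(p_i^2); larger values mean higher diversity."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)

def observed(counts):
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

# Hypothetical toy samples with the same richness but different evenness.
even_sample = [25, 25, 25, 25]      # four equally abundant OTUs
dominated_sample = [97, 1, 1, 1]    # one OTU dominates, as with a heavy spike-in

for name, sample in [("even", even_sample), ("dominated", dominated_sample)]:
    print(name, observed(sample), shannon(sample), inverse_simpson(sample))
```

Both toy samples contain four observed OTUs, yet the dominated sample scores much lower on Shannon's and inverse Simpson's indices, which is the kind of effect attributed above to single-OTU dominance.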

This dominance is especially evident in Figure 9b, where liquids and cultures each decrease in diversity with increasing spike-in volumes and thus increasing dominance by a single OTU. Furthermore, the dominance of E. coli in the liquid samples underscores the high biomass and dominant presence of non-E. coli OTUs in the culture samples, evidenced by the large diversity differences between spiked and unspiked liquids compared to the small diversity differences between spiked and unspiked cultures. Taken together, these observations support two conclusions: dominance of one or a few OTUs led to a decrease in diversity in liquids and cultures; and read counts in cultures were much less evenly distributed across different OTUs than read counts in liquids.

Because a weak correlation between sequencing depth and diversity appeared in the data set, and because sample diversity varied greatly, we decided to examine rarefaction curves for liquids and sedimentary cultures to determine whether rarefying to an even depth would be needed to make analytically sound comparisons across samples. In ecology, rarefaction was developed as an approach to address sampling artifacts, particularly those related to the tendency of measured diversity to increase with sampling depth. In other words, rarefaction is often used to reduce false discovery rates and make samples more comparable to one another. From the rarefaction curves, we saw that the number of OTUs discovered indeed increased rapidly for most samples until approximately 20,000 reads. Furthermore, at different sequencing depths, the number of OTUs discovered varied greatly among samples. To be able to make meaningful comparisons across samples, we needed to make sample sizes uniform, so we decided to rarefy samples to an even depth of 20,000 reads, which eliminated a spiked culture with low read counts and three unspiked controls.
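As an illustration of rarefying to an even depth, here is a minimal Python sketch. It assumes random subsampling of reads without replacement and discards any sample whose total falls below the target depth; the source does not state which rarefaction routine or replacement setting was used, so the `rarefy` function, its `seed` parameter, and the toy counts are all assumptions.

```python
import random
from collections import Counter

def rarefy(counts, depth, seed=0):
    """Subsample a sample's reads without replacement to exactly `depth` reads.

    counts: dict mapping OTU name -> read count.
    Returns a dict of rarefied counts, or None if the sample has fewer than
    `depth` reads (such samples are dropped from the rarefied data set).
    """
    total = sum(counts.values())
    if total < depth:
        return None
    rng = random.Random(seed)
    # Expand counts into one entry per read, then draw `depth` reads at random.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))

# Hypothetical toy samples; real depth in the text is 20,000 reads.
deep_sample = {"OTU_a": 60, "OTU_b": 30, "OTU_c": 10}
shallow_sample = {"OTU_a": 5, "OTU_b": 2}

print(rarefy(deep_sample, 50))     # 50 reads spread over the surviving OTUs
print(rarefy(shallow_sample, 50))  # None: too shallow, so the sample is eliminated
```

Rare OTUs can disappear entirely from the subsample, which is why rarefaction lowers OTU counts and why shallow samples, such as the low-read controls mentioned above, are removed.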

Rarefaction also decreased OTU counts by at least 25%. However, the relative numbers of OTUs did not change substantially. For instance, rarefied 200µL-spiked cultures still contain the highest number of OTUs while rarefied 500µL-spiked liquids still contain the lowest number. This effect of rarefaction (reducing the number of potentially spurious OTUs while retaining the validity of the qualitative comparisons across samples) was important to the preliminary experiment as well as subsequent experiments, whereas one of its most cited undesirable effects, the loss of rare OTUs, was not a central concern for the preliminary experiments. To examine the rarefaction process and its effects on alpha diversity in more detail, we looked more closely at the inverse Simpson’s index. Figure 10 shows the values of this index for all samples before and after rarefaction to 20,000 reads. The alpha diversity values clearly decreased upon rarefaction, while trends in cross-sample comparisons shown by the inverse Simpson’s index persisted through the process of rarefaction. Because cultures had much higher biomass than controls and liquids, the E. coli spike-in dominated controls and liquids while other OTUs dominated the cultures. The direct and major consequence was that alpha diversity values of the spiked controls and liquids appeared comparable to those of the pure E. coli samples, while those of the spiked cultures were universally higher than those of the spiked controls and liquids. In other words, the addition of E. coli increased richness slightly but decreased evenness greatly for low-biomass samples, thereby decreasing the diversity as represented by the inverse Simpson’s index.

Since the effect of the E. coli spike-ins, at the optical density values and spike volumes used, depended markedly on the number of cells in the samples, the spike-ins did not affect cultures nearly as dramatically as they affected controls and liquids.

We also examined the prevalence of different phyla and found seven phyla with high abundance. Of these seven, Firmicutes, Bacteroidetes, and Proteobacteria were the most abundant and prevalent, followed by Epsilonbacteraeota, Actinobacteria, Fusobacteria, and Cyanobacteria. The phyla listed here contain some of the most common oral bacterial species. For instance, those in the Streptococcus and Veillonella genera belong to the Firmicutes phylum; other common oral species, including those in the Neisseria genus, belong to the Proteobacteria phylum. Species in the Prevotella genus fall into the Bacteroidetes phylum, and E. coli, the species used for spike-ins, belongs to the Proteobacteria phylum. Other phyla seem to be neither as prevalent nor as abundant as the seven mentioned above.

To determine whether decontamination had the intended effect, we looked at bar plots of negative controls, first at non-rarefied read counts with and without the most dominant spike OTU, and then at both the non-rarefied and rarefied relative abundances. These figures indicate that E. coli read counts in the spiked samples far outnumbered read counts from oral bacterial OTUs in all negative controls, as evidenced by reads from the Escherichia-Shigella genus comprising more than 95% of the abundance in both rarefied and non-rarefied controls. Somewhat surprisingly, unspiked controls also showed higher relative abundances of E. coli than expected, in most cases higher than 40%. However, the high percentages were hardly concerning because of the low starting biomass in these samples. In general, unspiked controls had such low biomass that the highest read counts came in at just over 1,000. For reference, qPCR of kit-extracted Nanopure-grade water yields between 80 and 750 reads per µL. Based on the read counts of true negative controls, we can safely assume that the unspiked negative controls were not contaminated. As for spiked controls, in addition to the high relative abundance of E. coli, the Prevotellaceae and Veillonella genera contributed high proportions of the reads in both non-rarefied and rarefied controls. Since these genera contain commonly occurring oral bacteria, their presence in the negative controls implies that cells in control wells came not from external contamination, but from infrequent crossing of cells from the host plates and wells to the control plates and wells. This lack of external contamination, in conjunction with the low read counts of OTUs other than the Escherichia-Shigella OTU, meant that the culture media had been properly decontaminated, and that other steps in the incubation, extraction, and sequencing procedures did not introduce substantial external contamination. Hence, analysis of controls indicates that centrifuging the autoclaved SHI medium was a key step in the process to generate the in vitro dental plaque community.

After confirming that centrifuging the culture media adequately decontaminated the controls, and that rarefaction made samples comparable, we performed Principal Coordinate Analysis on the relative abundances of all samples using the Bray-Curtis dissimilarity. In the resulting ordination, cultures and controls formed distinct clusters, with some liquid samples forming a third cluster and other liquid samples folding into the other two groups.
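The ordination step can be sketched as follows. This is a minimal NumPy illustration of Bray-Curtis dissimilarity followed by classical Principal Coordinate Analysis (double-centering the squared distance matrix and taking its leading eigenvectors). It is an assumption about the general technique and not the software used in this work; the function names and the toy abundance table are hypothetical.

```python
import numpy as np

def bray_curtis(abund):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of `abund`."""
    abund = np.asarray(abund, dtype=float)
    n = abund.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(abund[i] - abund[j]).sum()
            den = (abund[i] + abund[j]).sum()
            dist[i, j] = dist[j, i] = num / den
    return dist

def pcoa(dist, n_axes=2):
    """Classical PCoA: returns sample coordinates and fraction of variation per axis."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering   # double-centered matrix
    eigvals, eigvecs = np.linalg.eigh(gower)             # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]                    # sort descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    positive = np.clip(eigvals, 0, None)                 # ignore negative eigenvalues
    coords = eigvecs[:, :n_axes] * np.sqrt(positive[:n_axes])
    explained = positive[:n_axes] / positive.sum()
    return coords, explained

# Hypothetical relative abundances: rows are samples, columns are OTUs.
rel = np.array([
    [0.90, 0.05, 0.05],   # spiked liquid, dominated by the first OTU
    [0.85, 0.10, 0.05],   # spiked control
    [0.10, 0.50, 0.40],   # culture
    [0.12, 0.48, 0.40],   # culture
])
dist = bray_curtis(rel)
coords, explained = pcoa(dist)
print(coords)
print(explained)
```

In this toy table the two spike-dominated samples sit close together and far from the two cultures, so nearly all of the variation falls on the first coordinate, mirroring the qualitative pattern described in the text.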

For both non-rarefied and rarefied samples, the first principal coordinate accounts for over 80% of the variation in the data while the second principal coordinate accounts for less than 10%. Clusters also appeared in roughly the same groupings with similar inter-group distances in both rarefied and non-rarefied cases, with unspiked liquids clustering on the upper right-hand side and cultures clustering on the lower right-hand side. Controls and spiked liquids clustered separately from the cultures and formed much looser groupings. An interesting feature of this plot is that spiked and unspiked cultures cluster closely with each other while unspiked liquids and controls do not cluster tightly with the spiked liquids and controls. In the non-rarefied samples, unspiked liquids sit to the right side of the first coordinate axis, closer to the cultures than to the spiked liquids and controls. These clustering patterns reflect the effects of the E. coli spike-in more clearly than the bar plots, underlining the single-OTU dominance in low-biomass samples. As expected, E. coli spike-ins had little effect on the cultures, as evidenced by the tight clusters at essentially the same position in both non-rarefied and rarefied samples. Controls and liquids, on the other hand, had few bacterial cells and were therefore easily influenced by the presence of many tens or hundreds of millions of E. coli cells, thus forming distinct clusters separate from cultures. Here, the overriding of other cells by the E. coli OTU was clearly demonstrated in the dissimilarity between spiked liquids and unspiked liquids. The degree to which the spike-in affected groupings dictated that subsequent, more detailed analysis of sample compositions be performed after excluding spike-in OTUs.

Quantitatively speaking, the relative abundance bar plot here shows that the volume of the E. coli spike-in is not numerically informative.
The percentage of the Escherichia-Shigella OTU is not consistent across samples that received the same spike volume, probably because the number of oral bacterial cells could not be standardized to the same value across samples; and the percentage of this particular OTU does not consistently increase with increasing spike-in volumes in either the cultures or the liquids. Although the number of cells in each aliquot of spike-in E. coli culture could be calculated based on the empirical relationship between CFU/ml and OD600, we could not establish an accurate quantitative relationship between the number of E. coli cells and either the read counts or the relative abundances of E. coli in the preliminary experiments. Such an internal standard would require that we standardize samples of extracted E. coli DNA to certain concentrations and use these DNA aliquots for spiking the library to be sequenced, much like what is done for the internal standard of the bacteriophage phi X 174 DNA. In that case, the fixed sequencing depth of the HTS procedure would lead to different per-sample depths across different sequencing runs, depending on the number of samples in each run. Furthermore, the PCR amplification process would need to be scrutinized and PCR efficiencies quantified for different primer sets, template sequences, mixtures of templates, and even different researchers. As much as the spike-in would have helped us gain a concrete sense of community composition, especially in terms of the more biologically relevant unit of cells rather than in units of DNA sequences, we realized that developing and optimizing such a process was beyond the scope of our intended goals as well as the time and resources available to us. Hence, we decided not to pursue this direction of research and simply used the spike-ins as a qualitative check for the incubation, extraction, and sequencing processes.

Because the major OTU in the E. coli spike was obscuring compositional features of the liquids, we removed this OTU from the rarefied read count data of cultures and liquids. Then, we transformed the counts into relative abundances and performed Principal Coordinate Analysis on these spike-less abundances. To examine the effects and potential artifacts of rarefaction on similarities across samples, we compared the clustering patterns of liquids and cultures before and after rarefaction to 20,000 reads. A quick comparison of Figure 15a with Figure 15c or Figure 15b with Figure 15d shows that this rarefaction process did not substantially change the x/y positions or clustering of either liquids or cultures.
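The spike-removal step described above can be sketched in a few lines of Python. This is an illustrative assumption about the procedure (drop one OTU, then renormalize the remaining counts so they sum to 1); the function name, the OTU labels, and the toy counts are hypothetical.

```python
def drop_otu_and_normalize(counts, otu_to_drop):
    """Remove one OTU from a sample and convert the rest to relative abundances.

    counts: dict mapping OTU name -> read count.
    Returns a dict of relative abundances that sum to 1, or an empty dict if
    no reads remain after the OTU is removed.
    """
    kept = {otu: n for otu, n in counts.items() if otu != otu_to_drop}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {otu: n / total for otu, n in kept.items()}

# Hypothetical rarefied sample (20,000 reads) dominated by the spike OTU.
spiked_liquid = {"Escherichia-Shigella": 18000, "Streptococcus": 1500, "Veillonella": 500}

spikeless = drop_otu_and_normalize(spiked_liquid, "Escherichia-Shigella")
print(spikeless)
```

After removal, the remaining OTUs are expressed relative to each other only, so compositional differences that were compressed by the dominant spike OTU become visible in downstream ordination.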