Submitting data to SymPortal

Why submit my data to SymPortal and how is my data used?


Why submit my data?

Within the SymPortal analysis, a central principle is applied that enables SymPortal to distinguish between intra- and inter-genomic sources of ITS2 sequence variants. That principle is: the probability that a given set of ITS2 sequences found in a single coral sample is representative of a single Symbiodiniaceae genotype increases with the number of samples that set is found in. This principle is not new to SymPortal; it has been used in DGGE-based methodologies for over 15 years. These gel-based methodologies have successfully identified genetically distinct Symbiodiniaceae taxa that have been verified by additional genetic markers. So what does this have to do with my data?

Let’s look at an example. Let’s say you have 96 samples in your dataset, and each of the samples contains the same Symbiodiniaceae taxa. When run through an analysis, SymPortal will identify the same recurring sets of ITS2 sequences in each of your samples. These sequences will be identified as defining intragenomic variants (DIVs) and used to define an ITS2 type profile, e.g. C3-C3a-C3cc. SymPortal is predicting that this set of sequences is representative of a Symbiodiniaceae taxon. It makes this prediction based on the central principle explained above. Essentially, because your dataset contains a sufficient number of samples with the same recurring sets of sequences, you enable SymPortal to make this prediction.

In contrast, let’s consider a case where your dataset contains Symbiodiniaceae taxa that occur in only a small number of your samples. In this case, SymPortal will be limited in its ability to resolve these taxa. For example, if two of your samples contain a Symbiodiniaceae taxon not found in the other samples of your dataset, SymPortal will not be able to predict an accurate ITS2 type profile for this taxon, because it cannot identify a common set of sequences that recurs in multiple samples. In this situation, a conservative ITS2 type profile will be assigned to each of these samples. However, by running your dataset against the remotely hosted SymPortal database, you enable SymPortal to search all previously submitted samples for sets of sequences that are also found in your samples. In this way, SymPortal may find sets of sequences matching your two samples in samples that are already in the database. As such, the more samples the database contains, the greater our power to resolve putative Symbiodiniaceae taxa.
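The effect of searching the database can be illustrated with a toy Python sketch. This is not SymPortal's actual algorithm. It only counts how many samples contain every sequence in a candidate set, which is the quantity the central principle relies on. The function name, the sample names and the sequence names seq_A, seq_B and seq_C are hypothetical and chosen for illustration only.

```python
from typing import Dict, FrozenSet, Set


def support(candidate: FrozenSet[str], samples: Dict[str, Set[str]]) -> int:
    """Count the samples that contain every sequence in the candidate set."""
    return sum(1 for sequences in samples.values() if candidate <= sequences)


# Hypothetical local dataset: a common set of sequences in three samples,
# and a rare set (seq_A, seq_B) in only two samples.
local_samples = {
    "local_1": {"C3", "C3a", "C3cc"},
    "local_2": {"C3", "C3a", "C3cc"},
    "local_3": {"C3", "C3a", "C3cc"},
    "local_4": {"seq_A", "seq_B"},
    "local_5": {"seq_A", "seq_B", "C3"},
}

# Hypothetical previously submitted samples held in the remote database.
database_samples = {
    "db_1": {"seq_A", "seq_B"},
    "db_2": {"seq_A", "seq_B", "seq_C"},
    "db_3": {"C3", "C3a"},
}

rare_set = frozenset({"seq_A", "seq_B"})

# Support for the rare set within the local dataset alone.
local_support = support(rare_set, local_samples)

# Support once the database samples are searched as well.
combined_support = support(rare_set, {**local_samples, **database_samples})

print(local_support, combined_support)
```

In this toy example the rare set is found in two local samples but in four samples once the database is included, so there is more evidence that the set recurs across samples.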

How is my data used?

Unpublished (and not publicly available) data submitted to SymPortal will not be directly available to other users. However, immediately after submission, other users of SymPortal will be able to benefit from your data, just as you will be able to benefit from theirs. This is because during the SymPortal analysis the sets of sequences found in your samples are also searched for in all other previously submitted samples (as described in the ‘Why submit my data?’ section above). If your samples share ITS2 type profiles with other samples in the database, you will be informed of this in your ITS2 type profile output count table. For each ITS2 type profile an ‘ITS2 type abundance local’ and an ‘ITS2 type abundance DB’ are reported. Subtracting the former from the latter gives an indication of how many samples in the database that were not part of your dataset contained the ITS2 type profile in question. No other information from non-user samples is returned.
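As a worked example of this subtraction, the following minimal Python sketch computes the number of non-user samples for each profile. The abundance values, the second profile name (D1-D4) and the function name are hypothetical, and the layout of the real output count table is not assumed here. The check that the DB value is never smaller than the local value is also an assumption, based on the DB count including your own samples.

```python
def external_sample_count(abundance_local: int, abundance_db: int) -> int:
    """Return 'ITS2 type abundance DB' minus 'ITS2 type abundance local'."""
    # Assumption: the DB count includes the local samples, so it should
    # never be smaller than the local count.
    if abundance_db < abundance_local:
        raise ValueError("DB abundance is smaller than local abundance")
    return abundance_db - abundance_local


# Hypothetical values: profile name -> (abundance local, abundance DB).
profiles = {
    "C3-C3a-C3cc": (94, 130),
    "D1-D4": (2, 2),
}

external = {
    name: external_sample_count(local, db)
    for name, (local, db) in profiles.items()
}

for name, count in external.items():
    print(f"{name}: found in {count} database samples outside this dataset")
```

Here the first profile would have been found in 36 samples that were not part of the submitted dataset, while the second was found only in the submitted samples.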