The SequenceData class (RNAmodR)

The SequenceData class is implemented to contain data on each position along transcripts and holds the corresponding annotation data and nucleotide sequence of these transcripts. Several functions are available to access this data. The SequenceData class is a virtual class, which specific classes extend. Currently the following classes are implemented:

CoverageSequenceData
End5SequenceData, End3SequenceData, EndSequenceData
NormEnd5SequenceData, NormEnd3SequenceData
PileupSequenceData
ProtectedEndSequenceData

The annotation and sequence data can be accessed through the functions ranges and sequences, respectively. Be aware that the data is always provided according to genomic positions with increasing rownames, whereas the sequence is given as the actual sequence of the transcript. Therefore, data for transcripts on the minus strand must be handled accordingly.
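To make the strand handling concrete, here is a minimal base-R sketch that uses a plain data.frame in place of a real SequenceDataFrame. The helper align_to_sequence and all values are hypothetical and not part of RNAmodR. It shows that, for a minus-strand transcript, the rows (ordered by increasing genomic position) have to be reversed before they line up with the transcript sequence.

```r
# Hypothetical per-position data for one transcript on the minus strand.
# Rows are ordered by increasing genomic position (see the rownames),
# while the sequence string is written 5' -> 3' along the transcript.
pos_data <- data.frame(coverage = c(10L, 12L, 15L, 9L, 4L),
                       row.names = as.character(101:105))
tx_seq <- "ACGGU"  # transcript sequence, 5' -> 3'
strand <- "-"

# Hypothetical helper: on the minus strand the first transcript nucleotide
# sits at the highest genomic coordinate, so the rows are reversed before
# the nucleotides are attached.
align_to_sequence <- function(df, tx_seq, strand) {
  nt <- strsplit(tx_seq, "")[[1]]
  stopifnot(length(nt) == nrow(df))
  if (strand == "-") {
    df <- df[rev(seq_len(nrow(df))), , drop = FALSE]
  }
  cbind(nucleotide = nt, df)
}

aligned <- align_to_sequence(pos_data, tx_seq, strand)
aligned
```

After alignment, the row for genomic position 105 carries nucleotide A, the 5'-most nucleotide of the transcript. For a plus-strand transcript the rows would be used in their original order.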

The SequenceData class is derived from the CompressedSplitDataFrameList class, with additional slots for annotation and sequence data. Some functionality is not inherited and might not be available to the full extent, e.g. relist.

SequenceDataFrame

The SequenceDataFrame class is a virtual class and contains data for positions along a single transcript. In addition to being returned as elements of a SequenceData object, the SequenceDataFrame class is used to store the unlisted data within a SequenceData object. Therefore, a matching pair of SequenceData and SequenceDataFrame classes must be implemented.

The SequenceDataFrame class is derived from the DataFrame class.

Subsetting a SequenceDataFrame by column returns a SequenceDataFrame, whereas subsetting it by row returns a DataFrame. The drop argument is ignored for column subsetting.

# S4 method for SequenceData
cbind(..., deparse.level = 1)

# S4 method for SequenceData
rbind(..., deparse.level = 1)

SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for character,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for character,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for TxDb,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for TxDb,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for GRangesList,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for GRangesList,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for GFF3File,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for GFF3File,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for character,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for GFF3File,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for TxDb,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

# S4 method for GRangesList,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)

Arguments

...

Optional arguments overriding default values. Not all SequenceData classes use all arguments. The arguments are:

minLength a single integer value setting a threshold for minimum read length. Shorter reads are discarded (default: minLength = NA).
maxLength a single integer value setting a threshold for maximum read length. Longer reads are discarded (default: maxLength = NA).
minQuality a single integer value setting a threshold for minimum read quality. Reads with a lower quality are discarded (default: minQuality = 5L, but this is class dependent).
max_depth maximum depth for pileup loading (default: max_depth = 10000L).
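The threshold semantics of minLength, maxLength and minQuality can be sketched in base R on a hypothetical table of read lengths and qualities. This is not how RNAmodR filters reads internally (it reads from BAM files); the helper filter_reads and the data are assumptions for illustration, with NA meaning that no threshold is applied.

```r
# Hypothetical read table; real reads would come from BAM files.
reads <- data.frame(length  = c(15L, 30L, 45L, 60L),
                    quality = c(30L, 3L, 20L, 40L))

# Hypothetical helper mirroring the documented defaults:
# NA disables a threshold; reads outside a set threshold are discarded.
filter_reads <- function(reads, minLength = NA, maxLength = NA,
                         minQuality = 5L) {
  keep <- rep(TRUE, nrow(reads))
  if (!is.na(minLength))  keep <- keep & reads$length  >= minLength
  if (!is.na(maxLength))  keep <- keep & reads$length  <= maxLength
  if (!is.na(minQuality)) keep <- keep & reads$quality >= minQuality
  reads[keep, , drop = FALSE]
}

nrow(filter_reads(reads))                                  # 3 (quality filter only)
nrow(filter_reads(reads, minLength = 20L, maxLength = 50L)) # 1
```

With the default minQuality = 5L, only the read with quality 3 is dropped; adding the length window of 20 to 50 additionally removes the reads of length 15 and 60.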

deparse.level

See base::cbind for a description of this argument.

dataType

The prefix for constructing the class name of the SequenceData subclass to be constructed.
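Judging from the class names listed at the top of this page, the subclass name appears to be formed by appending "SequenceData" to dataType. The helper below is hypothetical, not an exported RNAmodR function, and only illustrates this assumed naming rule.

```r
# Hypothetical helper illustrating the assumed naming rule.
sequence_data_class_name <- function(dataType) {
  stopifnot(is.character(dataType), length(dataType) == 1L)
  paste0(dataType, "SequenceData")
}

sequence_data_class_name("Pileup")  # returns "PileupSequenceData"

# Prefixes of the documented classes map onto their class names.
prefixes <- c("Coverage", "End5", "End3", "End",
              "NormEnd5", "NormEnd3", "Pileup", "ProtectedEnd")
vapply(prefixes, sequence_data_class_name, character(1))
```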

bamfiles

the input, which can be of the following types:

BamFileList: a named BamFileList
character: a character vector, which must be coercible to a named BamFileList referencing existing BAM files. Valid names are control and treated, which define conditions and replicates.
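The naming convention for bamfiles can be illustrated with a hypothetical check in base R: each element is named control or treated, and repeated names denote replicates. The file paths are placeholders and check_bam_names is an assumed helper, not part of RNAmodR; unlike the real input handling, it neither tests whether the files exist nor coerces anything to a BamFileList.

```r
# Placeholder paths: two treated replicates and one control.
bamfiles <- c(treated = "wt_1.bam", treated = "wt_2.bam",
              control = "ctrl_1.bam")

# Hypothetical helper: validate the names and count replicates per condition.
check_bam_names <- function(files) {
  nms <- names(files)
  if (is.null(nms) || !all(nms %in% c("control", "treated"))) {
    stop("all elements must be named 'control' or 'treated'")
  }
  table(factor(nms, levels = c("control", "treated")))
}

check_bam_names(bamfiles)  # control: 1, treated: 2
```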

annotation

annotation data, which must match the information contained in the BAM files.

sequences

sequences matching the target sequences the reads were mapped onto. This must match the information contained in the BAM files.

seqinfo

optional Seqinfo to subset the transcripts analyzed on a chromosome basis.

Value

A SequenceData object

Slots

sequencesType: a character value for the class name of sequences. Either RNAStringSet, ModRNAStringSet, DNAStringSet or ModDNAStringSet.
minQuality: an integer value describing a threshold of the minimum quality of reads to be used.