The SequenceData class contains data for each position along
transcripts, together with the corresponding annotation data and
nucleotide sequences of these transcripts. Several functions are
available to access this data. The
SequenceData class is a virtual class, from which specific classes can
be extended. Currently the following classes are implemented:
The annotation and sequence data can be accessed through the functions
ranges and sequences, respectively. Be aware that the data is
always provided according to genomic positions with increasing
rownames, but the sequence is given as the actual sequence of the
transcript. Therefore, it is necessary to treat the minus strand accordingly.
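The strand caveat above can be illustrated with a minimal base R sketch. It does not use the SequenceData API. The toy data frame, the positions and the sequence are hypothetical, and the sketch assumes that for a minus strand transcript the first nucleotide of the transcript sequence corresponds to the highest genomic position.

```r
# Hypothetical toy data: per-position values with rownames given as
# increasing genomic positions (not created by SequenceData itself).
dat <- data.frame(coverage = c(5, 7, 9, 4, 2),
                  row.names = as.character(101:105))

# Transcript sequence written 5' to 3' (hypothetical example).
tx_seq <- "ACGUU"
strand <- "-"

# Split the sequence into single nucleotides.
nt <- strsplit(tx_seq, "")[[1]]

# Assumption: on the minus strand the transcript runs from high to low
# genomic positions, so the nucleotides are reversed to match the rows.
if (strand == "-") {
  nt <- rev(nt)
}
dat$nucleotide <- nt
```

With this ordering, the row for the highest genomic position carries the first nucleotide of the transcript. For a plus strand transcript no reversal is needed.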
The SequenceData class is derived from the
CompressedSplitDataFrameList class
with additional slots for annotation and sequence data. Some functionality is
not inherited and might not be available to its full extent, e.g. relist.
SequenceDataFrame
The SequenceDataFrame class is a virtual class and contains data for
positions along a single transcript. In addition to being used for returning
elements from a SequenceData object, the SequenceDataFrame class is
used to store the unlisted data within a
SequenceData object. Therefore, a matching
SequenceData and SequenceDataFrame class must be implemented.
The SequenceDataFrame class is derived from the
DataFrame class.
Subsetting a SequenceDataFrame by column returns a SequenceDataFrame,
whereas subsetting by row returns a DataFrame. The
drop argument is ignored for column subsetting.
# S4 method for class 'SequenceData'
cbind(..., deparse.level = 1)
# S4 method for class 'SequenceData'
rbind(..., deparse.level = 1)
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'character,character'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'character,BSgenome'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'TxDb,character'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'TxDb,BSgenome'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'GRangesList,character'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'GRangesList,BSgenome'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'GFF3File,BSgenome'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'GFF3File,character'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'character,FaFile'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'GFF3File,FaFile'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'TxDb,FaFile'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for class 'GRangesList,FaFile'
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
...: Optional arguments overwriting default values. Not all
SequenceData classes use all arguments. The arguments are:
minLength: single integer value setting a threshold for minimum
read length. Shorter reads are discarded (default: minLength = NA).
maxLength: single integer value setting a threshold for maximum
read length. Longer reads are discarded (default: maxLength = NA).
minQuality: single integer value setting a threshold for minimum
read quality. Reads with a lower quality are discarded (default:
minQuality = 5L, but this is class dependent).
max_depth: maximum depth for pileup loading (default:
max_depth = 10000L).
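The effect of the length and quality thresholds can be sketched in base R. This is only an illustration of the filtering rules as described above, not the package's implementation. The function filter_reads and the toy reads table are hypothetical, and NA is assumed to mean that no threshold is applied.

```r
# Hypothetical helper mirroring the documented thresholds; it is not
# part of the package and works on a plain data.frame of reads.
filter_reads <- function(reads, minLength = NA, maxLength = NA,
                         minQuality = 5L) {
  keep <- rep(TRUE, nrow(reads))
  # Shorter reads are discarded if a minimum length is set.
  if (!is.na(minLength)) keep <- keep & reads$length >= minLength
  # Longer reads are discarded if a maximum length is set.
  if (!is.na(maxLength)) keep <- keep & reads$length <= maxLength
  # Reads with a lower quality are discarded if a minimum quality is set.
  if (!is.na(minQuality)) keep <- keep & reads$quality >= minQuality
  reads[keep, , drop = FALSE]
}

# Toy reads with made-up lengths and qualities.
reads <- data.frame(length  = c(18L, 25L, 40L, 30L),
                    quality = c(30L, 3L, 20L, 10L))

filtered   <- filter_reads(reads, minLength = 20L, maxLength = 35L)
unfiltered <- filter_reads(reads)
```

In this sketch only the read of length 30 passes all three thresholds, while the call with default values drops just the low quality read.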
deparse.level: See base::cbind for a
description of this argument.
dataType: The prefix for constructing the class name of the
SequenceData subclass to be constructed.
bamfiles: the input, which can be of the following types:
BamFileList: a named BamFileList
character: a character vector, which must be coercible
to a named BamFileList referencing existing BAM files. Valid names are
control and treated to define conditions and replicates.
annotation: annotation data, which must match the information contained in the BAM files.
sequences: sequences matching the target sequences the reads were mapped onto. This must match the information contained in the BAM files.
seqinfo: optional Seqinfo to
subset the transcripts analyzed on a chromosome basis.
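A named character vector for the BAM file input might be prepared as in the following base R sketch. The file paths are hypothetical placeholders and would have to point to existing BAM files in practice. Repeating a name to mark replicates of one condition is an assumption made for this illustration.

```r
# Hypothetical file paths; real input must reference existing BAM files.
bamfiles <- c(
  control = "control_rep1.bam",
  control = "control_rep2.bam",
  treated = "treated_rep1.bam",
  treated = "treated_rep2.bam"
)

# Only "control" and "treated" are valid names.
valid_names <- c("control", "treated")
stopifnot(!is.null(names(bamfiles)),
          all(names(bamfiles) %in% valid_names))

# Number of replicates per condition, derived from the repeated names.
replicates <- table(names(bamfiles))
```

Checking the names before constructing an object makes a misspelled condition visible early, independent of how the constructor itself validates its input.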
A SequenceData object
sequencesType: a character value for the class name of
sequences. Either RNAStringSet, ModRNAStringSet,
DNAStringSet or ModDNAStringSet.
minQuality: an integer value describing a threshold of the minimum
quality of reads to be used.