The SequenceData class is implemented to contain data on each position along transcripts and holds the corresponding annotation data and nucleotide sequence of these transcripts. Several functions are available to access this data. The SequenceData class is a virtual class, from which specific classes can be extended. Currently the following classes are implemented:
The annotation and sequence data can be accessed through the functions ranges and sequences, respectively. Be aware that the data is always provided according to genomic positions with increasing rownames, but the sequence is given as the actual sequence of the transcript. Therefore, it is necessary to treat the minus strand accordingly.
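The orientation mismatch described above can be illustrated with a minimal base R sketch. It does not use the package itself: the data frame, the positions, the coverage values and the sequence are made-up stand-ins, and reversing the nucleotides is only one plausible reading of what treating the minus strand "accordingly" means.

```r
# Toy stand-in for one transcript on the minus strand (all values are made up).
# Rows are ordered by increasing genomic position, as described above.
positions <- 101:105
data <- data.frame(coverage = c(3L, 5L, 8L, 13L, 21L),
                   row.names = as.character(positions))

# Sequence of the transcript in 5' to 3' orientation (hypothetical).
transcript_seq <- "AUGCU"
strand <- "-"

# Split the sequence into single nucleotides.
nt <- strsplit(transcript_seq, "")[[1]]

# Assumption: on the minus strand the 5' end of the transcript lies at the
# highest genomic position, so the nucleotides are reversed to line up
# with rows that are sorted by increasing genomic position.
if (strand == "-") {
  nt <- rev(nt)
}
data$nucleotide <- nt
```

With this alignment the first nucleotide of the transcript ends up in the row with the highest genomic position. For a plus strand transcript no reversal would be needed.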
The SequenceData class is derived from the CompressedSplitDataFrameList class with additional slots for annotation and sequence data. Some functionality is not inherited and might not be available to its full extent, e.g. relist.
SequenceDataFrame
The SequenceDataFrame class is a virtual class and contains data for positions along a single transcript. In addition to being used for returning elements from a SequenceData object, the SequenceDataFrame class is used to store the unlisted data within a SequenceData object. Therefore, a matching SequenceData and SequenceDataFrame class must be implemented.
The SequenceDataFrame class is derived from the DataFrame class.
Subsetting of a SequenceDataFrame returns a SequenceDataFrame or DataFrame, if it is subset by a column or row, respectively. The drop argument is ignored for column subsetting.
# S4 method for SequenceData
cbind(..., deparse.level = 1)
# S4 method for SequenceData
rbind(..., deparse.level = 1)
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for character,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for character,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for TxDb,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for TxDb,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for GRangesList,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for GRangesList,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for GFF3File,BSgenome
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for GFF3File,character
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for character,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for GFF3File,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for TxDb,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
# S4 method for GRangesList,FaFile
SequenceData(dataType, bamfiles, annotation, sequences, seqinfo, ...)
...: Optional arguments overwriting default values. Not all SequenceData classes use all arguments. The arguments are:
  minLength: single integer value setting a threshold for minimum read length. Shorter reads are discarded (default: minLength = NA).
  maxLength: single integer value setting a threshold for maximum read length. Longer reads are discarded (default: maxLength = NA).
  minQuality: single integer value setting a threshold for minimum read quality. Reads with a lower quality are discarded (default: minQuality = 5L, but this is class dependent).
  max_depth: maximum depth for pileup loading (default: max_depth = 10000L).
deparse.level: See base::cbind for a description of this argument.
dataType: The prefix for constructing the class name of the SequenceData subclass to be constructed.
bamfiles: the input, which can be of the following types:
  BamFileList: a named BamFileList
  character: a character vector, which must be coercible to a named BamFileList referencing existing BAM files. Valid names are control and treated, which define conditions and replicates.
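As a minimal sketch of the character input described above, the vector below names each file by its condition. The file paths are hypothetical placeholders, and repeating a name to mark replicates is an assumption based on the description of the valid names. The check uses base R only and is not the package's own validation.

```r
# Hypothetical file paths; replace them with existing BAM files.
bamfiles <- c(treated = "treated_rep1.bam",
              treated = "treated_rep2.bam",
              control = "control_rep1.bam")

# Only these names are described as valid for the bamfiles argument.
valid_names <- c("control", "treated")

# Illustrative check: every element must be named, and every name
# must be one of the valid condition names.
stopifnot(!is.null(names(bamfiles)),
          all(names(bamfiles) %in% valid_names))

# Number of files (replicates, under the assumption above) per condition.
replicates <- table(names(bamfiles))
```

A vector of this shape would then be passed as the bamfiles argument, together with annotation and sequences that match the BAM files.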
annotation: annotation data, which must match the information contained in the BAM files.
sequences: sequences matching the target sequences the reads were mapped onto. This must match the information contained in the BAM files.
seqinfo: optional Seqinfo to subset the transcripts analyzed on a chromosome basis.
A SequenceData object
sequencesType: a character value for the class name of sequences. Either RNAStringSet, ModRNAStringSet, DNAStringSet or ModDNAStringSet.
minQuality: an integer value describing a threshold of the minimum quality of reads to be used.