Users interested in the general aspects of any RNAmodR-based package
should refer to the main vignette of the package.
This vignette is aimed at developers and researchers who want to use
the functionality of the RNAmodR package to develop a new
modification detection strategy based on high-throughput sequencing data.
Two classes have to be considered to establish a new analysis
pipeline using RNAmodR: the SequenceData class and the Modifier class.
SequenceData class
First, the SequenceData class has to be considered.
Several classes are already implemented:
- End5SequenceData
- End3SequenceData
- EndSequenceData
- ProtectedEndSequenceData
- CoverageSequenceData
- PileupSequenceData
- NormEnd5SequenceData
- NormEnd3SequenceData
If none of these can be reused, a new class can be implemented quite
easily. First, the DataFrame class, the Data class and a constructor have
to be defined. The only values that have to be provided are a default
minQuality integer value and some basic information.
# Per-element data class; extends SequenceDFrame
setClass(Class = "ExampleSequenceDataFrame",
         contains = "SequenceDFrame")
ExampleSequenceDataFrame <- function(df, ranges, sequence, replicate,
                                     condition, bamfiles, seqinfo){
  RNAmodR:::.SequenceDataFrame("Example", df, ranges, sequence, replicate,
                               condition, bamfiles, seqinfo)
}
# Container class; the prototype sets the default minQuality and a description
setClass(Class = "ExampleSequenceData",
         contains = "SequenceData",
         slots = c(unlistData = "ExampleSequenceDataFrame"),
         prototype = list(unlistData = ExampleSequenceDataFrame(),
                          unlistType = "ExampleSequenceDataFrame",
                          minQuality = 5L,
                          dataDescription = "Example data"))
# Constructor; delegates to the internal RNAmodR constructor
ExampleSequenceData <- function(bamfiles, annotation, sequences, seqinfo, ...){
  RNAmodR:::SequenceData("Example", bamfiles = bamfiles,
                         annotation = annotation, sequences = sequences,
                         seqinfo = seqinfo, ...)
}
Second, the getData function has to be implemented. It
is used to load the data from a BAM file and must return a named list
containing one IntegerList, NumericList or
CompressedSplitDataFrameList per file.
setMethod("getData",
          signature = c(x = "ExampleSequenceData",
                        bamfiles = "BamFileList",
                        grl = "GRangesList",
                        sequences = "XStringSet",
                        param = "ScanBamParam"),
          definition = function(x, bamfiles, grl, sequences, param, args){
            ###
          }
)
Third, the aggregate function has to be implemented.
This function is used to aggregate data over replicates for all or one
of the conditions. The resulting data is passed on to the
Modifier class.
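The data flow between the getData and aggregate steps can be illustrated without any BAM files. The following is a minimal sketch, assuming the IRanges package is installed. The object names, the values and the helper aggregate_replicates are made up for this illustration, and the aggregate methods in RNAmodR may compute different summaries.

```r
library(IRanges)

# Hypothetical per-file results, shaped like the return value described for
# getData: a named list with one IntegerList per BAM file. Each IntegerList
# holds one integer vector per transcript (all values are made up).
per_file <- list(
  treated_1 = IntegerList(tx1 = c(10L, 12L, 8L), tx2 = c(3L, 5L)),
  treated_2 = IntegerList(tx1 = c(14L, 10L, 6L), tx2 = c(5L, 7L))
)

# Hypothetical helper: position-wise mean and standard deviation over the
# replicates, computed separately for every transcript.
aggregate_replicates <- function(files) {
  tx_names <- names(files[[1L]])
  res <- lapply(tx_names, function(tx) {
    # rows = positions of the transcript, columns = replicates
    m <- do.call(cbind, lapply(files, function(f) as.integer(f[[tx]])))
    data.frame(means = rowMeans(m), sds = apply(m, 1L, sd))
  })
  names(res) <- tx_names
  res
}

agg <- aggregate_replicates(per_file)
agg$tx1
```

Here agg$tx1 holds one row per position of the transcript tx1, with the mean and standard deviation across the two hypothetical replicates.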
Modifier class
A new Modifier class is probably the main class that
needs to be implemented. Three values have to be set in its prototype. mod
must be a single element of
Modstrings::shortName(Modstrings::ModRNAString()).
score is the name of the default score, which is used by several
functions. A column with this name should be returned from the
aggregate function. dataType defines the
SequenceData class to be used. dataType can
contain the names of multiple SequenceData classes, which are
then combined to form a SequenceDataSet.
setClass("ModExample",
         contains = c("RNAModifier"),
         prototype = list(mod = "X",
                          score = "score",
                          dataType = "ExampleSequenceData"))
ModExample <- function(x, annotation, sequences, seqinfo, ...){
  RNAmodR:::Modifier("ModExample", x = x, annotation = annotation,
                     sequences = sequences, seqinfo = seqinfo, ...)
}
dataType can also be a list of
character vectors, which then leads to the creation of a
SequenceDataList. However, for now this is a hypothetical
case and should only be used if detecting a single modification
requires BAM files from two or more different methods.
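Because mod has to be one of the short names known to Modstrings, it can be worth checking a candidate value before writing it into the prototype. The following is a minimal sketch, assuming the Modstrings package is installed. The helper name is_valid_mod is made up for this illustration and is not part of RNAmodR or Modstrings.

```r
library(Modstrings)

# Hypothetical helper: checks whether a candidate value for the 'mod'
# prototype entry is a single, known modification short name.
is_valid_mod <- function(mod) {
  valid <- shortName(ModRNAString())
  is.character(mod) && length(mod) == 1L && mod %in% valid
}

is_valid_mod("I")            # TRUE only if "I" is among the known short names
is_valid_mod(c("I", "m6A"))  # FALSE, because more than one element is given
```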
The settings<- function can be amended to save
specific settings (.norm_example_args must be defined
separately to normalize the input arguments in any way one sees fit).
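What such a normalization helper could look like is sketched here in base R. The name .norm_example_args is taken from the text above, but the setting minScore and its validation rules are assumptions made for this illustration. The helper returns a list with the same names as its input, so that it can be combined with an assignment by names(value).

```r
# Hypothetical implementation: validates a made-up 'minScore' setting and
# passes every other element through unchanged.
.norm_example_args <- function(input) {
  input <- as.list(input)
  if (!is.null(input[["minScore"]])) {
    minScore <- input[["minScore"]]
    if (!is.numeric(minScore) || length(minScore) != 1L ||
        is.na(minScore) || minScore < 0) {
      stop("'minScore' must be a single non-negative number.", call. = FALSE)
    }
    # store as double, regardless of whether an integer was supplied
    input[["minScore"]] <- as.numeric(minScore)
  }
  input
}

.norm_example_args(list(minScore = 5L))
```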
setReplaceMethod(f = "settings",
                 signature = signature(x = "ModExample"),
                 definition = function(x, value){
                   x <- callNextMethod()
                   # validate special setting here
                   x@settings[names(value)] <- unname(.norm_example_args(value))
                   x
                 })
The aggregateData function is used to take the
aggregated data from the SequenceData object and to
calculate the specific scores, which are then stored in the
aggregate slot.
setMethod(f = "aggregateData",
          signature = signature(x = "ModExample"),
          definition =
            function(x, force = FALSE){
              # Some data with element per transcript
            }
)
The findMod function takes the aggregate data and
searches for modifications, which are then returned as a GRanges object
and stored in the modifications slot.
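The core of such a search can be as simple as thresholding a score and reporting every hit as a range of width one. The following is a minimal sketch, assuming the GenomicRanges package is installed. The helper find_hits, the threshold and the score values are made up, and the metadata columns that RNAmodR expects in the modifications slot may differ.

```r
library(GenomicRanges)

# Hypothetical aggregated scores: one numeric vector per transcript.
scores <- list(
  tx1 = c(0.1, 0.9, 0.2, 0.8),
  tx2 = c(0.05, 0.3, 0.95)
)

# Hypothetical helper: every position with a score at or above 'min_score'
# becomes one element of the returned GRanges object.
find_hits <- function(scores, min_score = 0.5, mod = "X") {
  hits <- do.call(rbind, lapply(names(scores), function(tx) {
    pos <- which(scores[[tx]] >= min_score)
    data.frame(seqnames = rep(tx, length(pos)),
               pos = pos,
               score = scores[[tx]][pos])
  }))
  GRanges(seqnames = hits$seqnames,
          ranges = IRanges(start = hits$pos, width = 1L),
          strand = rep("+", nrow(hits)),
          mod = rep(mod, nrow(hits)),
          score = hits$score)
}

mods <- find_hits(scores)
mods
```

With the values above, mods contains three ranges: positions 2 and 4 on tx1 and position 3 on tx2.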
setMethod("findMod",
          signature = c(x = "ModExample"),
          function(x){
            # an element per modification found.
          }
)
ModifierSet class
The ModifierSet class can be implemented easily by
defining the class and the constructor. Its functionality is defined by
the Modifier class.
setClass("ModSetExample",
         contains = "ModifierSet",
         prototype = list(elementType = "ModExample"))
ModSetExample <- function(x, annotation, sequences, seqinfo, ...){
  RNAmodR:::ModifierSet("ModExample", x = x, annotation = annotation,
                        sequences = sequences, seqinfo = seqinfo, ...)
}
Additional functions that need to be implemented are
getDataTrack for the new SequenceData and new
Modifier classes and
plotData/plotDataByCoord for the new
Modifier and ModifierSet classes.
name defines a transcript name found in
names(ranges(x)) and type is the data type,
typically found as a column in the aggregate slot.
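How the coord and window.size arguments of plotDataByCoord relate to the from and to arguments of plotData is not spelled out here. One plausible reading is that a window is placed around the coordinate and clipped to the transcript. The following is a minimal sketch under that assumption, using GenomicRanges. The helper window_from_coord and its tx_length argument are hypothetical and not part of RNAmodR.

```r
library(GenomicRanges)

# Hypothetical helper: derives a plotting window from a single coordinate.
# 'tx_length' is the length of the transcript and is used to clip the window.
window_from_coord <- function(coord, window.size = 15L, tx_length) {
  stopifnot(is(coord, "GRanges"), length(coord) == 1L)
  from <- max(1L, start(coord) - window.size)
  to <- min(tx_length, end(coord) + window.size)
  c(from = from, to = to)
}

coord <- GRanges("tx1", IRanges(start = 20L, width = 1L), strand = "+")
window_from_coord(coord, window.size = 15L, tx_length = 30L)
```

For position 20 on a transcript of length 30, this gives from = 5 and to = 30, since the upper end of the window is clipped to the transcript length.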
setMethod(
  f = "getDataTrack",
  signature = signature(x = "ExampleSequenceData"),
  definition = function(x, name, ...) {
    ###
  }
)
setMethod(
  f = "getDataTrack",
  signature = signature(x = "ModExample"),
  definition = function(x, name, type, ...) {
  }
)
setMethod(
  f = "plotDataByCoord",
  signature = signature(x = "ModExample", coord = "GRanges"),
  definition = function(x, coord, type = "score", window.size = 15L, ...) {
  }
)
setMethod(
  f = "plotData",
  signature = signature(x = "ModExample"),
  definition = function(x, name, from, to, type = "score", ...) {
  }
)
setMethod(
  f = "plotDataByCoord",
  signature = signature(x = "ModSetExample", coord = "GRanges"),
  definition = function(x, coord, type = "score", window.size = 15L, ...) {
  }
)
setMethod(
  f = "plotData",
  signature = signature(x = "ModSetExample"),
  definition = function(x, name, from, to, type = "score", ...) {
  }
)
If unsure how to modify these functions, have a look at the code in
the Modifier-Inosine-viz.R file of this package.
As suggested directly above, for a more detailed example have a look
at the ModInosine class source code found in the
Modifier-Inosine-class.R and
Modifier-Inosine-viz.R files of this package.
## R version 4.5.1 (2025-06-13)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] RNAmodR_1.23.2 Modstrings_1.25.0 Biostrings_2.77.2
## [4] XVector_0.49.1 GenomicRanges_1.61.4 Seqinfo_0.99.2
## [7] IRanges_2.43.2 S4Vectors_0.47.2 BiocGenerics_0.55.1
## [10] generics_0.1.4 BiocStyle_2.37.1
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 rstudioapi_0.17.1
## [3] jsonlite_2.0.0 magrittr_2.0.4
## [5] GenomicFeatures_1.61.6 farver_2.1.2
## [7] rmarkdown_2.29 fs_1.6.6
## [9] BiocIO_1.19.0 ragg_1.5.0
## [11] vctrs_0.6.5 ROCR_1.0-11
## [13] memoise_2.0.1 Rsamtools_2.25.3
## [15] RCurl_1.98-1.17 base64enc_0.1-3
## [17] htmltools_0.5.8.1 S4Arrays_1.9.1
## [19] progress_1.2.3 curl_7.0.0
## [21] SparseArray_1.9.1 Formula_1.2-5
## [23] sass_0.4.10 bslib_0.9.0
## [25] htmlwidgets_1.6.4 desc_1.4.3
## [27] plyr_1.8.9 Gviz_1.53.1
## [29] httr2_1.2.1 cachem_1.1.0
## [31] GenomicAlignments_1.45.4 lifecycle_1.0.4
## [33] pkgconfig_2.0.3 Matrix_1.7-4
## [35] R6_2.6.1 fastmap_1.2.0
## [37] MatrixGenerics_1.21.0 digest_0.6.37
## [39] colorspace_2.1-2 AnnotationDbi_1.71.1
## [41] textshaping_1.0.3 Hmisc_5.2-3
## [43] RSQLite_2.4.3 filelock_1.0.3
## [45] colorRamps_2.3.4 httr_1.4.7
## [47] abind_1.4-8 compiler_4.5.1
## [49] bit64_4.6.0-1 htmlTable_2.4.3
## [51] S7_0.2.0 backports_1.5.0
## [53] BiocParallel_1.43.4 DBI_1.2.3
## [55] biomaRt_2.65.14 rappdirs_0.3.3
## [57] DelayedArray_0.35.3 rjson_0.2.23
## [59] tools_4.5.1 foreign_0.8-90
## [61] nnet_7.3-20 glue_1.8.0
## [63] restfulr_0.0.16 grid_4.5.1
## [65] checkmate_2.3.3 reshape2_1.4.4
## [67] cluster_2.1.8.1 gtable_0.3.6
## [69] BSgenome_1.77.2 ensembldb_2.33.2
## [71] data.table_1.17.8 hms_1.1.3
## [73] pillar_1.11.1 stringr_1.5.2
## [75] dplyr_1.1.4 BiocFileCache_2.99.6
## [77] lattice_0.22-7 deldir_2.0-4
## [79] rtracklayer_1.69.1 bit_4.6.0
## [81] biovizBase_1.57.1 tidyselect_1.2.1
## [83] knitr_1.50 gridExtra_2.3
## [85] bookdown_0.44 ProtGenerics_1.41.0
## [87] SummarizedExperiment_1.39.2 xfun_0.53
## [89] Biobase_2.69.1 matrixStats_1.5.0
## [91] stringi_1.8.7 UCSC.utils_1.5.0
## [93] lazyeval_0.2.2 yaml_2.3.10
## [95] evaluate_1.0.5 codetools_0.2-20
## [97] interp_1.1-6 tibble_3.3.0
## [99] BiocManager_1.30.26 cli_3.6.5
## [101] rpart_4.1.24 systemfonts_1.2.3
## [103] jquerylib_0.1.4 dichromat_2.0-0.1
## [105] Rcpp_1.1.0 GenomeInfoDb_1.45.11
## [107] dbplyr_2.5.1 png_0.1-8
## [109] XML_3.99-0.19 parallel_4.5.1
## [111] pkgdown_2.1.3 ggplot2_4.0.0
## [113] blob_1.2.4 prettyunits_1.2.0
## [115] latticeExtra_0.6-31 jpeg_0.1-11
## [117] AnnotationFilter_1.33.0 bitops_1.0-9
## [119] txdbmaker_1.5.6 VariantAnnotation_1.55.1
## [121] scales_1.4.0 crayon_1.5.3
## [123] rlang_1.1.6 KEGGREST_1.49.1