A ModDNAString object allows DNA sequences with modified nucleotides to be stored and manipulated.
ModDNAString(x = "", start = 1, nchar = NA)
x: the input as a character.
start: the position in the character vector to use as the start position in the ModDNAString object (default start = 1).
nchar: the width of the character vector to use in the ModDNAString object (default nchar = NA). The end position is calculated as start + nchar - 1.
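As a rough illustration of how start and nchar select a window of the input, the sketch below assumes the Modstrings package is installed. The input string and the variable names are made up for this example, and only plain letters are used to sidestep encoding issues.

```r
library(Modstrings)

# Hypothetical input sequence (illustrative only)
x <- "AGCTN"
first <- 2   # passed as 'start'
n <- 3       # passed as 'nchar'

# End position as described above: start + nchar - 1
last <- first + n - 1

# Keep only the window [first, last] of the input
md <- ModDNAString(x, start = first, nchar = n)
md

# The same window taken with base R, for comparison
expected <- substr(x, first, last)
expected
```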
A ModDNAString object.
The ModDNAString class contains the virtual ModString class, which is itself based on the XString class. Therefore, functions for working with XString classes are inherited.
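A minimal sketch of what this inheritance means in practice, assuming Modstrings is installed (it attaches Biostrings). The choice of length() and subseq() as representative XString functions is an assumption made for illustration, not an exhaustive list of what is inherited.

```r
library(Modstrings)

# Illustrative sequence without modified letters
md <- ModDNAString("AGCTN")

# Class relationships described above
is(md, "ModString")
is(md, "XString")

# Generic XString functionality applied to a ModDNAString
length(md)
sub <- subseq(md, start = 2, end = 4)
sub
```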
The alphabet of the ModDNAString class consists of the non-extended IUPAC codes "A,G,C,T,N", the gap letter "-", the hard masking letter "+", the not available letter ".", and letters for individual modifications: alphabet(ModDNAString()).
Since the special characters are encoded differently depending on the OS and
encoding settings of the R session, it is not always possible to enter a DNA
sequence containing modified nucleotides via the R console. The most
convenient solution to this problem is to use the function modifyNucleotides
to modify an existing DNAString or ModDNAString object.
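The workaround described above might look roughly like the following sketch. It assumes Modstrings is installed and that modifyNucleotides takes the sequence, the position to modify, and the short name of the modification, in that order. The sequence and position are made up for illustration; "6mA" is one of the short names reported by shortName().

```r
library(Modstrings)

# Start from a sequence typed with plain letters only
md <- ModDNAString("AGCT")

# Replace the A at position 1 with 6mA, referring to the modification
# by its short name instead of typing its special character
md_mod <- modifyNucleotides(md, 1, "6mA")
md_mod

# Converting back yields the unmodified base sequence
d <- DNAString(md_mod)
d
```

The modification has to match the base at the chosen position, so "6mA" is placed on an A here.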
A ModDNAString object can be converted into a DNAString object using the DNAString() constructor. Modified nucleotides are automatically converted into their base nucleotides.
If a modified DNA nucleotide you want to work with is not part of the alphabet, please let us know.
# Constructing a ModDNAString containing a modified nucleotide
md1 <- ModDNAString("AGCT`")
md1
#> 5-letter ModDNAString object
#> seq: AGCT`
# the alphabet of the ModDNAString class
alphabet(md1)
#> [1] "A" "C" "G" "T" "N" "-" "+" "." "p" "δ" "O" "]" "D" "J" "e"
#> [16] "g" "`" "b" "U" "∝" "π" "I" "7" "6" "3" "2" "1" "8" "∉" "⊆"
#> [31] "⊇" "R" "α" "m" "h" "×" "f" "4" "ν" "X" "'" "κ" "o" "(" ")"
#> [46] "η" "a" "⇓" "⇑" "\"" "√" "/" "≡" "ζ" "~"
# due to encoding issues the shortNames can also be used
shortName(md1)
#> [1] "5pT" "3mT" "O4meT" "1mT" "dhT" "baseJ" "5fU" "5hmU"
#> [9] "dhpU" "5caU" "dU" "DHdU" "5nmU" "dI" "7mG" "6mG"
#> [17] "3mG" "2mG" "1mG" "8oxoG" "7a7dG" "12eG" "23eG" "2,2mG"
#> [25] "2ceG" "5mC" "5hmC" "5gmC" "5fC" "5caC" "4mC" "34eC"
#> [33] "3mC" "3eC" "2mC" "1mC" "9mA" "7mA" "6mA" "4mA"
#> [41] "3mA" "1mA" "6haA" "2mA" "2a6haA" "6,6mA" "6ncmA"
# due to encoding issues the nomenclature can also be used
nomenclature(md1)
#> [1] "5pT" "3mT" "O4meT" "1mT" "dhT" "baseJ" "5fU" "5hmU"
#> [9] "dhpU" "5caU" "dU" "DHdU" "5nmU" "dI" "7mG" "6mG"
#> [17] "3mG" "2mG" "1mG" "8oxoG" "7a7dG" "12eG" "23eG" "2,2mG"
#> [25] "2ceG" "5mC" "5hmC" "5gmC" "5fC" "5caC" "4mC" "34eC"
#> [33] "3mC" "3eC" "2mC" "1mC" "9mA" "7mA" "6mA" "4mA"
#> [41] "3mA" "1mA" "6haA" "2mA" "2a6haA" "6,6mA" "6ncmA"
# convert to DNAString
d1 <- DNAString(md1)
d1
#> 5-letter DNAString object
#> seq: AGCTT