A ModDNAString object allows DNA sequences with modified nucleotides to be stored and manipulated.

ModDNAString(x = "", start = 1, nchar = NA)

Arguments

x

the input as a character.

start

the position in the character vector to use as start position in the ModDNAString object (default start = 1).

nchar

the width of the character vector to use in the ModDNAString object (default nchar = NA). The end position is calculated as start + nchar - 1.

Value

a ModDNAString object

Details

The ModDNAString class contains the virtual ModString class, which is itself based on the XString class. Therefore, functions for working with XString classes are inherited.

The alphabet of the ModDNAString class consists of the non-extended IUPAC codes "A", "G", "C", "T" and "N", the gap letter "-", the hard masking letter "+", the not-available letter "." and letters for the individual modifications. The complete alphabet can be listed with alphabet(ModDNAString()).

Because the special characters are encoded differently depending on the OS and the encoding settings of the R session, it is not always possible to enter a DNA sequence containing modified nucleotides via the R console. The most convenient solution to this problem is to use the function modifyNucleotides to modify an existing DNAString or ModDNAString object.

A ModDNAString object can be converted into a DNAString object using the DNAString() constructor. Modified nucleotides are automatically converted into their base nucleotides.

If a modified DNA nucleotide you want to work with is not part of the alphabet, please let us know.

Examples

# Constructing a ModDNAString containing a modified nucleotide (the letter "`")
md1 <- ModDNAString("AGCT`")
md1
#> 5-letter ModDNAString object
#> seq: AGCT`
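
# Using only part of the input: a sketch, assuming start and nchar behave as
# described under Arguments (end position = start + nchar - 1 = 6)
md_sub <- ModDNAString("AGCTAGCT", start = 3, nchar = 4)
md_sub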

# the alphabet of the ModDNAString class
alphabet(md1)
#>  [1] "A"  "C"  "G"  "T"  "N"  "-"  "+"  "."  "p"  "δ"  "O"  "]"  "D"  "J"  "e" 
#> [16] "g"  "`"  "b"  "U"  "∝"  "π"  "I"  "7"  "6"  "3"  "2"  "1"  "8"  "∉"  "⊆" 
#> [31] "⊇"  "R"  "α"  "m"  "h"  "×"  "f"  "4"  "ν"  "X"  "'"  "κ"  "o"  "("  ")" 
#> [46] "η"  "a"  "⇓"  "⇑"  "\"" "√"  "/"  "≡"  "ζ"  "~" 
# due to encoding issues the shortNames can also be used
shortName(md1)
#>  [1] "5pT"    "3mT"    "O4meT"  "1mT"    "dhT"    "baseJ"  "5fU"    "5hmU"  
#>  [9] "dhpU"   "5caU"   "dU"     "DHdU"   "5nmU"   "dI"     "7mG"    "6mG"   
#> [17] "3mG"    "2mG"    "1mG"    "8oxoG"  "7a7dG"  "12eG"   "23eG"   "2,2mG" 
#> [25] "2ceG"   "5mC"    "5hmC"   "5gmC"   "5fC"    "5caC"   "4mC"    "34eC"  
#> [33] "3mC"    "3eC"    "2mC"    "1mC"    "9mA"    "7mA"    "6mA"    "4mA"   
#> [41] "3mA"    "1mA"    "6haA"   "2mA"    "2a6haA" "6,6mA"  "6ncmA" 
# due to encoding issues the nomenclature can also be used
nomenclature(md1) 
#>  [1] "5pT"    "3mT"    "O4meT"  "1mT"    "dhT"    "baseJ"  "5fU"    "5hmU"  
#>  [9] "dhpU"   "5caU"   "dU"     "DHdU"   "5nmU"   "dI"     "7mG"    "6mG"   
#> [17] "3mG"    "2mG"    "1mG"    "8oxoG"  "7a7dG"  "12eG"   "23eG"   "2,2mG" 
#> [25] "2ceG"   "5mC"    "5hmC"   "5gmC"   "5fC"    "5caC"   "4mC"    "34eC"  
#> [33] "3mC"    "3eC"    "2mC"    "1mC"    "9mA"    "7mA"    "6mA"    "4mA"   
#> [41] "3mA"    "1mA"    "6haA"   "2mA"    "2a6haA" "6,6mA"  "6ncmA" 

# convert to DNAString
d1 <- DNAString(md1)
d1
#> 5-letter DNAString object
#> seq: AGCTT
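
# Adding a modification by short name instead of typing its special character.
# A sketch, assuming the modifyNucleotides(x, at, mod) argument order and that
# the base at the given position must match the modification ("6mA" on an A)
d2 <- DNAString("AGCT")
md2 <- modifyNucleotides(d2, 1, "6mA")
md2
# converting back should restore the unmodified sequence
DNAString(md2)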