R/Modstrings-sanitize.R
sanitizeInput.Rd
Since the one-letter nomenclature for RNA and DNA modifications differs depending on the source, the input has to be translated into a common alphabet.
sanitizeInput exchanges letters based on a dictionary. The dictionary is expected to be a DataFrame with two columns, mods_abbrev and short_name. Using short_name as the key, characters in the input are converted from the values of mods_abbrev into the corresponding letters from the ModString alphabet. Only letters that differ between the two are searched for and exchanged.
sanitizeFromModomics and sanitizeFromtRNAdb use predefined, built-in dictionaries.
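The dictionary lookup described above can be sketched in base R. This is a minimal sketch under stated assumptions, not the Modstrings implementation: `translate_mod_letters()` is a hypothetical helper, a plain `data.frame` stands in for the `DataFrame`, and `alphabet` is assumed to be a named character vector mapping each short_name to its target letter. The letters and names in the usage lines are made up.

```r
# Hypothetical helper illustrating a dictionary-based letter exchange.
# Assumptions: `dictionary` has columns mods_abbrev and short_name holding
# one-letter codes and names; `alphabet` is a named vector (short_name -> letter).
translate_mod_letters <- function(input, dictionary, alphabet) {
  stopifnot(is.character(input),
            all(c("mods_abbrev", "short_name") %in% colnames(dictionary)))
  old <- as.character(dictionary$mods_abbrev)
  # Target letter for each entry, looked up by short name (NA if unknown)
  new <- unname(alphabet[as.character(dictionary$short_name)])
  # Keep only entries whose letter actually changes
  keep <- !is.na(new) & old != new
  map <- setNames(new[keep], old[keep])
  stopifnot(all(nchar(names(map)) == 1L))
  # Split each string into single characters and replace every character at
  # most once, so a letter that was just written is never translated again
  chars <- strsplit(input, "", fixed = TRUE)
  vapply(chars, function(ch) {
    hit <- ch %in% names(map)
    ch[hit] <- map[ch[hit]]
    paste(ch, collapse = "")
  }, character(1))
}

# Usage with made-up letters and names (not real Modomics/tRNAdb entries)
alphabet_demo <- c(modX = "X", modY = "Y", modZ = "Z")
dict_demo <- data.frame(mods_abbrev = c("@", "+", "Z"),
                        short_name = c("modX", "modY", "modZ"),
                        stringsAsFactors = FALSE)
translate_mod_letters("AGC@+Z", dict_demo, alphabet_demo)  # "AGCXYZ"
```

The entry for modZ is skipped because its letter is already identical in both nomenclatures. Working character by character, rather than calling `gsub()` once per dictionary entry, avoids chained replacements when the target letter of one entry is the source letter of another.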
sanitizeInput(input, dictionary)
sanitizeFromModomics(input)
sanitizeFromtRNAdb(input)
input: a character vector to be converted.
dictionary: a DataFrame containing at least the two columns mods_abbrev and short_name. From it, a dictionary table is constructed for exchanging old letters for new ones.
Returns the modified character vector, suitable for constructing a ModString object.
library(Modstrings)
# Modomics
chr <- "AGC@"
# Error since the @ is not in the alphabet
if (FALSE) {
seq <- ModRNAString(chr)
}
seq <- ModRNAString(sanitizeFromModomics(chr))
seq
#> 4-letter ModRNAString object
#> seq: AGC÷
# tRNAdb
chr <- "AGC+"
# No error, but + has a different meaning in the alphabet
if (FALSE) {
seq <- ModRNAString(chr)
}
seq <- ModRNAString(sanitizeFromtRNAdb(chr))
seq
#> 4-letter ModRNAString object
#> seq: AGCΘ
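The two examples fail in different ways: the Modomics `@` is rejected because it is not a known letter, whereas the tRNAdb `+` is accepted silently with a different meaning. A simple membership check, sketched below with a hypothetical helper `unknown_letters()` and a made-up letter set that does not model any real ModString alphabet, catches only the first kind of problem, which is why a source-specific dictionary is still needed.

```r
# Hypothetical helper: list the characters of `input` that are absent from
# an allowed set of letters. The letter set used below is made up.
unknown_letters <- function(input, allowed) {
  chars <- unlist(strsplit(input, "", fixed = TRUE), use.names = FALSE)
  setdiff(unique(chars), allowed)
}

allowed_demo <- c("A", "C", "G", "U", "+")
unknown_letters("AGC@", allowed_demo)  # "@" is reported
unknown_letters("AGC+", allowed_demo)  # nothing is reported, although "+"
                                       # may mean something else in the source
```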