Sanitize input strings for use with ModString classes

Since the one-letter nomenclature for RNA and DNA modifications differs depending on the source, a translation to a common alphabet is necessary.

sanitizeInput exchanges characters based on a dictionary. The dictionary is expected to be a DataFrame with two columns, mods_abbrev and short_name. Entries are matched by short_name, and the characters in the input are converted from the values of mods_abbrev into the corresponding ones from the alphabet.

Only values that differ between the dictionary and the alphabet are searched for and exchanged.
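
The exchange described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper translate_letters, the two data frames, the column name abbrev and the short names modX and modY are all hypothetical and only serve to show the idea of matching by short_name and exchanging just the letters that differ.

```r
# Hypothetical dictionary of a foreign source: its letters and short names
source_dict <- data.frame(
  mods_abbrev = c("@", "+"),
  short_name  = c("modX", "modY"),
  stringsAsFactors = FALSE
)

# Hypothetical common alphabet: letters used for the same short names
target_alphabet <- data.frame(
  abbrev     = c("x", "+"),
  short_name = c("modX", "modY"),
  stringsAsFactors = FALSE
)

# Illustrative helper (assumption, not part of the package)
translate_letters <- function(input, source_dict, target_alphabet) {
  # Pair source and target entries by their short name
  idx  <- match(source_dict$short_name, target_alphabet$short_name)
  keep <- !is.na(idx)
  old  <- source_dict$mods_abbrev[keep]
  new  <- target_alphabet$abbrev[idx[keep]]
  # Only letters that differ need to be exchanged
  differs <- old != new
  if (!any(differs)) {
    return(input)
  }
  # chartr() exchanges all letters in a single pass; note that it treats
  # "-" inside its specifications as a range operator
  chartr(paste(old[differs], collapse = ""),
         paste(new[differs], collapse = ""),
         input)
}

translate_letters("AGC@+", source_dict, target_alphabet)
#> [1] "AGCx+"
```

In this sketch the "+" is left alone because both tables use the same letter for modY, while "@" is exchanged for "x". Exchanging all letters in one pass avoids a letter that was just written being translated a second time.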

sanitizeFromModomics and sanitizeFromtRNAdb use a predefined, built-in dictionary.

sanitizeInput(input, dictionary)

sanitizeFromModomics(input)

sanitizeFromtRNAdb(input)

Arguments

input: a character vector to be converted.
dictionary: a DataFrame containing at least two columns, mods_abbrev and short_name. From this, a dictionary table is constructed for exchanging old letters for new ones.

Value

The modified character vector, suitable for constructing a ModString object.

Examples

# Modomics
chr <- "AGC@"
# Error since the @ is not in the alphabet
if (FALSE) {
seq <- ModRNAString(chr)
}
seq <- ModRNAString(sanitizeFromModomics(chr))
seq
#> 4-letter ModRNAString object
#> seq: AGC÷

# tRNAdb
chr <- "AGC+"
# No error but the + has a different meaning in the alphabet
if (FALSE) {
seq <- ModRNAString(chr)
}
seq <- ModRNAString(sanitizeFromtRNAdb(chr))
seq
#> 4-letter ModRNAString object
#> seq: AGCΘ