A Phrase-based Statistical Model for SMS Text Normalization

Juan Xiao, Jian Su, Heng Mui, Keng Terrace

Short Messaging Service (SMS) texts behave quite differently from normal written texts and have some very special phenomena. To translate SMS texts, traditional approaches model such irregularities directly in Machine Translation (MT). However, such approaches suffer from a customization problem, as tremendous effort is required to adapt the language model of the existing translation system to handle SMS text style. We offer an alternative approach to resolve such irregularities by normalizing SMS texts before MT. In this paper, we view the task of SMS normalization as a translation problem from the SMS language to the English language, and we propose to adapt a phrase-based statistical MT model for the task. Evaluation by 5-fold cross validation on a parallel SMS normalized corpus of 5000 sentences shows that our method can achieve 0.80702 in BLEU score against the baseline BLEU score of 0.6958. Another experiment, translating SMS texts from English to Chinese on a separate SMS text corpus, shows that using SMS normalization as MT preprocessing can substantially boost SMS translation performance, from 0.1926 to 0.3770 in BLEU score.
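To make the idea of treating normalization as phrase-level translation more concrete, the following is a minimal sketch, not the model described in the paper. It assumes a hypothetical phrase table (`PHRASE_TABLE`, with invented entries and probabilities) and replaces the statistical decoder and language model with a greedy longest-match lookup. The names `PHRASE_TABLE`, `MAX_PHRASE_LEN`, and `normalize` are illustrative assumptions only.

```python
# Toy illustration of phrase-level SMS normalization.
# ASSUMPTION: the phrase table entries and probabilities below are invented;
# a real system would learn them from a parallel SMS/English corpus.
PHRASE_TABLE = {
    ("c", "u"): [("see you", 0.9)],
    ("u",): [("you", 0.95)],
    ("2moro",): [("tomorrow", 0.98)],
    ("b4",): [("before", 0.97)],
    ("gr8",): [("great", 0.99)],
}
MAX_PHRASE_LEN = 2  # longest SMS phrase (in tokens) present in the table


def normalize(sms: str) -> str:
    """Greedy longest-match substitution (a simplification, not a real decoder)."""
    tokens = sms.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in PHRASE_TABLE:
                # Pick the highest-probability English phrase for this SMS phrase.
                best, _ = max(PHRASE_TABLE[key], key=lambda cand: cand[1])
                out.append(best)
                i += n
                break
        else:
            # No table entry: copy the token through unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    print(normalize("c u 2moro"))
    print(normalize("gr8 meeting b4 lunch"))
```

A full phrase-based statistical model would instead score competing segmentations and outputs jointly, combining phrase translation probabilities with an English language model, rather than committing greedily to the longest match.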