2 Statistical MT Preliminaries

Before talking about any specific models, this chapter describes the overall framework of statistical machine translation (SMT) [16] more formally.

First, we define our task of machine translation as translating a source sentence $F = f_1, \ldots, f_J = f_1^{|F|}$ into a target sentence $E = e_1, \ldots, e_I = e_1^{|E|}$. Thus, any type of translation system can be defined as a function

$\hat{E} = \mathrm{mt}(F)$, (1)

which returns a translation hypothesis $\hat{E}$ given a source sentence $F$ as input.
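As a minimal illustration of Equation (1), the sketch below treats a translation system as nothing more than a function from a source token sequence $F$ to a hypothesis $\hat{E}$. The word-for-word dictionary used as the "model" here is a purely hypothetical placeholder, not a method from this chapter; it only serves to make the function's interface concrete.

```python
# A minimal sketch of Equation (1): a translation system is just a function
# mt(F) -> E-hat that maps a source sentence to a translation hypothesis.
# The word-for-word dictionary below is a hypothetical placeholder model.
lookup = {"kare": "he", "hashiru": "runs"}

def mt(f_tokens):
    """Return a translation hypothesis E-hat for the source sentence F."""
    return [lookup.get(f, f) for f in f_tokens]

print(mt(["kare", "hashiru"]))  # -> ['he', 'runs']
```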

Statistical machine translation systems are systems that perform translation by creating a probabilistic model for the probability of $E$ given $F$, $P(E \mid F; \theta)$, and finding the target sentence that maximizes this probability:


$\hat{E} = \operatorname{argmax}_E P(E \mid F; \theta)$, (2)


where $\theta$ are the parameters of the model specifying the probability distribution. The parameters $\theta$ are learned from data consisting of aligned sentences in the source and target languages, which are called parallel corpora in technical terminology. Within this framework, there are three major problems that we need to handle appropriately in order to create a good translation system:


Modeling: First, we need to decide what our model $P(E \mid F; \theta)$ will look like. What parameters will it have, and how will the parameters specify a probability distribution?


Learning: Next, we need a method to learn appropriate values for the parameters $\theta$ from training data.


Search: Finally, we need to solve the problem of finding the most probable sentence (solving the "argmax" in Equation (2)). This process of searching over hypotheses for the best one is often called decoding.


The remainder of the material here will focus on solving these problems.
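To make the three problems concrete, the sketch below is a deliberately tiny, hypothetical instance of Equation (2): the "model" $P(E \mid F; \theta)$ is a hand-specified table over a few candidate translations (standing in for learned parameters $\theta$), and "search" is exhaustive enumeration over those candidates. The example sentences and probabilities are invented for illustration; real systems must model and search over a vastly larger space of hypotheses.

```python
# A toy, hypothetical instance of Equation (2).
# Modeling: P(E | F; theta) is a hand-specified table over candidate translations.
# Learning: here the "parameters" theta are simply written down by hand;
#           a real system would estimate them from a parallel corpus.
# Search:   decode() enumerates the candidates and returns the argmax.

# theta: for one source sentence F, a distribution over candidate targets E.
theta = {
    ("kare", "wa", "hashitta"): {
        ("he", "ran"): 0.6,
        ("he", "was", "running"): 0.3,
        ("it", "ran"): 0.1,
    }
}

def p(e, f, theta):
    """P(E | F; theta) under the toy table model (0.0 for unseen pairs)."""
    return theta.get(f, {}).get(e, 0.0)

def decode(f, theta):
    """Return E-hat = argmax_E P(E | F; theta) by exhaustive enumeration."""
    candidates = theta.get(f, {})
    return max(candidates, key=lambda e: p(e, f, theta))

if __name__ == "__main__":
    f = ("kare", "wa", "hashitta")
    print(decode(f, theta))  # -> ('he', 'ran')
```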
