Helper scripts
To simplify the translation process, the ivrd folder contains a po directory with several helper scripts.
- ivrd/po/xgetpo.sh - this script gathers the phrases from all Python source files in the ivrd directory and its subfolders and merges any new changes into the ivrd/po/ivrd.pot file.
- ivrd/po/mergepo.sh - this script merges changes from the ivrd/po/ivrd.pot file into the ivrd/po/${LANG}.po file. The script requires one parameter: the two-letter language code.
- ivrd/po/compilepo.sh - this script compiles the translated ivrd/po/${LANG}.po file into the ssp/locale/${LANG}/LC_MESSAGES/ivrd.mo file. The script requires one parameter: the two-letter language code.
- ivrd/prompt_utils.py - this script runs several useful checks and gathers statistics on prompt sets.
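As a rough illustration of what the three shell helpers do, the sketch below builds the standard GNU gettext command lines for each step. It assumes the scripts wrap xgettext, msgmerge and msgfmt, which this page does not state; gettext_commands() and its arguments are hypothetical names used only for this example.

```python
def gettext_commands(lang, sources):
    """Build command lines that roughly mirror xgetpo.sh, mergepo.sh and compilepo.sh.

    Assumption: the helper scripts wrap the standard GNU gettext tools.
    `sources` is a list of Python source paths found under ivrd/.
    """
    pot = "ivrd/po/ivrd.pot"
    po = "ivrd/po/%s.po" % lang
    mo = "ssp/locale/%s/LC_MESSAGES/ivrd.mo" % lang
    return {
        # Collect translatable phrases from the Python sources into the template.
        "extract": ["xgettext", "--language=Python", "-o", pot] + list(sources),
        # Bring an existing per-language file up to date with the template.
        "merge": ["msgmerge", "--update", po, pot],
        # Compile the translated catalog into the binary .mo format.
        "compile": ["msgfmt", "-o", mo, po],
    }


if __name__ == "__main__":
    for step, cmd in gettext_commands("tr", ["ivrd/example.py"]).items():
        print(step, " ".join(cmd))
```

The function only builds the argument lists and runs nothing, so it can be used to inspect the steps before running the real scripts.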
Preparing for translation to a new language
Imagine that the ivrd application is to be translated into Turkish (language code tr). This assumption is used throughout the text below.
First of all, the translation file has to be created. This can be achieved by running:
$ cd ivrd/po
$ ./mergepo.sh tr
This will create the ivrd/po/tr.po file. It is a plain text file, so it can be edited with any text editor.
Translating the translation file
Now that you have tr.po, the translation can be done. The file contains a header like this:
msgid ""
msgstr ""
"Project-Id-Version: PACKAGE VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2007-08-23 12:44+0300\n"
"PO-Revision-Date: 2007-08-23 12:44+0300\n"
"Last-Translator: Automatically generated\n"
"Language-Team: none\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=ASCII\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=1; plural=0;\n"
Note that you must change the Project-Id-Version value; otherwise the compiler will issue a warning.
The rest of the file consists of the phrases to translate, which take one of two forms. For simple phrases:
msgid "Some English text"
msgstr ""
and for plural forms:
msgid "There is one apple"
msgid_plural "There are %n apples"
msgstr[0] ""
That concludes the quick overview of the file contents, so translation can begin. Every empty msgstr string is to be filled in with the translated phrase. You are free to use any encoding, but do not forget to specify the correct encoding name in the charset subfield of the Content-Type field.
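To see how much work is left in a translation file, a small script can list the entries whose msgstr is still empty. The following is a minimal sketch and not part of the ivrd helper scripts; it assumes entries are separated by blank lines, as in a regular .po file, and the function names are made up for this example.

```python
import re

# Matches 'msgid "..."', 'msgid_plural "..."', 'msgstr "..."' and 'msgstr[N] "..."'.
_FIELD = re.compile(r'(msgid_plural|msgid|msgstr(?:\[\d+\])?)\s+"(.*)"$')


def parse_entry(block):
    """Parse one .po entry into a dict such as {'msgid': ..., 'msgstr': ...}."""
    fields = {}
    key = None
    for line in block.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _FIELD.match(line)
        if match:
            key = match.group(1)
            fields[key] = match.group(2)
        elif key and line.startswith('"') and line.endswith('"'):
            fields[key] += line[1:-1]  # continuation line of a multi-line string
    return fields


def untranslated_entries(po_text):
    """Return the msgid of every entry that still has an empty msgstr."""
    missing = []
    for block in re.split(r"\n\s*\n", po_text.strip()):
        fields = parse_entry(block)
        if not fields.get("msgid"):
            continue  # the header entry has an empty msgid
        translations = [v for k, v in fields.items() if k.startswith("msgstr")]
        if not translations or any(v == "" for v in translations):
            missing.append(fields["msgid"])
    return missing
```

Calling untranslated_entries() on the text of tr.po returns the English phrases that still need a translation, in file order.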
Notes for translators
At this point the phrases are passed to a human for translation. The notes for translators can be found here.
TTS language module
To support conversions of numbers, dates, durations, etc. into text, a language-specific Python module has to be created.
The module must be named after the two-letter language code in uppercase, so for Turkish you have to create the TR.py file. This file is to be placed in the ivrd/TextSynth directory. The existing language modules should be used as a basis for the new module. Here is a list of requirements for the module:
The _phrase_noop() function is to be defined, and it has to convert your language-specific phrases into Unicode (unless ASCII is sufficient). All words and phrases must be wrapped in _phrase_noop() calls. Also, you cannot use any TTS features in this module, to avoid infinite recursion.
When using a non-ASCII encoding you must declare it in the second line of the module:
#!/usr/local/bin/python
# -*- coding: UTF-8 -*-
These methods are to be created:
- sayNumber()
- sayDigits()
- sayDuration()
- sayDatetime() (this is used by the Voicemail app for now)
The TextSynth/__init__.py file is to be modified to support your new module.
The information to be obtained to create the module is summarized here.
Prompt creation
After the translations have been completed and placed into the po/tr.po file, the translation is to be compiled:
$ cd ivrd/po
$ ./compilepo.sh tr
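One way to check that the compiled catalog can be found is to load it with Python's standard gettext module. This is a sketch that assumes the catalog domain is ivrd and the locale directory is ssp/locale, as the output path of compilepo.sh suggests; how ivrd itself loads its translations is not covered here, and load_ivrd_catalog() is a name invented for the example.

```python
import gettext


def load_ivrd_catalog(lang, localedir="ssp/locale"):
    """Load <localedir>/<lang>/LC_MESSAGES/ivrd.mo.

    With fallback=True a missing catalog does not raise; the returned
    object then simply hands back the original English phrases.
    """
    return gettext.translation(
        "ivrd", localedir=localedir, languages=[lang], fallback=True
    )


if __name__ == "__main__":
    catalog = load_ivrd_catalog("tr")
    print(catalog.gettext("Some English text"))
```

If the printed phrase is still in English, either the phrase has no translation yet or the .mo file was not found in the expected location.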
The last thing to do is to create the prompt directory:
$ mkdir ~ssp/prompts/ivrd/tr
Now the prompt_utils.py script can be used to generate the prompt list:
$ cd ivrd
$ ./prompt_utils.py -l tr list unmapped
This will create the unmapped-tr.html file containing all the phrase chunks in Turkish and the corresponding English phrases.
Here is the point where the narrator starts his work.
Registering prompts
After the prompts have been recorded, they are to be placed into the ssp/prompts/ivrd/tr folder in signed linear 16-bit 8000 Hz mono format and in G.729-encoded format.
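If the narrator delivers recordings as WAV files, their parameters can be checked before conversion with Python's standard wave module. The sketch below assumes a WAV container, which this page does not specify (the linear prompts may well be stored as headerless raw audio), and it cannot validate the G.729 files; both function names are hypothetical.

```python
import io
import wave


def is_linear16_8k_mono(source):
    """Return True if the WAV data is mono, 16 bit per sample, 8000 Hz.

    `source` may be a file name or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        params = (wav.getnchannels(), wav.getsampwidth(), wav.getframerate())
    return params == (1, 2, 8000)


def make_silence(rate, channels=1, frames=80):
    """Build a tiny in-memory WAV file, used here only to exercise the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16 bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * frames * channels)
    buf.seek(0)
    return buf


if __name__ == "__main__":
    print(is_linear16_8k_mono(make_silence(8000)))
    print(is_linear16_8k_mono(make_silence(16000)))
```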
Then the ssp/prompts/ivrd/tr/prompt_map.txt file is to be created. The first line of the file must state the encoding used for the phrase chunks, and the prompt mappings follow:
# encoding: utf-8
file1|First phrase
file2|second phrase
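The file format is simple enough to check with a few lines of Python before handing it to prompt_utils.py. The sketch below reads the encoding from the first line and splits each following line at the first | character; parse_prompt_map() is a hypothetical helper, and skipping blank lines is an assumption rather than documented behaviour.

```python
def parse_prompt_map(data):
    """Parse the bytes of a prompt_map.txt file.

    Returns (encoding, [(file_name, phrase), ...]).
    """
    first, _, rest = data.partition(b"\n")
    header = first.decode("ascii").strip()
    if not header.lower().startswith("# encoding:"):
        raise ValueError("first line must look like '# encoding: utf-8'")
    encoding = header.split(":", 1)[1].strip()

    mappings = []
    for number, line in enumerate(rest.decode(encoding).splitlines(), start=2):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        name, sep, phrase = line.partition("|")
        if not sep:
            raise ValueError("line %d has no '|' separator" % number)
        mappings.append((name, phrase))
    return encoding, mappings


if __name__ == "__main__":
    sample = b"# encoding: utf-8\nfile1|First phrase\nfile2|second phrase\n"
    print(parse_prompt_map(sample))
```

Decoding the body with the declared encoding means a wrong encoding name or a mis-encoded phrase fails early, before the mapping is used.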
After this you can again run
./prompt_utils.py -l tr list unmapped