Internationalization and localization are terms used to describe the effort to make WordPress (and other such projects) available in languages other than English, for people from different locales, who use different dialects and local preferences.
The process of localizing a program is two-fold. It begins on the developers' end with internationalization: the developers provide a mechanism and method for the eventual translation of the program and its interface to suit local preferences and languages for users worldwide. It ends with localization, the process by which the program is translated and adapted to another language and culture, using the framework prescribed by the developers of the software.
Now that you have a basic understanding of how this works, let's look at how to use the localization tools to teach your WordPress installation "to speak" in languages other than the default English.
WordPress is used all around the world, and it is with the help of translators that it has achieved such popularity in the international community. This article explains how bi- or multi-lingual WordPress users can go about localizing WordPress to more languages.
Before you dive into translating WordPress, check that a localization for your language and locale doesn't already exist. There are many teams of translators already maintaining localizations for various languages, and one of them could be yours.
If there are .mo language files available in your language, see Installing WordPress in Your Language for more information.
If you don't see your language in the official WordPress localization repository, your best bet for getting started is to subscribe to the wp-polyglots mailing list, introduce yourself, and ask if there's anyone translating in your language and locale.
A locale is a combination of language and regional dialect. Usually locales correspond to countries, as is the case with Portuguese (Portugal) and Portuguese (Brazil).
You can do a translation for any locale you wish, even other English locales such as Canadian English or Australian English, to adjust for regional spelling and idioms.
The default locale of WordPress is U.S. English.
WordPress uses the GNU Gettext localization framework. Gettext is a mature, widely used framework for modular translation of software. It is the de facto standard for localization in the open source/free software realm.
Gettext uses message-level translation: every "message" displayed to users is translated individually, whether it is a paragraph or a single word. Throughout WordPress, all messages are passed through two functions:
__() is used when the message is passed as an argument to another function; _e() is used to write the message directly to the page.
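The difference between the two functions can be illustrated with a minimal Python sketch. This is not WordPress's PHP implementation: the names `translate` and `echo_translation` and the in-memory `CATALOG` are hypothetical stand-ins for `__()`, `_e()`, and a loaded MO file.

```python
import io
from contextlib import redirect_stdout

# Hypothetical in-memory catalog; WordPress loads its messages from MO files.
CATALOG = {
    "Sorry, comments are closed for this item.":
        "Désolé, les commentaires sont fermés pour cet article.",
}


def translate(message: str) -> str:
    """Return the translation, like __(); fall back to the source message."""
    return CATALOG.get(message) or message


def echo_translation(message: str) -> None:
    """Write the translation straight to the output, like _e()."""
    print(translate(message), end="")


# The returned value can be passed on to another function.
title = translate("Sorry, comments are closed for this item.").upper()

# The echoed value goes directly to the output stream.
buffer = io.StringIO()
with redirect_stdout(buffer):
    echo_translation("Register")  # no catalog entry, so the source is printed
echoed = buffer.getvalue()
```

Falling back to the source message when no translation exists is what lets a partially translated installation keep working.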
There are three types of files used in the Gettext translation framework. Depending on the tool you use to translate, you will need to be familiar with some or all of these files.
There are various tools available to aid in translating. You may use whichever you prefer.
When you are finished with your translation, you can export it as a PO or MO file if you like, by visiting the PO File Details page for your language and clicking on Export as PO file or Export as MO file.
This section is incomplete.
At the beginning of the PO file is the header. It records which package and version the translation is for, who the translator was, and when it was created. Certain portions of this header should be universal for all WordPress translations:
# LANGUAGE (LOCALE) translation for WordPress.
# Copyright (C) YEAR WordPress contributors.
# This file is distributed under the same license as the WordPress package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.
#
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: WordPress VERSION\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2005-02-27 17:11-0600\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: 8bit\n"
Fill in the rest of the capitalized text with the appropriate values.
The remainder of the file will be in a format as follows:
#: wp-comments-post.php:13
msgid "Sorry, comments are closed for this item."
msgstr ""

#: wp-comments-post.php:29
msgid "Sorry, you must be logged in to post a comment."
msgstr ""

#: wp-comments-post.php:35
msgid "Error: please fill the required fields (name, email)."
msgstr ""
The first line of each message contains the location of the message in the WordPress code. In the case of these messages, they're all located in wp-comments-post.php, on lines 13, 29, and 35, respectively. Occasionally you will come across a message whose context you need to check. Look at the appropriate line or lines in the WordPress core, and you should be able to figure out when and where the message is displayed, and even reproduce it yourself using your web browser.
The next line, msgid, is the source message. This is the string that WordPress passes to its __() or _e() functions, and the message you will need to translate.
The final line, msgstr, is a blank string where you will fill in your translation.
Here's how the same few lines would look after being translated, using the French (France) locale as an example:
#: wp-comments-post.php:13
msgid "Sorry, comments are closed for this item."
msgstr "Désolé, les commentaires sont fermés pour cet article."

#: wp-comments-post.php:29
msgid "Sorry, you must be logged in to post a comment."
msgstr "Désolé, vous devez être connecté pour rédiger un commentaire."

#: wp-comments-post.php:35
msgid "Error: please fill the required fields (name, email)."
msgstr "Erreur : veuillez renseigner les champs obligatoires vides (nom, e-mail)."
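The three-part structure of an entry (location comment, msgid, msgstr) is simple enough to read with a short script, for example to list the messages that still need translating. The following Python sketch is illustrative only: the function name `parse_entries` is hypothetical, and it assumes blank-line-separated entries with single-line strings, so it ignores the header, multi-line strings, and escaped quotes that real PO files may contain.

```python
import re

SAMPLE = '''\
#: wp-comments-post.php:13
msgid "Sorry, comments are closed for this item."
msgstr "Désolé, les commentaires sont fermés pour cet article."

#: wp-comments-post.php:29
msgid "Sorry, you must be logged in to post a comment."
msgstr ""
'''


def parse_entries(text):
    """Parse simple single-line PO entries into a list of dicts."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = {"references": [], "msgid": "", "msgstr": ""}
        for line in block.splitlines():
            if line.startswith("#:"):
                # One "#:" line may list several file:line locations.
                entry["references"].extend(line[2:].split())
            else:
                match = re.match(r'(msgid|msgstr) "(.*)"$', line)
                if match:
                    entry[match.group(1)] = match.group(2)
        entries.append(entry)
    return entries


entries = parse_entries(SAMPLE)
# Messages whose msgstr is still blank have not been translated yet.
untranslated = [e["msgid"] for e in entries if not e["msgstr"]]
```

For real translation work, a dedicated PO editor or the Gettext command-line tools are a better fit than a hand-written parser.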
Labels are often used in the context of HTML <label>, <legend>, <a>, or <select> tags. They are short and precise descriptors of the purpose of a UI element. These can be very difficult to translate at times, especially if they are single words, and if the word used in English can be interpreted as either a noun or imperative verb. With most labels you will need to do some searching through the code to find the context of its use before coming up with an appropriate translation.
Because so many of the messages are part of the WordPress administration interface, labels are probably the most frequent type of message to translate.
msgid "Post"
msgstr "Artikkeli"
"Post" could be interpreted as an imperative verb, but in this context it's a noun. The noun form of "post" in English can be difficult to translate, and the most appropriate translation has been difficult for some teams to decide upon. Many translations use their language's equivalent to the English "Article," as this one does. (From the Finnish (Finland) translation.)
#: wp-login.php:79 wp-login.php:233 wp-register.php:166
#: wp-includes/template-functions-general.php:46
msgid "Register"
msgstr "रजिस्टर"
From the Hindi translation.
#: wp-admin/admin-functions.php:357
msgid "- Select -"
msgstr " - Dewis -"
Items like the surrounding dashes in this example can be eliminated or replaced if they might be confusing to users in your target locale, or if there are different established conventions for your locale. From the Welsh translation.
Another frequent type of message, the informational message is usually composed of full sentences, and conveys information or requests an action of the user. Since these tend to be longer than labels, they tend to be slightly easier to translate. However, with the longer messages comes more variation in the level of formality (or informality), which is something translators need to be aware of.
#: wp-login.php:146
msgid "Your new password is in the mail."
msgstr "Вашата нова парола е в електронната ви поща."
This particular message contains a modified English formulaic expression ("the check/cheque is in the mail"), which contributes to its informality. (From the Bulgarian (Bulgaria) translation.)
#: wp-includes/functions.php:1636
msgid "<strong>Error</strong>: Incorrect password."
msgstr "<strong>FEL</strong>: Felaktigt lösenord."
Error messages tend to be more formal, simply because they're short and concise. (From the Swedish (Sweden) translation.)
#: wp-includes/functions-post.php:467
msgid "Sorry, you can only post a new comment once every 15 seconds. Slow down cowboy."
msgstr "Leider kannst du nur alle 15 Sekunden einen neuen Kommentar eingeben. Immer locker bleiben."
Of course, not all error messages are formal. (From the German (Germany) translation.)
Rather than using PHP's built-in locale switching features, which are not configured for very many languages on most hosts, WordPress uses the same Gettext translation module to accomplish date and time translations and formatting.
WordPress translates the following:
#: wp-includes/locale.php:42 wp-includes/locale.php:57
msgid "May"
msgstr "Květen"
(From the Czech (Czech Republic) translation.)
#: wp-includes/locale.php:57
msgid "May_May_abbreviation"
msgstr "Mag"
Note the unusual msgid. These messages should NOT be translated literally: they are a hack to get around the fact that in English, the full name and abbreviation for May are the same, which Gettext would erroneously combine into one entry. (From the Italian (Italy) translation.)
#: wp-includes/locale.php:7
#: wp-includes/locale.php:18
#: wp-includes/locale.php:31
msgid "Tuesday"
msgstr "火曜日"
(From the Japanese (Japan) translation.)
#: wp-includes/locale.php:31
msgid "Tue"
msgstr "Уто"
(From the Serbian (Serbia & Montenegro) (Cyrillic) translation.)
#: wp-includes/locale.php:18
msgid "T_Tuesday_initial"
msgstr "ti"
The weekday initials are for WordPress's calendar feature, and use the same hack as the month abbreviations to get around the fact that in English Tuesday and Thursday share the same first letter. Not all locales use single-letter abbreviations for all days: in this example, Norwegian Bokmål uses an extra letter to distinguish tirsdag (Tuesday) and torsdag (Thursday). (From the Norwegian Bokmål (Norway) translation.)
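The reason for these unusual msgids becomes clearer if a message catalog is pictured as a lookup table keyed by msgid. The following Python sketch is only an illustration of that idea, not of Gettext's internals: `build_catalog` is a hypothetical helper, and "Maggio" (the full Italian month name) is an assumption added to complete the example.

```python
def build_catalog(pairs):
    """Build a msgid -> msgstr table; a repeated msgid overwrites the earlier one."""
    catalog = {}
    for msgid, msgstr in pairs:
        catalog[msgid] = msgstr
    return catalog


# Naive msgids: the full name and the abbreviation are both "May" in English,
# so the two entries collapse into one and a translation is lost.
naive = build_catalog([("May", "Maggio"), ("May", "Mag")])

# Disambiguated msgids: each entry keeps its own translation.
tagged = build_catalog([("May", "Maggio"), ("May_May_abbreviation", "Mag")])
```

With the naive keys only one of the two Italian strings survives, whereas the tagged keys let "May" and its abbreviation be translated independently.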
These are PHP date() formatting strings, and they allow you to change the formatting of the date and time for your locale.
WordPress uses the translations elsewhere in the localization file for month names, weekday names, etc. This special string is for the selection of which elements to include in the date & time, as well as the order in which they're presented.
Take this msgid from the theme.pot file:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr ""
In English, this gets formatted as:
Sunday, February 27th, 2005
However, different locales format their dates differently. In Danish, for example, dates are written:
søndag, 27. februar 2005
To accomplish this, the msgid above would be translated to:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr "l, j. F Y"
To use another example, one way to format dates in Chinese and Japanese is as follows:
2005年2月27日
This would be accomplished in the translation like this:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr "Y年n月j日"
Lastly, if you need to include literal alphabetic characters in your date format, as sometimes occurs in Spanish, you can backslash them:
#: archive.php:40 search.php:19 single.php:22
msgid "l, F jS, Y"
msgstr "l j \d\e F \d\e Y "
This would output:
domingo 27 de febrero de 2005
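To experiment with format strings before committing them to a translation, the behaviour described above can be approximated outside PHP. The Python sketch below is a rough stand-in for PHP's date(), not a replacement: it handles only the characters l, F, j, d, n, Y, and S plus backslash escapes, and the name tables and the function `format_date` are hypothetical (WordPress takes the names from the translated messages).

```python
from datetime import date

# Hypothetical name tables, Monday first to match date.weekday().
WEEKDAYS = {
    "en": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
           "Saturday", "Sunday"],
    "da": ["mandag", "tirsdag", "onsdag", "torsdag", "fredag",
           "lørdag", "søndag"],
    "es": ["lunes", "martes", "miércoles", "jueves", "viernes",
           "sábado", "domingo"],
}
MONTHS = {
    "en": ["January", "February", "March", "April", "May", "June", "July",
           "August", "September", "October", "November", "December"],
    "da": ["januar", "februar", "marts", "april", "maj", "juni", "juli",
           "august", "september", "oktober", "november", "december"],
    "es": ["enero", "febrero", "marzo", "abril", "mayo", "junio", "julio",
           "agosto", "septiembre", "octubre", "noviembre", "diciembre"],
}


def ordinal_suffix(day):
    """English ordinal suffix, as produced by the S format character."""
    if 11 <= day % 100 <= 13:
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")


def format_date(fmt, d, lang):
    """Render a small subset of PHP date() format characters."""
    handlers = {
        "l": lambda: WEEKDAYS[lang][d.weekday()],  # full weekday name
        "F": lambda: MONTHS[lang][d.month - 1],    # full month name
        "j": lambda: str(d.day),                   # day without leading zero
        "d": lambda: f"{d.day:02d}",               # day with leading zero
        "n": lambda: str(d.month),                 # month without leading zero
        "Y": lambda: str(d.year),                  # four-digit year
        "S": lambda: ordinal_suffix(d.day),        # st, nd, rd, th
    }
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\" and i + 1 < len(fmt):
            out.append(fmt[i + 1])  # escaped character is copied literally
            i += 2
            continue
        out.append(handlers[ch]() if ch in handlers else ch)
        i += 1
    return "".join(out)


day = date(2005, 2, 27)
english = format_date("l, F jS, Y", day, "en")
danish = format_date("l, j. F Y", day, "da")
numeric = format_date("Y年n月j日", day, "en")
spanish = format_date(r"l j \d\e F \d\e Y", day, "es")
```

Without the backslashes, the "d" in the Spanish format would be read as a day-of-month character rather than as the first letter of "de", which is why literal letters need escaping.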
Many messages contain special PHP formatting placeholders, which allow the insertion of untranslatable dynamic content into the message after it is translated. The PHP placeholders come in two different formats:
#: wp-login.php:116
msgid "The e-mail was sent successfully to %s's e-mail address."
msgstr "El e-mail fue enviado satisfactoriamente a la dirección e-mail de %s"
This message inserts the username of the user to whom an email has been sent. (From the Spanish (Spain) translation.)
#: wp-admin/upload.php:96
#, php-format
msgid "File %1$s of type %2$s is not allowed."
msgstr "类型为%2$s的文件%1$s不允许被上传。"
This message reverses the order in which the file name and type are used in the translation. (From the Chinese (China) translation.)
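A translation may reorder placeholders, but it should not drop or add any, so comparing the placeholders in a msgid with those in its msgstr is a useful sanity check. The Python sketch below is one way to do this. The helper names are hypothetical, and the pattern covers only the `%s` and `%1$s` styles shown in these examples plus `%d`, which is included as an assumption.

```python
import re

# Matches simple ("%s") and numbered ("%1$s") placeholders.
# Including the "d" conversion is an assumption; literal "%%" is not handled.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sd]")


def placeholders(message):
    """Return the placeholders in a message, sorted so that order is ignored."""
    return sorted(PLACEHOLDER.findall(message))


def same_placeholders(msgid, msgstr):
    """True if the translation uses exactly the placeholders of the source."""
    return placeholders(msgid) == placeholders(msgstr)


reordered_ok = same_placeholders(
    "File %1$s of type %2$s is not allowed.",
    "类型为%2$s的文件%1$s不允许被上传。",
)
dropped = same_placeholders(
    "File %1$s of type %2$s is not allowed.",
    "File %1$s is not allowed.",
)
```

Here `reordered_ok` is true because only the order differs, while `dropped` is false because the second placeholder is missing from the translation.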
The WordPress Localization Repository is a Subversion repository where official WordPress translations are maintained. Various teams collaborate on translations for their native language, and team maintainers commit updates and changes to the repository.
Participation in the repository is open to anyone. Simply subscribe to the wp-polyglots mailing list, introduce yourself, and let everyone know what translation you'd like to work on. If there is already a team for your language and locale, they'll let you know and you can join them. If not, you can either volunteer to be a maintainer for your language and locale, or simply contribute your localization and the repository maintainers will add it.
Note: these guidelines are subject to change as the system evolves; repository maintainers will be happy to assist you in updating the files you maintain in the repository should these guidelines change.
All localizations should have at least a UTF-8 version, but may optionally add versions in other character encodings popular for that locale.
PHP does not support byte order marks (BOMs), so be sure the UTF-8 encoded files you contribute do not have them.
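Some editors add a BOM silently when saving UTF-8 files, so it can be worth checking before contributing. A minimal Python sketch of such a check follows. The function names are hypothetical; `codecs.BOM_UTF8` is the standard library's constant for the three-byte UTF-8 mark.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the raw file contents begin with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the contents without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + 'msgid "Post"'.encode("utf-8")
cleaned = strip_utf8_bom(with_bom)
```

To check a file on disk, read it in binary mode (for example with `open(path, "rb")`) and pass the bytes to `has_utf8_bom`.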
With a few exceptions (noted below), all translations should be written literally, rather than escaping accented and special characters with HTML character entities.
Some characters must be escaped to avoid conflict with XHTML markup: angle brackets (&lt; and &gt;) and ampersands (&amp;). In addition, a few other characters are better used escaped, such as non-breaking spaces (&nbsp;), angle quotes (&laquo; and &raquo;), curly apostrophes (&#8217;), and curly quotes.
For more information about the W3C's best practices involving character encodings and character entities, see the following references:
The repository contains directories for each locale, which are named as follows:
Within each locale's directory are the regular Subversion versioning directories: branches/, tags/, and trunk/.
Inside the appropriate versioning directory are the following subdirectories:
dist/
This directory contains all files in the WordPress distribution that cannot be Gettexted, which have been translated into the target locale.
If the locale has only a UTF-8 translation of the files, the dist/ directory may be populated with them directly, and the structure within dist should mirror the structure of the wordpress root directory:
If the locale contains more than just a UTF-8 character encoding, then dist/ should contain subdirectories for each encoding:
messages/
This directory contains the Gettext MO and PO files for the locale. Message files are named as follows:
Examples:
cs_CZ.po          # UTF-8 character encoding
ja_JP.EUC-JP.mo   # EUC-JP character encoding
theme/
As with the dist/ directory, theme/ contains hard-translated theme files. If only a UTF-8 translation is present, the directory can be populated with subdirectories for each theme translated. These subdirectories contain all of the same files as the original theme (except that they're translated), and are named the same as the original theme:
Just as with the dist/ directory, if there are multiple character encodings represented, theme/ should contain a subdirectory for each character encoding, which in turn would contain subdirectories for each theme translated.