ambiguity to a t()

The confusion comes in because at almost every corner we Drupal developers are told to use t() to help secure user input. Whenever the issue of cleansing user data comes up, the t() function is mentioned. But then we read the comments in common.inc or the online API documentation and learn that the official word is to not send user input through t(), and that doing so would hurt a site's performance. Both security and performance are things to strive for, but they don't always mix well. So let's be clear about the issues here. There are times to use the t() function, and there are times when other functions would be better.

What is the t() function?

To recap t() for those who don't know: It is a Drupal function that provides a way to allow common phrases in a Drupal installation to be translated into other languages. However, it has some data cleansing built in so it is advised by many that you should use it whenever you output user data.

The idea is to make the site's infrastructure multilingual. The basic syntax is to wrap the string that is to be output in a call to t(), as in the line below:

$output .= t("you are reading something");

Now the phrase "you are reading something" can be translated into other languages and will change on your site with the language chosen. t() doesn't provide content translation; it only affects the site's interface.

The cleansing abilities come into play to handle parts of the string that might involve untrusted information.


insecure code:

echo "you are " . $node->title;


secure code:

echo t("you are reading @title", array("@title" => $node->title));

In the first case, if someone somehow snuck some malware into the node's title, then you would be open to a cross-site scripting attack or a cross-site request forgery. There are other problems with this code, but we'll focus on the security issues for now.

To handle situations like this, the t() function gives you a way to pass information to a function that cleans it first. Notice that the first string in the t() call includes a word that matches a key in the array that comes next. That word, known as the "placeholder", will be replaced by the value of that array element. This is how data is sent to the t() function.

You will also notice that the first character in the placeholder's name is an at-sign (@). That signals the function to send the value through the check_plain() function before it is output. Instead of an at-sign, an exclamation point (!) tells t() to not do anything to the value, and a percent sign (%) tells t() to both clean it and wrap it in a ... tag. The problem is that according to the documentation inside the code itself, you should not put user input through t(). Its concern is that the localization system will become clogged with a bunch of trivial little pieces of data.
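The three placeholder prefixes can be sketched in a few lines of plain PHP. This is only an illustration, not Drupal's implementation: sketch_placeholders() and sketch_check_plain() are hypothetical names, htmlspecialchars() stands in for check_plain(), no translation is performed, and the <em> wrapper used for the percent sign is an assumption.

```php
<?php
// Simplified stand-in for the placeholder handling in t(); no translation.
// Both function names are hypothetical and exist only for this sketch.

// Stand-in for check_plain(): escape HTML special characters.
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

function sketch_placeholders($string, array $args = array()) {
  foreach ($args as $key => $value) {
    switch (substr((string) $key, 0, 1)) {
      case '@':
        // Escaped before output.
        $args[$key] = sketch_check_plain($value);
        break;

      case '%':
        // Escaped and wrapped for emphasis (the <em> tag is an assumption).
        $args[$key] = '<em>' . sketch_check_plain($value) . '</em>';
        break;

      case '!':
        // Inserted as-is; only safe for values that are already clean.
        break;
    }
  }
  // Swap each placeholder for its prepared value.
  return strtr($string, $args);
}

$title = '<script>alert(1)</script>';
echo sketch_placeholders('you are reading @title', array('@title' => $title)), "\n";
echo sketch_placeholders('you are reading !title', array('!title' => $title)), "\n";
```

The first echo prints the title as inert text, while the second passes the markup through untouched, which is why the exclamation point should be reserved for values you have already cleaned.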

So now, what are we Drupal developers supposed to do? What are the limits of these functions, or are they omnipresent in the Drupal universe? The t() function is considered so sacred by many that questioning its use is heresy and will get an automatic response. So how about some practical guidelines instead of the rhetoric that has been repeated without much thought up to this point?

What data is untrusted

Earlier I referred to "untrusted" data. When is data untrusted? In one sentence: when the data can't safely be output. The only information that should be displayed without cleaning is data we ourselves set in that program. If the data has ever gone through something like HTTP headers or the SQL database, then you want to clean it. Whether it comes from options passed between programs via a URI, from Drupal variables, or from a database, you can't trust it. The problem is simple when it comes to HTTP options: all it would take is for someone to point their browser at your page with the HTTP option loaded with malware. How about if you put the value in a cookie? Then malware can't be passed in the URI, but there are other problems. Cookies can be edited, so it would be easy to load the cookie with malware.

What if you've got a function that is only called by another function? Can the variables in the function's parameter list be considered safe? What if a hole is found in another module, one that allows a user to enter that string with a cross-site scripting attack? When your module outputs that string without a second thought, you are actually carrying out the attack that was launched modules ago. So even if you think a string is already clean, you are relying on the programmer of the other module to do things 100% correctly and to anticipate every possible option in the future. While we need to try to anticipate the future, we can't afford to rely on that having been done in the past. This isn't the end of the list of possible problems, but it is enough to give you the scope of the problem.

Again, the problem is any data that is not 100% in your control.

If you can reasonably assume that the data is a member of a small set of values then go ahead and pass it through t(). If the data has no guarantees about what it might be, then send it through check_plain(). You still need to clean the data but there is no reason to tax the system when doing so.

Let's look at some examples now. First, an insecure version of data from a small, known set:

function example_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  echo "you have entered the $op option";
}

What if another module used user input for the 2nd parameter or the administrator of the site turned the php_filter on? In both cases example_nodeapi() could be called with a contaminated string for the 2nd parameter.


A secure version of the case above:

function example_nodeapi(&$node, $op, $a3 = NULL, $a4 = NULL) {
  // $op belongs to a small, known set of values, so t() is appropriate.
  // The % placeholder cleans the value before it is output.
  echo t("you have entered the %op option", array("%op" => $op));
}

An insecure version of data from a large or unknown set:

function foo_bar($node) { echo "thank you for reading " . $node->title; }

Now what if the contaminated string is in a node's title?


A secure version of the second case (data from a large or unknown set):

function foo_bar($node) { echo "thank you for reading " . check_plain($node->title); }

Now that we know what t() is and why output needs to be cleansed, let's get back to the main subject: when is it appropriate to cleanse data with t()?
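To see what the cleaning step does to a hostile title, here is a small sketch that runs outside Drupal. The names sketch_check_plain() and sketch_foo_bar() are hypothetical, htmlspecialchars() is assumed as a stand-in for check_plain(), and the node is a bare object made up for the example.

```php
<?php
// Stand-in for check_plain() so the sketch runs outside Drupal (assumption).
function sketch_check_plain($text) {
  return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Returns the message with the node title escaped for HTML output.
function sketch_foo_bar($node) {
  return 'thank you for reading ' . sketch_check_plain($node->title);
}

// Hypothetical node whose title carries a script injection.
$node = new stdClass();
$node->title = '<script>alert(1)</script>';

// The angle brackets are converted to entities, so a browser shows the
// title as text instead of running it.
echo sketch_foo_bar($node), "\n";
```

Nothing here touches the localization system, which is the point of reaching for check_plain() instead of t() when the value could be anything.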

According to the API documentation online:

However tempting it is, custom data from user input or other non-code sources should not be passed through t(). Doing so leads to the following problems and errors:

  • The t() system doesn't support updates to existing strings. When user data is updated, the next time it's passed through t() a new record is created instead of an update. The database bloats over time and any existing translations are orphaned with each update.
  • The t() system assumes any data it receives is in English. User data may be in another language, producing translation errors.
  • The "Built-in interface" text group in the locale system is used to produce translations for storage in .po files. When non-code strings are passed through t(), they are added to this text group, which is rendered inaccurate since it is a mix of actual interface strings and various user input strings of uncertain origin.

So basically, one side is saying to use it whenever you need to clean user data, and the other is saying to use it only for translating static strings that the program will generate. I've asked people on the forums and have only gotten canned answers from one of the two camps: nothing in between, and nothing that could help make real decisions. My take puts me in the second camp, but I'll say to use check_plain() when cleansing user data. What else do I need to know?

More guidelines for t()

  • Avoid putting HTML in the string except in links.
    bad: $output .= t( '<p>Go to the @contact-page.</p>', array( '@contact-page' => l(t('contact page'), 'contact')));
    good: $output .= '<p>'. t( 'Go to the <a href="@contact-page">contact page</a>.', array('@contact-page' => url('contact'))) . '</p>';
  • Avoid escaping quotation marks wherever possible.
    bad: drupal_set_message(t('this isn\'t real'));
    good: drupal_set_message(t("this isn't real"));
  • Use a full sentence whenever possible.
    bad: drupal_set_message(t("this").t("isn't").t("real"));
    good: drupal_set_message(t("this isn't real"));
  • Always use the exact string and not a variable.
    bad: $message = "this is fake"; drupal_set_message(t($message));
    good: drupal_set_message(t("this is fake"));
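When a value comes from a small, known set, the last two guidelines can be combined by mapping each value to a literal, full-sentence string. The sketch below is one possible way to do that, not an official pattern: example_op_message() and its message texts are hypothetical, and the t() defined here is only a stand-in so the code runs outside Drupal.

```php
<?php
// Stand-in so the sketch runs outside Drupal; the real t() also translates.
if (!function_exists('t')) {
  function t($string, array $args = array()) {
    return strtr($string, $args);
  }
}

// Hypothetical helper: every message is a literal, full-sentence t() call,
// so each string is fixed in the code and can be translated once.
function example_op_message($op) {
  switch ($op) {
    case 'insert':
      return t('The post has been created.');

    case 'update':
      return t('The post has been updated.');

    case 'delete':
      return t('The post has been deleted.');

    default:
      // Unknown values never reach the output or the translation system.
      return t('The operation is not recognized.');
  }
}

echo example_op_message('update'), "\n";
```

Because the incoming value only selects a message and is never printed, a contaminated $op has nothing to inject into.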

With just a little bit of thought, security and performance don't have to clash. We can write secure code without clogging the system. Hopefully, we can stop programmers from using performance as an excuse to circumvent security.
