WordPress.org

Codex

Validating Sanitizing and Escaping User Data

Contents

Overview

Your code works, but is it safe? When writing code that will run across hundreds if not thousands of websites, you should be extra cautious of how you handle data coming into WordPress and how it's then presented to the end user. This commonly comes up when building a settings page for your theme, creating and manipulating shortcodes, or saving and rendering extra data associated with a post.

There's a distinction between how input and output are managed, however.

Validating: Checking User Input

To validate is to ensure the data you've requested of the user matches what they've submitted. There are several core methods you can use for input validation; usage obviously depends on the type of fields you'd like to validate. Let's take a look at an example.

Say we have an input area in our form like this:

<input type="text" id="my-zipcode" name="my-zipcode" maxlength="5" />

Just like that, we've told the browser to only allow up to five characters of input, but there's no limitation on what characters they can input. They could enter "11221" or "eval(". If we're saving to the database, there's no way we want to give the user unrestricted write access.

This is where validation plays a role. When processing the form, we'll write code to check each field for its proper data type. If it's not of the proper data type, we'll discard it. For instance, to check "my-zipcode" field, we might do something like this:

$safe_zipcode = intval( $_POST['my-zipcode'] );
if ( ! $safe_zipcode )
  $safe_zipcode = '';

if ( strlen( $safe_zipcode ) > 5 )
  $safe_zipcode = substr( $safe_zipcode, 0, 5 );

update_post_meta( $post->ID, 'my_zipcode', $safe_zipcode );

Since the `maxlength` attribute is only enforced by the browser, we still need to validate the length of the input on the server. If we don't, an attacker could manually submit a form with a longer value.

The intval() function casts user input as an integer, and defaults to zero if the input was a non-numeric value. We then check to see if the value ended up as zero. If it did, we'll save an empty value to the database. Otherwise, we'll save the properly validated zipcode.

This style of validation most closely follows WordPress' whitelist philosophy: only allow the user to input what you're expecting. Luckily, there's a number of handy helper functions you can use for most every data type.

Sanitizing: Cleaning User Input

Sanitization is a bit more liberal of an approach to accepting user data. We can fall back to using these methods when there's a range of acceptable input.

For instance, if we had a form field like this:

<input type="text" id="title" name="title" />

We could sanitize the data with the sanitize_text_field() function:

$title = sanitize_text_field( $_POST['title'] );
update_post_meta( $post->ID, 'title', $title );

Behinds the scenes, the function does the following:

  • Checks for invalid UTF-8 (uses wp_check_invalid_utf8())
  • Converts single < characters to entity
  • Strips all tags
  • Remove line breaks, tabs and extra white space
  • Strip octets

The sanitize_*() class of helper functions are super nice for us, as they ensure we're ending up with safe data and require minimal effort on our part:

Escaping: Securing Output

For security on the other end of the spectrum, we have escaping. To escape is to take the data you may already have and help secure it prior to rendering it for the end user. WordPress thankfully has a few helper functions we can use for most of what we'll commonly need to do:

esc_html() we should use anytime our HTML element encloses a section of data we're outputting.

<h4><?php echo esc_html( $title ); ?></h4>

esc_url() should be used on all URLs, including those in the 'src' and 'href' attributes of an HTML element.

<img src="<?php echo esc_url( $great_user_picture_url ); ?>" />

esc_js() is intended for inline Javascript.

<a href="#" onclick="<?php echo esc_js( $custom_js ); ?>">Click me</a>

esc_attr() can be used on everything else that's printed into an HTML element's attribute.

<ul class="<?php echo esc_attr( $stored_class ); ?>">

It's important to note that most WordPress functions properly prepare the data for output, and you don't need to escape again.

<h4><?php the_title(); ?></h4>

Conclusion

To recap: Follow the whitelist philosophy with data validation, and only allow the user to input data of your expected type. If it's not the proper type, discard it. When you have a range of data that can be entered, make sure you sanitize it. Escape data as much as possible on output to avoid XSS and malformed HTML.

Take a look through /wp-includes/formatting.php to see all of the sanitization and escaping functions WordPress has to offer.

An earlier version of this article appeared on the WordPress.com VIP Publisher Blog. Republished here with permission.