Codex

Difference between revisions of "Data Validation"

(Migrated to DevHub)
 
(29 intermediate revisions by 16 users not shown)
Line 1: Line 1:
  +
<!--
 
{{Languages|
 
{{Languages|
 
{{en|Data Validation}}
 
{{en|Data Validation}}
 
{{ja|Data Validation}}
 
{{ja|Data Validation}}
 
{{ru|Валидация данных}}
 
{{ru|Валидация данных}}
  +
{{zh-tw|資料驗證}}
 
}}
 
}}
   
 
Untrusted data comes from many sources (users, third party sites, your own database!, ...) and all of it needs to be validated both on input and output.
 
Untrusted data comes from many sources (users, third party sites, your own database!, ...) and all of it needs to be validated both on input and output.
   
== Output Sanitation ==
+
== Output Sanitization ==
The method of data sanitation depends on the type of data and the context in which it is used. Below are some common tasks in WordPress and how they should be sanitized.
+
The method of data sanitization depends on the type of data and the context in which it is used. Below are some common tasks in WordPress and how they should be sanitized.
   
 
Tip: It's best to do the output validation as late as possible, ideally as it's being outputted, as opposed to further up in your script. This way you can always be sure that your data is properly validated/escaped and you don't need to remember if the variable has been previously validated.
 
Tip: It's best to do the output validation as late as possible, ideally as it's being outputted, as opposed to further up in your script. This way you can always be sure that your data is properly validated/escaped and you don't need to remember if the variable has been previously validated.
Line 26: Line 28:
 
:
 
:
 
: To avoid having to pass an array of allowed HTML tags, you can use <code>[[Function Reference/wp_kses_post|wp_kses_post]]( (string) $fragment )</code> for tags that are allowed in posts/pages or <code>[[Function Reference/wp_kses_data|wp_kses_data]]( (string) $fragment )</code> for the small list of tags allowed in comments.
 
: To avoid having to pass an array of allowed HTML tags, you can use <code>[[Function Reference/wp_kses_post|wp_kses_post]]( (string) $fragment )</code> for tags that are allowed in posts/pages or <code>[[Function Reference/wp_kses_data|wp_kses_data]]( (string) $fragment )</code> for the small list of tags allowed in comments.
  +
:
  +
: Note that the kses system can be resource-intensive, and should therefore not be run as an output sanitization filter directly, but as a filter to data after it has been input and processed, before it is saved in the database. WordPress runs kses on the pre_comment_content filter, for example, to filter the HTML before saving the comment.
 
; <code>[[Function Reference/wp_rel_nofollow|wp_rel_nofollow]]( (string) $html )</code>
 
; <code>[[Function Reference/wp_rel_nofollow|wp_rel_nofollow]]( (string) $html )</code>
 
: Adds a "rel='nofollow'" attribute to any <nowiki><a></nowiki> link.
 
: Adds a "rel='nofollow'" attribute to any <nowiki><a></nowiki> link.
Line 34: Line 38:
 
==== Text Nodes ====
 
==== Text Nodes ====
 
; <code>[[Function Reference/esc_html|esc_html]]( $text )</code> (since 2.8)
 
; <code>[[Function Reference/esc_html|esc_html]]( $text )</code> (since 2.8)
: Encodes <tt>< > & " '</tt> (less than, greater than, ampersand, double quote, single quote). Very similar to <code>esc_attr</code>.
+
: Encodes <tt>&lt; &gt; &amp; &quot; &#39;</tt> (less than, greater than, ampersand, double quote, single quote). Identical to <code>esc_attr</code>, except it applies the <code>esc_html</code> filter to the output.
 
; <code>[[Function Reference/esc_html_2|esc_html__]]</code> (since 2.8)
 
; <code>[[Function Reference/esc_html_2|esc_html__]]</code> (since 2.8)
 
: Translates and encodes
 
: Translates and encodes
Line 46: Line 50:
 
==== Attribute Nodes ====
 
==== Attribute Nodes ====
 
; <code>[[Function Reference/esc_attr|esc_attr]]( $text )</code> (since 2.8)
 
; <code>[[Function Reference/esc_attr|esc_attr]]( $text )</code> (since 2.8)
  +
: Encodes <tt>< > & " '</tt> (less than, greater than, ampersand, double quote, single quote). Identical to <code>esc_html</code>, except it applies the <code>attribute_escape</code> filter to the output.
 
; <code>[[Function Reference/esc_attr_2|esc_attr__]]()</code>
 
; <code>[[Function Reference/esc_attr_2|esc_attr__]]()</code>
 
: Translates and encodes
 
: Translates and encodes
Line 56: Line 61:
 
=== URLs ===
 
=== URLs ===
 
; <code>[[Function Reference/esc_url|esc_url]]( $url, (array) $protocols = null )</code> (since 2.8)
 
; <code>[[Function Reference/esc_url|esc_url]]( $url, (array) $protocols = null )</code> (since 2.8)
: Always use <code>esc_url</code> when sanitizing URLs (in text nodes, attribute nodes or anywhere else). Rejects URLs that do not have one of the provided whitelisted protocols (defaulting to <tt>http</tt>, <tt>https</tt>, <tt>ftp</tt>, <tt>ftps</tt>, <tt>mailto</tt>, <tt>news</tt>, <tt>irc</tt>, <tt>gopher</tt>, <tt>nntp</tt>, <tt>feed</tt>, and <tt>telnet</tt>), eliminates invalid characters, and removes dangerous characters. Replaces <code>clean_url()</code> which was deprecated in 3.0.
+
: Always use <code>esc_url</code> when sanitizing URLs (in text nodes, attribute nodes or anywhere else). Rejects URLs that do not have one of the provided protocols (defaulting to <tt>http</tt>, <tt>https</tt>, <tt>ftp</tt>, <tt>ftps</tt>, <tt>mailto</tt>, <tt>news</tt>, <tt>irc</tt>, <tt>gopher</tt>, <tt>nntp</tt>, <tt>feed</tt>, and <tt>telnet</tt>), eliminates invalid characters, and removes dangerous characters. Replaces <code>clean_url()</code> which was deprecated in 3.0.
 
: This function encodes characters as HTML entities: use it when generating an (X)HTML or XML document. Encodes ampersands (<tt>&</tt>) and single quotes (<tt>'</tt>) as numeric entity references (<tt>&#038, &#039</tt>).
 
: This function encodes characters as HTML entities: use it when generating an (X)HTML or XML document. Encodes ampersands (<tt>&</tt>) and single quotes (<tt>'</tt>) as numeric entity references (<tt>&#038, &#039</tt>).
 
; <code>[[Function Reference/esc_url_raw|esc_url_raw]]( $url, (array) $protocols = null )</code> (since 2.8)
 
; <code>[[Function Reference/esc_url_raw|esc_url_raw]]( $url, (array) $protocols = null )</code> (since 2.8)
: For inserting an URL in the database. This function does not encode characters as HTML entities: use it when storing a URL or in other cases where you need the non-encoded URL. This functionality can be replicated in the old <code>clean_url</code> function by setting <code>$context</code> to <code>db</code>.
+
: For inserting a URL in the database. This function does not encode characters as HTML entities: use it when storing a URL or in other cases where you need the non-encoded URL. This functionality can be replicated in the old <code>clean_url</code> function by setting <code>$context</code> to <code>db</code>.
 
; <code>[[Function Reference/urlencode|urlencode]]( $scalar )</code>
 
; <code>[[Function Reference/urlencode|urlencode]]( $scalar )</code>
 
: Encodes for use in URL (as a query parameter, for example)
 
: Encodes for use in URL (as a query parameter, for example)
Line 76: Line 81:
 
);
 
);
 
; <code>[[Class Reference/wpdb#Protect Queries Against SQL Injection Attacks|$wpdb->prepare]]( $format, (scalar) $value1, (scalar) $value2, ... )</code>
 
; <code>[[Class Reference/wpdb#Protect Queries Against SQL Injection Attacks|$wpdb->prepare]]( $format, (scalar) $value1, (scalar) $value2, ... )</code>
: <code>$format</code> is a [http://php.net/sprintf sprintf()] like format string. It only understands <code>%s</code> and <code>%d</code>, neither of which needs to be enclosed in quotation marks.
+
: <code>$format</code> is a [http://php.net/sprintf sprintf()]-like format string. It only understands <code>%s</code>, <code>%d</code> and <code>%f</code>, none of which need to be enclosed in quotation marks.
 
$wpdb->get_var( $wpdb->prepare(
 
$wpdb->get_var( $wpdb->prepare(
 
"SELECT something FROM table WHERE foo = %s and status = %d",
 
"SELECT something FROM table WHERE foo = %s and status = %d",
$name, // an unescaped string (function will do the sanitation for you)
+
$name, // an unescaped string (function will do the sanitization for you)
$status // an untrusted integer (function will do the sanitation for you)
+
$status // an untrusted integer (function will do the sanitization for you)
 
) );
 
) );
 
; <code>[[Function Reference/esc_sql|esc_sql]]( $sql )</code>
 
; <code>[[Function Reference/esc_sql|esc_sql]]( $sql )</code>
: Alias for <code>$wpdb->escape()</code>.
 
; <code>$wpdb->escape( $text )</code>
 
 
: Escapes a single string or string array for use in a SQL query. Glorified <code>addslashes()</code>. <code>$wpdb->prepare</code> is generally preferred because it corrects a few common formatting errors.
 
: Escapes a single string or string array for use in a SQL query. Glorified <code>addslashes()</code>. <code>$wpdb->prepare</code> is generally preferred because it corrects a few common formatting errors.
 
; <code>$wpdb->escape( $text )</code>
  +
: Deprecated since [[Version 3.6|3.6]]. Use <tt>esc_sql()</tt> or <tt>$wpdb->prepare()</tt> instead.
 
; <code>$wpdb->escape_by_ref( &$text )</code>
 
; <code>$wpdb->escape_by_ref( &$text )</code>
 
: No return value. Since the parameter is passed by reference, the text is directly modified, so no need to assign any returned value.
 
: No return value. Since the parameter is passed by reference, the text is directly modified, so no need to assign any returned value.
 
; <code>[[Class_Reference/wpdb/esc_like|$wpdb->esc_like]]( $text )</code>
 
: Sanitizes <code>$text</code> for use in a LIKE expression of a SQL query. Will still need to be SQL escaped (with one of the above functions).
 
; <code>[[Function Reference/like_escape|like_escape]]( $string )</code>
 
; <code>[[Function Reference/like_escape|like_escape]]( $string )</code>
  +
: Deprecated since [[Version 4.0|4.0]]. Use <tt>$wpdb->esc_like()</tt> instead.
: Sanitizes <code>$string</code> for use in a LIKE expression of a SQL query. Will still need to be SQL escaped (with one of the above functions).
 
   
 
=== Filesystem ===
 
=== Filesystem ===
 
; <code>[[Function Reference/validate_file|validate_file]]( (string) $filename, (array) $allowed_files = "" )</code>
 
; <code>[[Function Reference/validate_file|validate_file]]( (string) $filename, (array) $allowed_files = "" )</code>
: Used to prevent directory traversal attacks, or to test a filename against a whitelist. Returns <tt>0</tt> if <code>$filename</code> represents a valid relative path. After validating, you <em>must</em> treat <code>$filename</code> as a relative path (i.e. you must prepend it with an absolute path), since something like <tt>/etc/hosts</tt> will validate with this function. Returns an integer greater than zero if the given path contains <tt>..</tt>, <tt>./</tt>, or <tt>:</tt>, or is not in the <code>$allowed_files</code> whitelist. Be careful making boolean interpretations of the result, since <tt>false</tt> (0) indicates the filename has passed validation, whereas <tt>true</tt> (> 0) indicates failure.
+
: Used to prevent directory traversal attacks, or to test a filename against a safelist. Returns <tt>0</tt> if <code>$filename</code> represents a valid relative path. After validating, you <em>must</em> treat <code>$filename</code> as a relative path (i.e. you must prepend it with an absolute path), since something like <tt>/etc/hosts</tt> will validate with this function. Returns an integer greater than zero if the given path contains <tt>..</tt>, <tt>./</tt>, or <tt>:</tt>, or is not in the <code>$allowed_files</code> safelist. Be careful making boolean interpretations of the result, since <tt>false</tt> (0) indicates the filename has passed validation, whereas <tt>true</tt> (> 0) indicates failure.
   
 
=== HTTP Headers ===
 
=== HTTP Headers ===
Header splitting attacks are annoying since they are dependent on the HTTP client. WordPress has little need to include user generated content in HTTP headers, but when it does, WordPress typically uses [[#Whitelist|whitelisting]] for most of its HTTP headers.
+
Header splitting attacks are annoying since they are dependent on the HTTP client. WordPress has little need to include user-generated content in HTTP headers, but when it does, WordPress typically uses [[#Safelist|safelisting]] for most of its HTTP headers.
   
WordPress does use user generated content in HTTP Location headers, and provides sanitation for those.
+
WordPress does use user-generated content in HTTP Location headers and provides sanitization for those.
   
 
; <code>[[Function Reference/wp_redirect|wp_redirect]]($location, $status = 302)</code>
 
; <code>[[Function Reference/wp_redirect|wp_redirect]]($location, $status = 302)</code>
 
: A safe way to redirect to any URL. Ensures the resulting HTTP Location header is legitimate.
 
: A safe way to redirect to any URL. Ensures the resulting HTTP Location header is legitimate.
 
; <code>[[Function Reference/wp_safe_redirect|wp_safe_redirect]]($location, $status = 302)</code>
 
; <code>[[Function Reference/wp_safe_redirect|wp_safe_redirect]]($location, $status = 302)</code>
: Even safer. Only allows redirects to whitelisted domains.
+
: Even safer. Only allows redirects to safelisted domains.
   
 
== Input Validation ==
 
== Input Validation ==
   
Many of the functions above in [[#Output_Sanitation]] are useful for input validation. In addition, WordPress uses the following functions.
+
Many of the functions above in [[#Output_Sanitization]] are useful for input validation. In addition, WordPress uses the following functions.
   
 
=== Slugs ===
 
=== Slugs ===
Line 116: Line 123:
   
 
=== HTML ===
 
=== HTML ===
; <code>[[Function Reference/balanceTags|balanceTags]]( $html )</code> or <code>[[Function Reference/force_balance_tags|force_balance_tags]]( $html )</code>
+
; <code>[https://developer.wordpress.org/reference/functions/balanceTags/ balanceTags]( $html )</code> or <code>[https://developer.wordpress.org/reference/functions/force_balance_tags/ force_balance_tags]( $html )</code>
 
: Tries to make sure HTML tags are balanced so that valid XML is output.
 
: Tries to make sure HTML tags are balanced so that valid XML is output.
; <code>[[Function Reference/tag_escape|tag_escape]]( $html_tag_name )</code>
+
; <code>[https://developer.wordpress.org/reference/functions/tag_escape/ tag_escape]( $html_tag_name )</code>
 
: Sanitizes an HTML tag name (does not escape anything, despite the name of the function).
 
: Sanitizes an HTML tag name (does not escape anything, despite the name of the function).
; <code>[[Function Reference/sanitize_html_class|sanitize_html_class]]( $class, $fallback )</code>
+
; <code>[https://developer.wordpress.org/reference/functions/sanitize_html_class/ sanitize_html_class]( $class, $fallback )</code>
: Santizes a html classname to ensure it only contains valid characters. Strips the string down to A-Z,a-z,0-9,'-' if this results in an empty string then it will return the alternative value supplied.
+
: Sanitizes an HTML class name to ensure it only contains valid characters. Strips the string down to A-Z, a-z, 0-9, '_' and '-'. If this results in an empty string, the function returns the supplied <code>$fallback</code> value.
   
 
=== Email ===
 
=== Email ===
Line 132: Line 139:
   
 
=== Other ===
 
=== Other ===
Some other functions that may be useful to sanitise data input:
+
Some other functions that may be useful to sanitize data input:
   
 
* [[Function_Reference/sanitize_email|sanitize_email()]]
 
* [[Function_Reference/sanitize_email|sanitize_email()]]
Line 152: Line 159:
 
There are several different philosophies about how validation should be done. Each is appropriate for different scenarios.
 
There are several different philosophies about how validation should be done. Each is appropriate for different scenarios.
   
=== Whitelist ===
+
=== Safelist ===
 
Accept data only from a finite list of known and trusted values.
 
Accept data only from a finite list of known and trusted values.
   
  +
When comparing untrusted data against the safelist, it's important to make sure that strict type checking is used. Otherwise an attacker could craft input in a way that will pass the safelist but still have a malicious effect.
$possible_values = array( 'a', 1, 'good' );
 
if ( !in_array( $untrusted, $possible_values ) )
 
die( "Don't do that!" );
 
   
  +
==== Comparison Operator ====
// Be careful here with fancy breaks and default actions.
 
switch ( $untrusted ) {
 
case 'a' :
 
...
 
break;
 
...
 
default :
 
die( "You hoser!" );
 
}
 
   
  +
<code><pre>
=== Blacklist ===
 
  +
$untrusted_input = '1 malicious string'; // evaluates to integer 1 during loose comparisons in PHP versions before 8.0
  +
  +
if ( 1 === $untrusted_input ) { // == would have evaluated to true, but === evaluates to false
  +
echo '<p>Valid data</p>';
  +
} else {
  +
wp_die( 'Invalid data' );
  +
}
  +
</pre></code>
  +
  +
==== in_array() ====
  +
  +
<code><pre>
  +
$untrusted_input = '1 malicious string'; // evaluates to integer 1 during loose comparisons in PHP versions before 8.0
 
$safe_values = array( 1, 5, 7 );
  +
  +
if ( in_array( $untrusted_input, $safe_values, true ) ) { // `true` enables strict type checking
  +
echo '<p>Valid data</p>';
  +
} else {
  +
wp_die( 'Invalid data' );
  +
}
  +
</pre></code>
  +
  +
==== switch() ====
  +
  +
<code><pre>
  +
$untrusted_input = '1 malicious string'; // evaluates to integer 1 during loose comparisons in PHP versions before 8.0
  +
 
switch ( true ) {
  +
case 1 === $untrusted_input: // do your own strict comparison instead of relying on switch()'s loose comparison
  +
echo '<p>Valid data</p>';
 
break;
  +
 
default:
  +
wp_die( 'Invalid data' );
  +
}
  +
</pre></code>
  +
 
=== Blocklist ===
 
Reject data from a finite list of known untrusted values. This is very rarely a good idea.
 
Reject data from a finite list of known untrusted values. This is very rarely a good idea.
   
 
=== Format Detection ===
 
=== Format Detection ===
 
Test to see if the data is of the correct format. Only accept it if it is.
 
Test to see if the data is of the correct format. Only accept it if it is.
if ( !ctype_alnum( $data ) )
+
if ( ! ctype_alnum( $data ) ) {
die( "Your data is teh suX0R" );
+
wp_die( "Invalid format" );
 
}
if ( preg_match( "/[^0-9.-]/", $data ) )
 
  +
die( "Float on somewhere else, jerky" );
 
 
if ( preg_match( "/[^0-9.-]/", $data ) ) {
  +
wp_die( "Invalid format" );
  +
}
   
 
=== Format Correction ===
 
=== Format Correction ===
Line 186: Line 224:
   
 
== Changelog ==
 
== Changelog ==
  +
* [[Version 3.6|3.6]]: Deprecated <tt>$wpdb->escape()</tt> in favor of <tt>[[Function_Reference/esc_sql | esc_sql()]]</tt> and <tt>[[#Database|$wpdb->prepare()]]</tt>.
 
* [[Version 3.1|3.1]]: Introduced <code>[[Function_Reference/esc_textarea|esc_textarea]]</code>. ([http://core.trac.wordpress.org/ticket/15454 #15454])
 
* [[Version 3.1|3.1]]: Introduced <code>[[Function_Reference/esc_textarea|esc_textarea]]</code>. ([http://core.trac.wordpress.org/ticket/15454 #15454])
 
* [[Version 3.0|3.0]]: Deprecated <code>[[#URLs|clean_url()]]</code> in favor of <code>esc_url()</code> and <code>esc_url_raw()</code>. ([http://core.trac.wordpress.org/ticket/12309 #12309])
 
* [[Version 3.0|3.0]]: Deprecated <code>[[#URLs|clean_url()]]</code> in favor of <code>esc_url()</code> and <code>esc_url_raw()</code>. ([http://core.trac.wordpress.org/ticket/12309 #12309])
Line 196: Line 235:
 
* [http://wp.tutsplus.com/tutorials/creative-coding/data-sanitization-and-validation-with-wordpress/ Data Sanitization and Validation With WordPress] by Stephen Harris
 
* [http://wp.tutsplus.com/tutorials/creative-coding/data-sanitization-and-validation-with-wordpress/ Data Sanitization and Validation With WordPress] by Stephen Harris
 
* [http://wordpress.tv/2011/01/29/mark-jaquith-theme-plugin-security/ Theme and Plugin Security] by Mark Jaquith
 
* [http://wordpress.tv/2011/01/29/mark-jaquith-theme-plugin-security/ Theme and Plugin Security] by Mark Jaquith
* [http://groups.google.com/group/wp-hackers/browse_thread/thread/8f1466febb168935?pli=1 wp_specialchars() vs attribute_escape() ( now esc_attr() ) and quote entity-encoding].
 
   
   
 
[[Category:Security]]
 
[[Category:Security]]
 
[[Category:WordPress Development]]
 
[[Category:WordPress Development]]
  +
-->
  +
  +
Migrated to: https://developer.wordpress.org/apis/security/data-validation/

Latest revision as of 17:05, 6 December 2022


Migrated to: https://developer.wordpress.org/apis/security/data-validation/