Codex

Data Validation

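Untrusted data comes from many sources (users, third-party sites, your own database!, ...), and all of it needs to be validated both on input and on output.

Output Sanitization

The method of data sanitization depends on the type of data and the context in which it is used. Below are some common tasks in WordPress and how they should be sanitized.

Tip: It's best to do output validation as late as possible, ideally as the data is being output. This way you can always be sure that your data has been properly validated/escaped, and you don't need to remember whether the variable was validated earlier.

Integers

intval( $int ) or (int) $int
If it's supposed to be an integer, cast it as one.
absint( $int )
Ensures that the result is nonnegative. It returns the absolute value, so -5 becomes 5 rather than being rejected.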
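A minimal plain-PHP sketch of how these casts behave, assuming PHP 7 or later. example_absint() is a hypothetical helper that mirrors absint()'s documented behavior (absolute value of the integer cast) so the snippet runs without WordPress; it is not the WordPress function itself.

```php
<?php
// Hypothetical stand-in for absint(): absolute value of the integer cast.
function example_absint( $value ) {
	return abs( (int) $value );
}

$from_request = '42abc';                  // untrusted input with trailing junk
$as_int       = (int) $from_request;      // leading digits survive: 42
$same         = intval( $from_request );  // same result as the cast: 42
$negative     = example_absint( '-5' );   // absolute value: 5
$garbage      = example_absint( 'abc' );  // non-numeric text becomes 0

var_dump( $as_int, $same, $negative, $garbage );
```

Note that neither cast rejects bad input: '42abc' silently becomes 42. If the whole value must be numeric, check its format first.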

HTML/XML

Note that many types of XML documents (as opposed to HTML documents) understand only a few named character references: apos, amp, gt, lt, quot. When outputting text to such an XML document, be sure to run any text that may contain other named entities through WordPress's ent2ncr( $text ) function, which converts named entities into numeric character references (for example, &copy; becomes &#169;).
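The idea behind ent2ncr() can be sketched in plain PHP. example_ent2ncr() is hypothetical and covers only a handful of entities for illustration; WordPress's own function works from a much larger table.

```php
<?php
// Hypothetical sketch: rewrite named entities an XML parser may not know
// into numeric character references. Tiny illustrative subset only.
function example_ent2ncr( $text ) {
	$to_ncr = array(
		'&nbsp;'   => '&#160;',
		'&copy;'   => '&#169;',
		'&mdash;'  => '&#8212;',
		'&hellip;' => '&#8230;',
	);
	// strtr() with an array replaces all keys in a single pass.
	return strtr( $text, $to_ncr );
}

// &amp; is left alone: it is one of the five entities XML already knows.
$xml_text = example_ent2ncr( 'Wait&hellip; &copy; Acme &amp; Co' );
echo $xml_text, "\n";
```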

HTML/XML Fragments

wp_kses( (string) $fragment, (array) $allowed_html, (array) $protocols = null )
KSES Strips Evil Scripts. All untrusted HTML (post text, comment text, etc.) should be run through wp_kses().
To avoid having to pass an array of allowed HTML tags, you can use wp_kses_post( (string) $fragment ) for tags that are allowed in posts/pages or wp_kses_data( (string) $fragment ) for the small list of tags allowed in comments.
Note that the kses system can be resource-intensive, so it should not be run directly as an output sanitization filter. Instead, run it on data after it has been input and processed, before it is saved to the database. For example, WordPress runs kses on the pre_comment_content filter to filter a comment's HTML before saving it.
wp_rel_nofollow( (string) $html )
Adds a "rel='nofollow'" attribute to any <a> link.
wp_kses_allowed_html( (string) $context )
Provides an array of allowed HTML tags for a given context. Allowed values are post | strip | data | entities, or the name of a field filter such as pre_user_description.
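To make the $allowed_html parameter concrete, here is a sketch of the array shape wp_kses() expects (element name mapped to its permitted attributes) together with a hypothetical lookup helper, example_is_allowed(). The helper only queries the list; it does not parse or filter HTML and is no substitute for kses.

```php
<?php
// Allow-list in the shape wp_kses() takes as $allowed_html:
// element name => array of permitted attribute names.
$allowed_html = array(
	'a'      => array(
		'href'  => true,
		'title' => true,
	),
	'strong' => array(), // element allowed, but no attributes
	'em'     => array(),
);

// Hypothetical helper: is this element (and optionally attribute) allowed?
function example_is_allowed( array $allowed, $tag, $attr = null ) {
	$tag = strtolower( $tag );
	if ( ! isset( $allowed[ $tag ] ) ) {
		return false; // element not on the list at all
	}
	if ( null === $attr ) {
		return true;  // the bare element is allowed
	}
	return ! empty( $allowed[ $tag ][ strtolower( $attr ) ] );
}

var_dump( example_is_allowed( $allowed_html, 'a', 'href' ) );    // bool(true)
var_dump( example_is_allowed( $allowed_html, 'a', 'onclick' ) ); // bool(false)
var_dump( example_is_allowed( $allowed_html, 'script' ) );       // bool(false)
```

Inside WordPress, the same array would be passed straight through, as in wp_kses( $fragment, $allowed_html ).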

Text Nodes

esc_html( $text ) (since 2.8)
Encodes < > & " ' (less than, greater than, ampersand, double quote, single quote). Identical to esc_attr, except it applies the esc_html filter to the output.
esc_html__ (since 2.8)
Translates and encodes
esc_html_e (since 2.8)
Translates, encodes, and echoes
esc_textarea (since 3.1)
Encodes text for use inside a textarea element.
sanitize_text_field (since 2.9.0)
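Sanitizes a string from user input or from the database: checks for invalid UTF-8, strips all tags, and removes line breaks, tabs, and extra whitespace.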
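The effect of text-node escaping can be seen with PHP's built-in htmlspecialchars(), used here as a stand-in because esc_html() needs WordPress loaded. With ENT_QUOTES it encodes the same five characters; passing false as the fourth argument skips re-encoding existing entities, which esc_html() is assumed to avoid by default as well (worth checking against your WordPress version).

```php
<?php
// htmlspecialchars() as a stand-in for esc_html(): encode < > & " '
// so untrusted text is displayed rather than interpreted as markup.
$comment = "<script>alert('hi')</script> Tom & Jerry";

// ENT_QUOTES encodes both quote styles; false = don't double-encode entities.
$safe = htmlspecialchars( $comment, ENT_QUOTES, 'UTF-8', false );

echo '<p>' . $safe . '</p>', "\n";
```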

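Attribute Nodes

esc_attr( $text ) (since 2.8)
Encodes < > & " ' (less than, greater than, ampersand, double quote, single quote).
Identical to esc_html, except it applies the attribute_escape filter to the output.
esc_attr__()
Translates and encodes
esc_attr_e()
Translates, encodes, and echoes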
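The attribute case is where escaping matters most, because a stray quote ends the attribute early. In this sketch PHP's htmlspecialchars() with ENT_QUOTES stands in for esc_attr(), which needs WordPress; both encode the double quote, which is what neutralizes the injected handler.

```php
<?php
// Why attribute values need escaping: an unescaped double quote lets the
// input close value="..." and add its own attribute, such as an event handler.
$untrusted = '" onfocus="alert(1)';

$unsafe = '<input type="text" value="' . $untrusted . '">';
$safe   = '<input type="text" value="'
	. htmlspecialchars( $untrusted, ENT_QUOTES, 'UTF-8' ) . '">';

echo $unsafe, "\n"; // onfocus becomes a real attribute
echo $safe, "\n";   // quotes are encoded; everything stays inside value
```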

JavaScript

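esc_js( $text ) (since 2.8)
Escapes single quotes, encodes " < > & as HTML entities, and fixes line endings. Intended for inline JavaScript, such as an onclick="..." attribute; the escaped string must be enclosed in single quotes.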
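esc_js() is aimed at inline event attributes. For a value placed inside a <script> block, a different and commonly used technique is to emit a JSON string literal. This sketch swaps in PHP's json_encode() with the JSON_HEX_* flags, so < > & ' " come out as \u003C-style escapes; it is not what esc_js() does internally. Inside WordPress, wp_json_encode() wraps json_encode().

```php
<?php
// Swapped-in technique (not esc_js()): json_encode() emits a quoted JS
// string literal; the JSON_HEX_* flags escape < > & ' " as \uXXXX so the
// value cannot close the <script> element or break out of the string.
$untrusted = "</script><script>alert('x')</script>";

$js_literal = json_encode(
	$untrusted,
	JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo '<script>var msg = ' . $js_literal . ';</script>', "\n";
```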

URLs

esc_url( $url, (array) $protocols = null ) (since 2.8)
Always use esc_url when sanitizing URLs (in text nodes, attribute nodes or anywhere else). Rejects URLs that do not have one of the provided whitelisted protocols (defaulting to http, https, ftp, ftps, mailto, news, irc, gopher, nntp, feed, and telnet), eliminates invalid characters, and removes dangerous characters. Replaces clean_url() which was deprecated in 3.0.
This function encodes characters as HTML entities: use it when generating an (X)HTML or XML document. It encodes ampersands (&) and single quotes (') as numeric entity references (&#038; and &#039;).
esc_url_raw( $url, (array) $protocols = null ) (since 2.8)
For inserting a URL in the database. This function does not encode characters as HTML entities: use it when storing a URL or in other cases where you need the non-encoded URL. This functionality can be replicated in the old clean_url function by setting $context to db.
urlencode( $scalar )
Encodes a value for use in a URL (as a query parameter, for example).
urlencode_deep( $array )
Applies urlencode() to every element of an array, including nested arrays.
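A hypothetical sketch of the protocol check that esc_url() is described as doing: accept a URL only if its scheme is on the allow-list. example_has_allowed_scheme() covers just that one step; the real function also strips invalid characters, handles relative URLs, and entity-encodes the result. The last lines show urlencode() preparing a single query-string value.

```php
<?php
// Hypothetical: is the URL's scheme on the allow-list?
function example_has_allowed_scheme( $url, array $protocols ) {
	$scheme = parse_url( $url, PHP_URL_SCHEME );
	if ( ! is_string( $scheme ) ) {
		return false; // no scheme (relative URL) or malformed; not handled here
	}
	return in_array( strtolower( $scheme ), $protocols, true );
}

$protocols = array( 'http', 'https', 'mailto' );

var_dump( example_has_allowed_scheme( 'https://example.com/', $protocols ) ); // bool(true)
var_dump( example_has_allowed_scheme( 'javascript:alert(1)', $protocols ) );  // bool(false)

// urlencode() for one query-string value: space -> "+", "&" -> "%26".
$query = 'q=' . urlencode( 'cats & dogs' );
echo $query, "\n";
```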

Database

$wpdb->insert( $table, (array) $data )
$data should be unescaped (the function will escape it for you).
Keys are columns, values are values.
$wpdb->update( $table, (array) $data, (array) $where )
$data should be unescaped. Keys are columns, values are values.
$where should be unescaped. Multiple WHERE conditions are ANDed together.
$wpdb->update(
  'my_table',
  array( 'status' => $untrusted_status, 'title' => $untrusted_title ),
  array( 'id' => 123 )
);
$wpdb->prepare( $format, (scalar) $value1, (scalar) $value2, ... )
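$format is a sprintf()-like format string. It only understands %s, %d, and %f, none of which need to be enclosed in quotes; prepare() adds the quotes around %s values itself.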
$wpdb->get_var( $wpdb->prepare(
  "SELECT something FROM table WHERE foo = %s and status = %d",
  $name, // an unescaped string (function will do the sanitization for you)
  $status // an untrusted integer (function will do the sanitization for you)
) );
esc_sql( $sql )
Escapes a single string or string array for use in a SQL query. Glorified addslashes(). $wpdb->prepare is generally preferred because it corrects a few common formatting errors.
$wpdb->escape( $text )
Deprecated since 3.6. Use esc_sql() or $wpdb->prepare() instead.
$wpdb->escape_by_ref( &$text )
No return value. Since the parameter is passed by reference, the text is directly modified, so no need to assign any returned value.
$wpdb->esc_like( $text )
Sanitizes $text for use in a LIKE expression of a SQL query. Will still need to be SQL escaped (with one of the above functions).
like_escape( $string )
Deprecated since 4.0. Use $wpdb->esc_like() instead.
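The LIKE workflow described above, sketched in plain PHP. example_esc_like() is hypothetical and assumes $wpdb->esc_like() backslash-escapes the LIKE wildcards % and _ plus the backslash itself. The escaped text is then wrapped in the wildcards you actually want, and the result still has to go through $wpdb->prepare() as a %s value (shown only as a comment, since it needs WordPress).

```php
<?php
// Hypothetical stand-in for $wpdb->esc_like(): escape \ % and _ so they
// match literally inside a LIKE pattern.
function example_esc_like( $text ) {
	return addcslashes( $text, '_%\\' );
}

$search  = '50%_off';
$escaped = example_esc_like( $search ); // 50\%\_off
$pattern = '%' . $escaped . '%';        // add the wildcards you do want

// Inside WordPress the pattern must still be SQL-escaped, for example:
// $wpdb->prepare( "SELECT * FROM {$wpdb->posts} WHERE post_title LIKE %s", $pattern );
echo $pattern, "\n";
```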

Filesystem

validate_file( (string) $filename, (array) $allowed_files = "" )
Used to prevent directory traversal attacks, or to test a filename against a whitelist. Returns 0 if $filename represents a valid relative path. After validating, you must treat $filename as a relative path (i.e. you must prepend it with an absolute path), since something like /etc/hosts will validate with this function. Returns an integer greater than zero if the given path contains .., ./, or :, or is not in the $allowed_files whitelist. Be careful making boolean interpretations of the result, since false (0) indicates the filename has passed validation, whereas true (> 0) indicates failure.
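Because 0 means success, the result should be compared against 0 explicitly. The sketch below uses a hypothetical example_validate_file() that follows the contract described above (0 = passed, positive integer = failed); its checks and return codes are illustrative, not copied from WordPress. It then prepends an assumed absolute base directory, as the text advises.

```php
<?php
// Hypothetical stand-in following the documented contract:
// 0 = passed, any positive integer = failed. Codes are illustrative.
function example_validate_file( $file, array $allowed_files = array() ) {
	if ( false !== strpos( $file, '..' ) || false !== strpos( $file, './' ) ) {
		return 1; // looks like directory traversal
	}
	if ( false !== strpos( $file, ':' ) ) {
		return 2; // drive letter or stream wrapper such as php://
	}
	if ( $allowed_files && ! in_array( $file, $allowed_files, true ) ) {
		return 3; // not on the allow-list
	}
	return 0;
}

$base = '/var/www/example/templates/'; // assumed absolute base directory
$file = 'header.php';                  // untrusted relative name
$path = null;

// `if ( example_validate_file( $file ) )` would be true exactly when
// validation FAILED, so compare with 0 instead.
if ( 0 === example_validate_file( $file ) ) {
	$path = $base . $file;
}
```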

HTTP Headers

Header splitting attacks are annoying since they are dependent on the HTTP client. WordPress has little need to include user generated content in HTTP headers, but when it does, WordPress typically uses whitelisting for most of its HTTP headers.

WordPress does use user generated content in HTTP Location headers, and provides sanitization for those.

wp_redirect($location, $status = 302)
A safe way to redirect to any URL. Ensures the resulting HTTP Location header is legitimate.
wp_safe_redirect($location, $status = 302)
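Even safer. Only allows redirects to whitelisted domains (by default, the site's own host; additional hosts can be allowed with the allowed_redirect_hosts filter).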
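A hypothetical sketch of the idea behind a safe redirect: refuse CR/LF (the header-splitting vector mentioned above), accept only http(s) URLs whose host is on an allow-list, and otherwise fall back to a known-good URL. example_safe_location() and the host list are assumptions for illustration; in WordPress itself you would call wp_safe_redirect( $location ) and then exit.

```php
<?php
// Hypothetical sketch of a "safe redirect" check; not wp_safe_redirect().
function example_safe_location( $location, array $allowed_hosts, $fallback ) {
	// CR or LF in a Location value would allow header splitting.
	if ( preg_match( '/[\r\n]/', $location ) ) {
		return $fallback;
	}
	$scheme = parse_url( $location, PHP_URL_SCHEME );
	$host   = parse_url( $location, PHP_URL_HOST );
	// Only absolute http(s) URLs; relative, protocol-relative and other
	// schemes (javascript:, data:, ...) all take the fallback.
	if ( ! is_string( $scheme ) || ! in_array( strtolower( $scheme ), array( 'http', 'https' ), true ) ) {
		return $fallback;
	}
	if ( ! is_string( $host ) || ! in_array( strtolower( $host ), $allowed_hosts, true ) ) {
		return $fallback; // host not on the allow-list
	}
	return $location;
}

$allowed  = array( 'example.com', 'www.example.com' ); // assumed allow-list
$fallback = 'https://example.com/';

echo example_safe_location( 'https://www.example.com/account', $allowed, $fallback ), "\n";
echo example_safe_location( 'https://evil.test/phish', $allowed, $fallback ), "\n";
echo example_safe_location( "https://example.com/\r\nSet-Cookie: x=1", $allowed, $fallback ), "\n";
```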

Input Validation

Many of the functions in the Output Sanitization section above are also useful for input validation. In addition, WordPress uses the following functions.

Slugs

sanitize_title( $title )
Used in post slugs, for example
sanitize_user( $username, $strict = false )
Use $strict when creating a new user (though you should use the API for that).
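Roughly what slug sanitizing does, as a hypothetical ASCII-only sketch. The real sanitize_title() also runs filters and handles accented characters, entities, and more, so treat example_slugify() as an illustration of the shape of the output, not a replacement.

```php
<?php
// Hypothetical, simplified slug sanitizer (ASCII only).
function example_slugify( $title ) {
	$slug = strtolower( $title );
	$slug = preg_replace( '/[^a-z0-9]+/', '-', $slug ); // each run of other chars -> one hyphen
	return trim( $slug, '-' );                          // no leading/trailing hyphens
}

echo example_slugify( 'Hello World!' ), "\n";               // hello-world
echo example_slugify( '  Data -- Validation 101 ' ), "\n";  // data-validation-101
```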

HTML

balanceTags( $html ) or force_balance_tags( $html )
Tries to make sure HTML tags are balanced so that valid XML is output.
tag_escape( $html_tag_name )
Sanitizes an HTML tag name (does not escape anything, despite the name of the function).
sanitize_html_class( $class, $fallback )
Sanitizes an HTML class name so that it contains only valid characters: the string is stripped down to A-Z, a-z, 0-9, underscore (_), and hyphen (-). If this results in an empty string, the supplied fallback value is returned instead.
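A hypothetical plain-PHP sketch of the class-name rule, assuming the kept characters are letters, digits, underscore, and hyphen (as in current WordPress) and that the fallback is sanitized the same way. example_sanitize_html_class() is illustrative, not the core function.

```php
<?php
// Hypothetical sketch: keep only A-Z a-z 0-9 _ -, else use the fallback.
function example_sanitize_html_class( $class, $fallback = '' ) {
	$clean = preg_replace( '/[^A-Za-z0-9_-]/', '', $class );
	if ( '' === $clean && '' !== $fallback ) {
		// Sanitize the fallback too, so it cannot smuggle in bad characters.
		return example_sanitize_html_class( $fallback );
	}
	return $clean;
}

echo example_sanitize_html_class( 'my class!' ), "\n";             // myclass
echo example_sanitize_html_class( '"><script>', 'default' ), "\n"; // script
echo example_sanitize_html_class( '!!!', 'default' ), "\n";        // default
```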

Email

is_email( $email_address )
Returns boolean false if the address is invalid, or $email_address if it is valid.
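For comparison outside WordPress, PHP's filter extension offers a similar check. This swaps in filter_var() with FILTER_VALIDATE_EMAIL, which is not is_email() but follows the same contract of returning the address on success and false on failure; the two may disagree on edge cases.

```php
<?php
// Swapped-in technique: PHP's FILTER_VALIDATE_EMAIL rather than is_email().
// Same "address or false" contract; the validity rules are not identical.
$inputs = array( 'user@example.com', 'not-an-email' );

foreach ( $inputs as $input ) {
	$result = filter_var( $input, FILTER_VALIDATE_EMAIL );
	echo $input, ' => ', var_export( $result, true ), "\n";
}
```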

Arrays

array_map( 'absint', $array )
Ensures all elements are nonnegative integers. Replace callback 'absint' with whatever is appropriate for your data. array_map() is a core PHP function that runs array elements through an arbitrary callback function, in this example, absint().
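A runnable version of the array_map() pattern, with a closure standing in for absint() (hypothetical, so the snippet runs without WordPress). Note that '-7' becomes 7: absint() takes the absolute value rather than rejecting negatives, which may or may not be what your data needs.

```php
<?php
// array_map() with an absint()-style callback (closure is a stand-in).
$untrusted_ids = array( '12', '-7', '3abc', 'DROP TABLE' );

$ids = array_map(
	function ( $value ) {
		return abs( (int) $value ); // cast to int, then drop the sign
	},
	$untrusted_ids
);

print_r( $ids ); // values: 12, 7, 3, 0
```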

Other

Some other functions that may be useful to sanitize data input:

Validation Philosophies

There are several different philosophies about how validation should be done. Each is appropriate for different scenarios.

Whitelist

Accept data only from a finite list of known and trusted values.

When comparing untrusted data against the whitelist, it's important to make sure that strict type checking is used. Otherwise an attacker could craft input in a way that will pass the whitelist but still have a malicious effect.

Comparison Operator

$untrusted_input = '1 malicious string';  // loosely equal to integer 1 in PHP before 8.0

if ( 1 === $untrusted_input ) {  // before PHP 8.0, == would have evaluated to true; === is false in every version
	echo '<p>Valid data</p>';
} else {
	wp_die( 'Invalid data' );
}

in_array()

$untrusted_input = '1 malicious string';  // loosely equal to integer 1 in PHP before 8.0
$safe_values     = array( 1, 5, 7 );

if ( in_array( $untrusted_input, $safe_values, true ) ) {  // `true` enables strict type checking
	echo '<p>Valid data</p>';
} else {
	wp_die( 'Invalid data' );
}

switch()

$untrusted_input = '1 malicious string';  // loosely equal to integer 1 in PHP before 8.0

switch ( true ) {
	case 1 === $untrusted_input:  // do your own strict comparison instead of relying on switch()'s loose comparison
		echo '<p>Valid data</p>';
		break;

	default:
		wp_die( 'Invalid data' );
}

Blacklist

Reject data from a finite list of known untrusted values. This is very rarely a good idea.

Format Detection

Test to see if the data is of the correct format. Only accept it if it is.

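// Accept only letters and digits.
if ( ! ctype_alnum( $data ) ) {
  wp_die( "Invalid format" );
}

// Reject any character other than digits, periods, and hyphens.
if ( preg_match( "/[^0-9.-]/", $data ) ) {
  wp_die( "Invalid format" );
}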

Format Correction

Accept almost any data, but remove or alter the dangerous pieces.

$trusted_integer = (int) $untrusted_integer;
$trusted_alpha = preg_replace( '/[^a-z]/i', "", $untrusted_alpha );
$trusted_slug = sanitize_title( $untrusted_slug );
