Conversion of Slugs to CSS Classnames
WordPress is used in many languages, so theme authors should be able to use meaningful CSS class names in their native language. Currently WordPress does not support such a feature.
Unicode and CSS Classnames
CSS class names can contain Unicode. Allowed are the characters a-z, A-Z, 0-9, -, _ and nonascii ([^\0-\177]), as well as escape sequences; a name must not start with a digit. So much for the CSS side.
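The character rule above can be sketched as a small check. This is only an illustration in Python, not WordPress code; the helper names `is_name_char` and `is_plain_class_name` are made up for this example, and escape sequences are deliberately not handled.

```python
import string

# ASCII characters that may appear in a class name: letters, digits, "-" and "_".
ASCII_NAME_CHARS = set(string.ascii_letters + string.digits + "-_")


def is_name_char(ch):
    """True if ch is an allowed ASCII character or any non-ASCII character."""
    return ch in ASCII_NAME_CHARS or ord(ch) > 0x7F


def is_plain_class_name(name):
    """True if name consists only of allowed characters (no escapes).

    Empty names and names starting with an ASCII digit are rejected.
    """
    if not name or name[0] in string.digits:
        return False
    return all(is_name_char(ch) for ch in name)
```

With this check, names such as `größe` or `日本語` pass, while `1st` (leading digit) and `a b` (contains a space) do not.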
Escaped character sequences (starting with \) do not make sense in the class attribute of an HTML element: they need to be terminated with whitespace, which then breaks the name, creating a list of names instead of a single class name.
So it seems practical to use Unicode (encoded in UTF-8) in both the HTML document and the CSS stylesheet. For this to work, UTF-8 must be used in the backend (slugs in a post's data), in the frontend (the HTML output) and in the CSS files.
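To illustrate why escapes break inside a class attribute, here is a minimal Python sketch (not WordPress code). It assumes that a class attribute is split into names on ASCII whitespace; `class_tokens` is a hypothetical helper, and `gr\f6\df e` is used as the CSS-escaped spelling of `größe`, where the space terminates the `\df` escape.

```python
import re


def class_tokens(attr_value):
    """Split a class attribute value into names on ASCII whitespace."""
    return [tok for tok in re.split("[ \t\n\f\r]+", attr_value) if tok]


escaped = r"gr\f6\df e"  # CSS-escaped spelling; the space ends the \df escape
plain = "größe"          # the same name written directly in UTF-8

print(class_tokens(escaped))  # two names, neither of them is "größe"
print(class_tokens(plain))    # a single name
```

Written directly in UTF-8, the name survives as one token, which is the reason for preferring UTF-8 in both the HTML output and the stylesheet.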
WordPress Implementation Details
WordPress slugs are IRI-encoded strings, which simply put means urlencoded UTF-8 strings. That holds at least if we assume that WordPress encodes its output in UTF-8. It might make sense to check for that first and to have a fallback if it does not.
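The check-then-fallback idea could look roughly like the following Python sketch (not WordPress code). The function name `decode_slug` is hypothetical, and Latin-1 as the fallback encoding is an assumption made for this example.

```python
from urllib.parse import unquote_to_bytes


def decode_slug(slug):
    """Percent-decode a slug and interpret the resulting bytes as text.

    UTF-8 is tried first. If the bytes are not valid UTF-8, Latin-1 is
    used as a fallback (an assumption of this sketch).
    """
    raw = unquote_to_bytes(slug)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")
```

For example, `decode_slug("gr%C3%B6%C3%9Fe")` and `decode_slug("gr%F6%DFe")` both yield `größe`, the first through the UTF-8 path and the second through the fallback.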
I have not yet located the code in WordPress that converts a string into a slug (IRI-fying, encoding). It makes sense to check that this is done correctly, because it is the basis of what will be decoded later on.
- [++] List of functions used for the encoding
WordPress data mostly consists of UTF-8, if not US-ASCII, at least for the slugs part. So this should be safe.
So if there are IRI-encoded UTF-8 slugs, a themer only needs to ensure that the HTML and CSS files are UTF-8 to benefit from nice CSS class names.
Create a function for decoding an IRI path into UTF-8: urldecode();
Create a function to filter out single-byte UTF-8 characters that do not match CSS class names. This function must work without PCRE, because UTF-8 support in PCRE is only available since PHP 4.4.0 and 5.1.0. Additionally, PCRE does not seem to be a good fit for filtering in this case anyway. formatting.php/split_utf8();
Update the function for creating CSS class names to reflect UTF-8 class names. formatting.php/clean_css_classnames();
Locate the code that converts a title into a slug (converting user input into an IRI-encoded path): formatting.php/remove_accents(); is used for that, as well as utf8_uri_encode(); and sanitize_title_with_dashes();.
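The filtering step without regular expressions can be sketched as follows, in Python rather than PHP and not as the actual split_utf8() or clean_css_classnames() code. The name `filter_class_name_bytes` is hypothetical, and the sketch assumes its input is already valid UTF-8, so every byte of 0x80 or above belongs to a multi-byte character and is kept.

```python
# Single-byte (ASCII) values that are allowed in a CSS class name.
ALLOWED_ASCII = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789-_"
)


def filter_class_name_bytes(utf8):
    """Drop single-byte characters that are not allowed in class names.

    Bytes >= 0x80 are parts of multi-byte UTF-8 sequences and pass
    through untouched (assumes the input is valid UTF-8).
    """
    kept = bytearray()
    for byte in utf8:
        if byte >= 0x80 or byte in ALLOWED_ASCII:
            kept.append(byte)
    return bytes(kept)
```

Working on bytes keeps the filter independent of any Unicode-aware regex support: `filter_class_name_bytes("größe!? 1".encode("utf-8"))` leaves the bytes of `größe1`. A leading digit would still have to be handled separately.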
Rate the IRI conversion code (encoding)
Does it check for input encoding? - Yes: it checks for UTF-8, with Latin1 available as a fallback (instability detected, but OK).
Does it check for output encoding? - Output will be 7-bit ASCII encoded; some UTF-8 characters are replaced (remove_accents(), replacement of spaces with '-'). According to reports, other characters are urlencoded. This is done in utf8_uri_encode();, and an attempt is made to lowercase everything with the mb_strtolower(); function (if applicable).
Check the current codebase for how the program flow of creating a slug out of a title works. At least it creates some sort of slug that works. The mixture with the Latin1 "compatibility mode" is somewhat misleading, but beyond Western languages all should work properly. That at least enables proper semantics in class names for those theme authors.
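The flow described above (replace accents, lowercase, turn spaces into '-', urlencode the rest) can be approximated by the Python sketch below. It is not the WordPress implementation: `strip_accents` and `make_slug` are hypothetical names, and accent removal is done here by Unicode NFKD decomposition, a swapped-in technique whose results can differ from the replacements that remove_accents() makes.

```python
import unicodedata
from urllib.parse import quote


def strip_accents(text):
    """Remove combining marks after NFKD decomposition (stand-in technique)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_slug(title):
    """Build a 7-bit ASCII slug from a title."""
    text = strip_accents(title).lower()
    text = "-".join(text.split())  # runs of whitespace become one dash
    # Anything still non-ASCII is percent-encoded as UTF-8 bytes.
    return quote(text, safe="-_")
```

For instance, `make_slug("Café au Lait")` gives `cafe-au-lait`, while a character without a decomposition, such as ß, ends up percent-encoded.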
Rate the IRI conversion code (decoding). Decoding can be done with urldecode();, which is pretty safe. I have not located the part of the code that actually does it, and Trac has many tickets related to errors coming out of that area. I need to take a look first to see whether the code really is stinky or not.
- Does it check for input encoding? pending...
- Does it check for output encoding? pending...
The following tickets have been created (bug reports etc.) or are related to the development of this feature:
- Ticket #8446: post_class() outputs invalid css class
- Ticket #3727: WP->parse_request() won't replace $pathinfo when $req_uri contains any %## encoding character.
- Ticket #9480: New Feature: Punycode Support in URLs (slugs)
- Ticket #9492: IRI Encoding of slugs is broken