How to clean slugs with your own rules in PHP in WordPress

Introduction — why clean slugs and where WordPress acts

Slugs (the post_name field in wp_posts) are the URL-friendly strings that identify posts, pages and custom post type entries. Clean, predictable slugs matter for SEO, readability, and avoiding broken links. WordPress already provides a solid slug sanitization pipeline (remove_accents() followed by sanitize_title_with_dashes()), but many projects require custom business rules: remove stop words, enforce brand tokens, transliterate specific characters, trim lengths, preserve certain Unicode, or implement project-specific replacement maps.

What this tutorial covers

  • How WordPress sanitizes slugs by default (overview).
  • Where to hook to implement your own slug rules (filters to use).
  • Practical, ready-to-copy PHP examples for progressively more advanced rules: simple replacements, stopword stripping, transliteration, length limiting, character whitelisting, and ensuring uniqueness at save.
  • Notes on internationalization, custom post types, programmatic inserts, and edge cases.

WordPress slug pipeline (quick overview)

When a post is saved, WordPress uses sanitize_title() (and its internal helpers) to create a slug. Plugins and themes can change the result by hooking filters such as sanitize_title. Additionally, when data is prepared for insertion, the wp_insert_post_data filter lets you change the final post_name. For programmatic inserts you can also call wp_unique_post_slug() to make sure your slug is unique.

Which filters to use — recommended places

  • sanitize_title: runs early. Good for transforming the textual input (mapping, transliteration, removing words). Callback signature: function( $title, $raw_title, $context ), so register it with 3 accepted arguments.
  • wp_insert_post_data: runs when post data is about to be inserted or updated. A good place to override post_name before the DB write and to ensure uniqueness via wp_unique_post_slug().
  • client-side / admin: you can also manipulate the slug before the user hits Save using admin JS, but this tutorial focuses on server-side PHP rules (which also cover programmatic saves).
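
As a rough sketch of the first option, a sanitize_title callback can look at the $context argument and only act when a slug is being prepared for saving. The function name myprefix_slug_rules, the priority of 20, and the "save only" policy are assumptions made for this illustration, not something WordPress requires.

```php
<?php
/**
 * Hypothetical callback for the sanitize_title filter.
 * It only rewrites the value when WordPress is preparing a slug for saving.
 */
function myprefix_slug_rules( $title, $raw_title = '', $context = 'save' ) {
    if ( 'save' !== $context ) {
        return $title; // leave other contexts (such as 'query') untouched
    }

    // Illustrative rule only: collapse runs of hyphens left by earlier callbacks.
    $title = preg_replace( '/-{2,}/', '-', $title );

    return trim( $title, '-' );
}

// Register only when WordPress is loaded, so this file can also run standalone.
if ( function_exists( 'add_filter' ) ) {
    // Priority 20 runs after core's sanitize_title_with_dashes (priority 10);
    // the final 3 asks WordPress to pass all three arguments.
    add_filter( 'sanitize_title', 'myprefix_slug_rules', 20, 3 );
}
```

Whether to limit your rules to the save context is a design choice: it keeps lookups by existing slug predictable, but it also means the rules are skipped wherever sanitize_title() is called with another context.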

General rules to keep in mind (best practices)

  1. Sanitize inputs early (use remove_accents() or transliteration before pattern replacements).
  2. Keep slugs short. The post_name column is a varchar(200), so 200 characters is the hard ceiling; pick a lower limit that suits your URL structure.
  3. Decide whether to allow UTF-8 characters or restrict to ASCII — be consistent across the site.
  4. Always make slugs unique (WP will append -2, -3 etc if needed). If you want custom uniqueness rules, use wp_unique_post_slug().
  5. Test with edge cases: emojis, combined diacritics, multibyte characters, slash or backslash characters, long strings, and titles starting/ending with punctuation.
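
For rule 2, cutting a slug at a fixed character count can leave half a word at the end. The sketch below backs up to the previous hyphen instead. It assumes an ASCII slug (it counts bytes); the function name and the default limit of 60 are placeholders.

```php
<?php
/**
 * Shorten an ASCII slug to at most $max characters, cutting at a hyphen where possible.
 */
function myprefix_truncate_slug( $slug, $max = 60 ) {
    if ( strlen( $slug ) <= $max ) {
        return $slug;
    }

    $cut = substr( $slug, 0, $max );

    // If the cut landed in the middle of a word, back up to the previous hyphen.
    if ( '-' !== $slug[ $max ] ) {
        $pos = strrpos( $cut, '-' );
        if ( false !== $pos && $pos > 0 ) {
            $cut = substr( $cut, 0, $pos );
        }
    }

    return rtrim( $cut, '-' );
}

// 'custom-slug-rules-for-wordpress' limited to 20 characters gives 'custom-slug-rules'.
$short = myprefix_truncate_slug( 'custom-slug-rules-for-wordpress', 20 );
```

For slugs that keep Unicode characters, the same idea would need mb_strlen(), mb_substr() and mb_strrpos() instead of the byte-based functions.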

Simple example: basic custom slug cleaning

This example demonstrates a simple filter: transliterate non-ASCII with remove_accents, remove a small list of stopwords and punctuation, collapse whitespace to hyphens, and limit length. It hooks sanitize_title with 3 arguments.

<?php
add_filter( 'sanitize_title', 'my_simple_slug_cleaner', 10, 3 );

function my_simple_slug_cleaner( $title, $raw_title, $context ) {
    // Use the raw title if available; it is the unsanitized user input
    $text = $raw_title ? $raw_title : $title;

    // 1) Transliterate to ASCII where possible
    if ( function_exists( 'transliterator_transliterate' ) ) {
        // Intl transliterator if available (better quality)
        $text = transliterator_transliterate( 'Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $text );
    } else {
        // Fallback to WP helper (removes accents)
        $text = remove_accents( $text );
    }

    // 2) Lowercase
    $text = mb_strtolower( $text, 'UTF-8' );

    // 3) Remove a small set of stopwords (example list)
    $stopwords = array( 'the', 'a', 'an', 'and', 'or', 'but', 'of', 'for', 'with', 'on', 'in', 'at' );
    $text = preg_replace( '/\b(' . implode( '|', array_map( 'preg_quote', $stopwords ) ) . ')\b/i', ' ', $text );

    // 4) Remove any character that is not a letter, number, space, or hyphen
    $text = preg_replace( '/[^a-z0-9\s-]+/i', ' ', $text );

    // 5) Convert whitespace to single hyphen
    $text = preg_replace( '/\s+/', '-', trim( $text ) );

    // 6) Collapse multiple hyphens
    $text = preg_replace( '/-+/', '-', $text );

    // 7) Trim hyphens
    $text = trim( $text, '-' );

    // 8) Length limit (example: 120 characters)
    $max = 120;
    if ( mb_strlen( $text, 'UTF-8' ) > $max ) {
        $text = mb_substr( $text, 0, $max, 'UTF-8' );
        $text = rtrim( $text, '-' );
    }

    return $text;
}
?>

Notes about the example

  • We use transliterator_transliterate when available (Intl extension), falling back to remove_accents. That gives a best-effort ASCII slug. If you want to keep Unicode slugs, skip transliteration and change the regex accordingly.
  • Stopwords are removed by whole-word matches (\b word boundaries). Adjust the list to your language and site needs.
  • The sanitize_title filter returns the cleaned slug, but code that runs later in WordPress may still adjust it for uniqueness.

More advanced: replacement maps, allowlist characters and custom rules class

Some projects require replacing brand tokens or expanding character combinations (e.g., C# -> csharp, C++ -> cpp, Node.js -> nodejs), or preserving plus signs or underscores. Below is a flexible OOP-style example you can drop into a mu-plugin or plugin file.

<?php
class WP_Custom_Slug_Cleaner {
    protected $replacements = array();
    protected $stopwords    = array();

    public function __construct() {
        // Example replacements you can change.
        // Word replacements are padded with spaces so they do not fuse with neighbouring text.
        $this->replacements = array(
            'c#'      => 'csharp',
            'c++'     => 'cpp',
            'node.js' => 'nodejs',
            '&amp;'   => ' and ',   // HTML entity example ("and" is also a stopword below)
            '@'       => ' at ',
            '+'       => ' plus ',  // if you want plus signs to be verbalized
        );

        $this->stopwords = array( 'the', 'a', 'an', 'and', 'or', 'of', 'for' );

        add_filter( 'sanitize_title', array( $this, 'filter_sanitize_title' ), 9, 3 );
        add_filter( 'wp_insert_post_data', array( $this, 'filter_wp_insert_post_data' ), 99, 2 );
    }

    public function filter_sanitize_title( $title, $raw_title, $context ) {
        $text = $raw_title ? $raw_title : $title;

        // Run replacements first (case-insensitive)
        $text = $this->apply_replacements( $text );

        // Transliterate
        if ( function_exists( 'transliterator_transliterate' ) ) {
            $text = transliterator_transliterate( 'Any-Latin; Latin-ASCII; [\u0080-\u7fff] remove', $text );
        } else {
            $text = remove_accents( $text );
        }

        // Lowercase
        $text = mb_strtolower( $text, 'UTF-8' );

        // Remove stopwords
        if ( ! empty( $this->stopwords ) ) {
            $text = preg_replace( '/\b(' . implode( '|', array_map( 'preg_quote', $this->stopwords ) ) . ')\b/i', ' ', $text );
        }

        // Allow underscore as well (example). Adjust to allow Unicode if desired.
        // The hyphen goes last in the class so PCRE does not read it as a range.
        $text = preg_replace( '/[^a-z0-9\s_-]+/i', ' ', $text );

        // Replace whitespace with hyphens
        $text = preg_replace( '/\s+/', '-', trim( $text ) );

        // Collapse hyphens, then trim hyphens and underscores from both ends
        $text = preg_replace( '/-+/', '-', $text );
        $text = trim( $text, '-_' );

        // Optionally shorten to 100 chars (and drop a hyphen left at the cut)
        $text = rtrim( mb_substr( $text, 0, 100, 'UTF-8' ), '-_' );

        return $text;
    }

    protected function apply_replacements( $text ) {
        // Case-insensitive literal replace (no word-boundary check)
        foreach ( $this->replacements as $from => $to ) {
            $pattern = '/' . preg_quote( $from, '/' ) . '/i';
            $text    = preg_replace( $pattern, $to, $text );
        }
        return $text;
    }

    // Ensure uniqueness and apply final slug on insert
    public function filter_wp_insert_post_data( $data, $postarr ) {
        // Only process posts/pages or your CPT (extend the list as needed)
        if ( empty( $data['post_type'] ) || ! in_array( $data['post_type'], array( 'post', 'page' ), true ) ) {
            return $data;
        }

        // Determine the desired base slug: use supplied post_name or generate
        $raw_slug = ! empty( $postarr['post_name'] ) ? $postarr['post_name'] : $data['post_title'];

        // Apply the same pipeline as sanitize_title (call our function)
        $base_slug = $this->filter_sanitize_title( '', $raw_slug, '' );

        // Nothing usable (e.g. empty title): leave the data as WordPress prepared it
        if ( '' === $base_slug ) {
            return $data;
        }

        // Make it unique for this post
        $post_id = isset( $postarr['ID'] ) ? (int) $postarr['ID'] : 0;
        $unique  = wp_unique_post_slug( $base_slug, $post_id, $data['post_status'], $data['post_type'], isset( $data['post_parent'] ) ? $data['post_parent'] : 0 );

        $data['post_name'] = $unique;
        return $data;
    }
}

new WP_Custom_Slug_Cleaner();
?>

Why call wp_unique_post_slug on insert?

WordPress will typically make slugs unique when creating posts, but if you programmatically set post_name or want to enforce a slug before save (for example to generate a canonical URL returned via REST APIs), explicitly calling wp_unique_post_slug() ensures the value is unique according to WordPress internals and mirrors the core behavior.
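
To make the suffixing idea concrete, here is a simplified stand-in that checks a slug against an in-memory list. It is not how core implements wp_unique_post_slug(): the real function queries the database and scopes the check by post type and parent, among other things. The function name and the $taken list are hypothetical.

```php
<?php
/**
 * Simplified illustration of the "-2, -3" suffixing idea.
 * $taken is a hypothetical list of slugs already in use.
 */
function myprefix_unique_slug( $slug, array $taken ) {
    if ( ! in_array( $slug, $taken, true ) ) {
        return $slug;
    }

    // Try -2, -3, ... until a free candidate is found.
    $suffix = 2;
    while ( in_array( $slug . '-' . $suffix, $taken, true ) ) {
        $suffix++;
    }

    return $slug . '-' . $suffix;
}

// With 'hello' and 'hello-2' taken, the next free slug is 'hello-3'.
$next = myprefix_unique_slug( 'hello', array( 'hello', 'hello-2' ) );
```

In real code, prefer calling wp_unique_post_slug() itself so your slugs follow the same rules as the rest of the site.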

Handling multilingual sites and Unicode slugs

Decide early whether you want ASCII slugs or to preserve characters from other scripts. Both are valid: many modern sites use UTF-8 slugs (for native-language readability), but ASCII favors universal compatibility and predictable search-engine matching.

  • If you want ASCII-only slugs: transliterate with Intl transliterator or iconv (or rely on remove_accents), then strip non-ASCII characters.
  • If you want to preserve Unicode: do not transliterate. Instead, use a regex that allows Unicode categories. Example: preg_replace( '/[^\p{L}\p{N}\s-]/u', '', $text ) keeps letters and numbers from any language.
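
For the ASCII-only option, one possible fallback chain is sketched below: try the Intl transliterator, then iconv, then simply drop whatever non-ASCII bytes remain. The function name is made up for this example, and the exact output for non-Latin text depends on which extension is installed (iconv in particular varies by platform and locale). Inside WordPress you could use remove_accents() as an additional fallback.

```php
<?php
/**
 * Best-effort conversion to ASCII using whichever extension is available.
 * Results for non-Latin scripts differ between intl, iconv builds, and the final fallback.
 */
function myprefix_to_ascii( $text ) {
    if ( function_exists( 'transliterator_transliterate' ) ) {
        $out = transliterator_transliterate( 'Any-Latin; Latin-ASCII', $text );
        if ( false !== $out ) {
            $text = $out;
        }
    } elseif ( function_exists( 'iconv' ) ) {
        // @ suppresses the notice some iconv builds raise on unconvertible input.
        $out = @iconv( 'UTF-8', 'ASCII//TRANSLIT', $text );
        if ( false !== $out ) {
            $text = $out;
        }
    }

    // Drop any byte outside the ASCII range that survived the steps above.
    return preg_replace( '/[^\x00-\x7F]/', '', $text );
}
```

Because the result can contain punctuation or spaces introduced by transliteration, it should still go through the usual character filtering and hyphenation steps afterwards.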

Example: Unicode-friendly cleaner

This example keeps letters and numbers from any language and converts spaces to hyphens but still applies replacement map and abbreviation expansions.

<?php
add_filter( 'sanitize_title', 'unicode_slug_cleaner', 10, 3 );

function unicode_slug_cleaner( $title, $raw_title, $context ) {
    $text = $raw_title ? $raw_title : $title;

    // Apply some replacements first
    $map = array( 'c#' => 'csharp', 'c++' => 'cpp' );
    foreach ( $map as $k => $v ) {
        $text = preg_replace( '/' . preg_quote( $k, '/' ) . '/iu', $v, $text );
    }

    // Normalize whitespace
    $text = trim( $text );

    // Lowercase in UTF-8
    $text = mb_strtolower( $text, 'UTF-8' );

    // Allow Unicode letters and numbers, spaces, hyphens
    $text = preg_replace( '/[^\p{L}\p{N}\s-]+/u', ' ', $text );

    // Convert whitespace to hyphens
    $text = preg_replace( '/\s+/u', '-', $text );

    // Collapse hyphens
    $text = preg_replace( '/-+/', '-', $text );

    return trim( $text, '-' );
}
?>
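
One gap in an allowlist built only from \p{L} and \p{N} is combining marks: scripts such as Devanagari rely on them, and decomposed accents (a letter followed by a combining accent) use them too. The variant below is a sketch that also keeps \p{M} and, when the Intl Normalizer class is available, composes the text to NFC first. The function name is hypothetical, and the sketch assumes the input is valid UTF-8.

```php
<?php
/**
 * Unicode-friendly cleaner that also keeps combining marks (\p{M}).
 */
function myprefix_unicode_slug( $text ) {
    // Compose decomposed sequences (letter + combining accent) when intl is present.
    if ( class_exists( 'Normalizer' ) ) {
        $normalized = Normalizer::normalize( $text, Normalizer::FORM_C );
        if ( false !== $normalized ) {
            $text = $normalized;
        }
    }

    // Lowercase; the strtolower() fallback only affects ASCII letters.
    if ( function_exists( 'mb_strtolower' ) ) {
        $text = mb_strtolower( $text, 'UTF-8' );
    } else {
        $text = strtolower( $text );
    }

    // Keep letters, marks, numbers, whitespace and hyphens; everything else becomes a space.
    $text = preg_replace( '/[^\p{L}\p{M}\p{N}\s-]+/u', ' ', $text );

    // Whitespace runs become single hyphens, then repeated hyphens collapse.
    $text = preg_replace( '/\s+/u', '-', trim( $text ) );
    $text = preg_replace( '/-+/', '-', $text );

    return trim( $text, '-' );
}
```

With this version a title such as "हिन्दी भाषा" keeps its vowel signs and becomes "हिन्दी-भाषा", whereas stripping marks would damage the words.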

Programmatic generation and ensuring uniqueness (example)

If you create posts programmatically (with wp_insert_post() or through the REST API), you may want to generate a slug and check it before inserting. The example below generates a slug and ensures it is unique.

<?php
function generate_and_insert_post( $title, $content = '', $post_type = 'post' ) {
    // Create a base slug using sanitize_title()
    $base = sanitize_title( $title );

    // Ensure uniqueness for a new post
    $unique_slug = wp_unique_post_slug( $base, 0, 'publish', $post_type, 0 );

    $new_post = array(
        'post_title'   => $title,
        'post_content' => $content,
        'post_status'  => 'publish',
        'post_type'    => $post_type,
        'post_name'    => $unique_slug,
    );

    $id = wp_insert_post( $new_post );
    return $id;
}
?>

Edge cases and pitfalls

  • Infinite recursion: don't call sanitize_title() inside your sanitize_title filter in a way that causes recursion. If you must call a core helper, operate on the raw title and return a string without re-applying the same filter chain recursively.
  • Multibyte trimming: use mb_substr and mb_strlen for UTF-8-safe truncation.
  • Reserved slugs: if you have routes / endpoints that conflict with slugs (e.g., blog, shop, api), you can maintain a deny-list and replace or prepend a namespace.
  • Hierarchical slugs for pages: WordPress concatenates slugs for hierarchical post types, so your policy should take parent slugs into account.
  • URLs vs. post_name: post_name is just one part of the permalink. When building full URLs, other rewrite rules and pagination can change final structure.
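
For the reserved-slugs case, one option is to let core do the renaming. wp_unique_post_slug() runs the wp_unique_post_slug_is_bad_flat_slug filter for non-hierarchical post types, and a slug flagged as bad receives a numeric suffix. The sketch below flags the example deny-list from the bullet above; the callback name is hypothetical, and hierarchical post types would need the matching wp_unique_post_slug_is_bad_hierarchical_slug filter.

```php
<?php
/**
 * Flag reserved slugs so that WordPress appends a suffix instead of using them as-is.
 */
function myprefix_is_reserved_slug( $is_bad, $slug ) {
    $reserved = array( 'blog', 'shop', 'api' ); // example deny-list, adjust to your routes

    return $is_bad || in_array( $slug, $reserved, true );
}

// Register only when WordPress is loaded, so this file can also run standalone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_unique_post_slug_is_bad_flat_slug', 'myprefix_is_reserved_slug', 10, 2 );
}
```

If you would rather prepend a namespace than accept a numeric suffix, do that in your own cleaning step before the slug reaches the uniqueness check.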

Testing strategy

  1. Unit test your slug function with edge cases: empty strings, long strings, emoji, multibyte characters, punctuation-heavy titles, repeated words, and titles that would result in the same slug after processing.
  2. Create posts via admin, REST, and wp_insert_post to ensure behavior is identical across entry points.
  3. Check uniqueness collisions: insert posts with identical titles and verify slug increments (or your custom unique logic works).
  4. Test for XSS / injection by passing script-like input to the title and verifying slug output uses only safe characters.
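
A table-driven check is one lightweight way to cover these cases before reaching for a full test framework. In the sketch below, myprefix_demo_cleaner is a deliberately small ASCII-only cleaner that exists just to exercise the loop; swap in your own function. The runner also verifies that cleaning an already-clean slug changes nothing.

```php
<?php
/** Minimal ASCII cleaner used only to demonstrate the test loop. */
function myprefix_demo_cleaner( $text ) {
    $text = strtolower( $text );
    $text = preg_replace( '/[^a-z0-9\s-]+/', ' ', $text );
    $text = preg_replace( '/[\s-]+/', '-', trim( $text ) );
    return trim( $text, '-' );
}

/** Returns a list of readable failures; an empty array means every case passed. */
function myprefix_run_slug_cases( callable $cleaner, array $cases ) {
    $failures = array();
    foreach ( $cases as $input => $expected ) {
        $actual = $cleaner( $input );
        if ( $actual !== $expected ) {
            $failures[] = sprintf( '"%s": expected "%s", got "%s"', $input, $expected, $actual );
        }
        // A cleaned slug should survive a second pass unchanged.
        if ( $cleaner( $actual ) !== $actual ) {
            $failures[] = sprintf( '"%s": cleaner is not idempotent', $input );
        }
    }
    return $failures;
}

$cases = array(
    ''                          => '',
    '  Hello,   World!  '       => 'hello-world',
    '--Already--slugged--'      => 'already-slugged',
    '<script>alert(1)</script>' => 'script-alert-1-script',
    'Launch 🚀 day'             => 'launch-day',
);

$failures = myprefix_run_slug_cases( 'myprefix_demo_cleaner', $cases );
```

An empty $failures array means the cleaner matched every expectation; otherwise each entry describes the input that broke.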

Practical checklist before deploying slug rules to production

  • Audit existing URLs — changes to slug rules can break existing links. If you change rules on a live site, ensure you implement 301 redirects for changed slugs.
  • Decide how to handle legacy slugs: you can store previous slugs in postmeta and redirect from them.
  • Ensure you don’t break feeds, sitemaps or canonical URLs.
  • Keep language consistency — if a site uses multiple languages, test each language’s rules.
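
Before switching rules on a live site, a dry run can show which existing slugs would change and therefore need redirects. The sketch below works on a plain array so it can be tried in isolation; in practice you would fill $existing from your database. The function name, the sample post IDs and the "drop the" rule are all illustrative.

```php
<?php
/**
 * Dry run: report which existing slugs would change under a new cleaner.
 * $existing maps a (hypothetical) post ID to its current slug.
 */
function myprefix_slug_change_report( array $existing, callable $cleaner ) {
    $changes = array();
    foreach ( $existing as $post_id => $old_slug ) {
        $new_slug = $cleaner( $old_slug );
        if ( $new_slug !== $old_slug ) {
            $changes[ $post_id ] = array( 'from' => $old_slug, 'to' => $new_slug );
        }
    }
    return $changes;
}

// Example rule: drop the stopword "the" from hyphen-separated slugs.
$drop_the = function ( $slug ) {
    $words = array_diff( explode( '-', $slug ), array( 'the' ) );
    return implode( '-', $words );
};

$report = myprefix_slug_change_report(
    array( 11 => 'the-quick-guide', 12 => 'pricing' ),
    $drop_the
);
// $report lists post 11 only: 'the-quick-guide' would become 'quick-guide'.
```

Each entry in the report is a candidate for a 301 redirect from the old slug to the new one.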

Small utilities and patterns you might reuse

  • Replacement maps for technical tokens (C#, C++, Node.js, .NET)
  • Stopword lists per language (store in an option or file)
  • Allowlist of characters (regex driven) so you can easily switch between ASCII-only and Unicode-friendly rules
  • Hook into wp_insert_post_data for final enforcement and to ensure the post_name saved is what you expect
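
As an example of the per-language stopword pattern, the sketch below keeps one list per language code and removes whole words only. The two lists are short samples, the language codes are assumptions, and the lookarounds are used in place of \b so that word edges are judged by Unicode letters and digits.

```php
<?php
/**
 * Remove stopwords using a per-language list.
 * The lists and language codes here are short illustrative samples.
 */
function myprefix_strip_stopwords( $text, $lang = 'en' ) {
    $lists = array(
        'en' => array( 'the', 'a', 'an', 'and', 'or', 'of', 'for' ),
        'es' => array( 'el', 'la', 'los', 'las', 'de', 'y', 'o' ),
    );

    if ( empty( $lists[ $lang ] ) ) {
        return $text; // unknown language: leave the text alone
    }

    $quoted = array_map(
        function ( $word ) {
            return preg_quote( $word, '/' );
        },
        $lists[ $lang ]
    );

    // Whole-word match: not preceded or followed by a letter or digit.
    $pattern = '/(?<![\p{L}\p{N}])(?:' . implode( '|', $quoted ) . ')(?![\p{L}\p{N}])/iu';
    $text    = preg_replace( $pattern, ' ', $text );

    // Collapse the gaps left behind.
    return trim( preg_replace( '/\s+/u', ' ', $text ) );
}
```

Storing the lists in an option or a file, as suggested above, only changes where $lists comes from; the matching logic stays the same.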

When to avoid customizing slugs

If your site must preserve every original user-provided slug (e.g., user-generated content where the user expects full control), avoid overriding slugs automatically. Instead, sanitize minimally and provide admin tools to clean slugs on demand or to suggest slugs without forcing them.

Summary

To implement your own slug rules in PHP for WordPress, the recommended approach is:

  1. Hook the sanitize_title filter to apply textual transforms and character rules.
  2. If you need to enforce a final slug before DB write (and ensure uniqueness), also hook wp_insert_post_data and call wp_unique_post_slug().
  3. Choose whether to transliterate to ASCII or preserve Unicode, and write regexes carefully (use Unicode regex flags when appropriate).
  4. Test thoroughly, and plan redirects for any existing slugs you change.

The example snippets in this article are meant to be drop-in ready; adapt the replacement maps, stopwords and allowed characters to match your project's language and policy.


