How to create an endpoint for relevance and synonym searches in WordPress

Overview

This tutorial shows how to build a custom WordPress REST API endpoint that performs relevance-based searches and supports synonym expansion. It covers design decisions, security, indexing, a practical PHP implementation using the WordPress REST API, how to create MySQL fulltext indexes, query expansion for synonyms and boosting, pagination and caching, and thoughts on scaling and alternatives (Elasticsearch, Algolia).

Design choices and considerations

  • Goal: Return search results ordered by relevance and support synonyms so queries match equivalent words (e.g., car => automobile).
  • Primary storage: Use native MySQL fulltext search (MATCH() AGAINST()) applied to wp_posts (title and content), which is fast enough for many sites. The alternative is a dedicated search engine (Elasticsearch, Algolia) for better scoring and synonym management.
  • Query expansion: Expand query terms with synonyms and optionally apply boosts so original terms rank higher than synonyms.
  • Security: Implement permission callbacks, sanitize input, and rate-limit or cache results to avoid expensive queries from malicious calls.
  • Compatibility: Ensure code works on WP >= 4.7 (REST API present). The SQL used should be compatible with your MySQL version and storage engine (InnoDB fulltext indexes are supported in MySQL 5.6 and later).

Preparation: Add fulltext indexes

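To compute relevance with MATCH() AGAINST(), the table needs a fulltext index whose column list exactly matches the columns named in MATCH(). The endpoint below searches post_title and post_content together, so create one combined index on (post_title, post_content). Separate per-column indexes are needed only if you also run MATCH() against a single column, as in the title-boost formula later in this tutorial.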

SQL to create a fulltext index

ALTER TABLE wp_posts
  ADD FULLTEXT ft_post_title_content (post_title, post_content);

Confirm the index exists (for example with SHOW INDEX FROM wp_posts) and that your MySQL version supports fulltext indexes on the table's storage engine. If you cannot add a fulltext index, fall back to LIKE queries or an external search engine.
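
The statement above hard-codes the default wp_ table prefix, which many installs change. The following is a minimal sketch of building the same statement for any table name. crs_build_fulltext_index_sql() is a hypothetical helper, not part of WordPress; inside WordPress you would typically pass $wpdb->posts as the table name.

```php
<?php
/**
 * Build the ALTER TABLE statement for the combined fulltext index.
 * Hypothetical helper: identifiers are validated because they cannot be
 * passed as prepared-statement placeholders.
 */
function crs_build_fulltext_index_sql(string $posts_table, string $index_name = 'ft_post_title_content'): string {
    foreach (array($posts_table, $index_name) as $identifier) {
        // Allow only characters that are safe inside a backtick-quoted identifier
        if (!preg_match('/^[A-Za-z0-9_]+$/', $identifier)) {
            throw new InvalidArgumentException("Unsafe identifier: {$identifier}");
        }
    }
    return "ALTER TABLE `{$posts_table}` ADD FULLTEXT `{$index_name}` (post_title, post_content)";
}

// Example with a custom prefix
echo crs_build_fulltext_index_sql('blog_posts'), PHP_EOL;
```

Running a statement like this once, for example from a plugin activation hook or a database console, is enough; the index is maintained automatically afterwards.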

High-level algorithm

  1. Receive HTTP GET or POST with query parameters: q (search string), page, per_page, synonyms option, boosts, etc.
  2. Sanitize and tokenise the query.
  3. For each token, expand synonyms from a synonyms dictionary or stored option.
  4. Build a SQL scoring expression using MATCH() AGAINST() on combined fields and optionally add boosting factors for exact matches or title matches.
  5. Run a WP_Query using posts_clauses filters to inject MATCH() AGAINST() and ORDER BY relevance. Retrieve required fields (ID, title, excerpt, score).
  6. Cache results (transients) for identical requests to reduce DB load.
  7. Return JSON: hits array with relevance score and matched terms, pagination metadata.

WordPress REST endpoint: register route

Below is a complete example. It registers the REST route /wp-json/custom-search/v1/search, handles permissions, expands synonyms, attaches a posts_clauses filter to compute relevance with MATCH() AGAINST(), supports pagination, and caches results.

Implementation (PHP)

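<?php
// Register the search route: /wp-json/custom-search/v1/search
add_action('rest_api_init', function () {
    register_rest_route('custom-search/v1', '/search', array(
        'methods'             => 'GET',
        'callback'            => 'crs_handle_search',
        'permission_callback' => 'crs_search_permissions',
        'args'                => array(
            'q' => array(
                'required'          => true,
                'validate_callback' => function ($param) {
                    return is_string($param) && strlen(trim($param)) > 0;
                },
            ),
            'page'      => array('default' => 1, 'sanitize_callback' => 'absint'),
            'per_page'  => array('default' => 10, 'sanitize_callback' => 'absint'),
            'synonyms'  => array('default' => 1),
            'cache_ttl' => array('default' => 60, 'sanitize_callback' => 'absint'),
        ),
    ));
});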

function crs_search_permissions($request) {
    // Public search: true. For private content restrict by capability.
    return true;
}

function crs_handle_search($request) {
    global $wpdb;

    $q_raw    = $request->get_param('q');
    $page     = max(1, (int) $request->get_param('page'));
    $per_page = max(1, min(100, (int) $request->get_param('per_page')));
    // Query-string values arrive as strings, so compare against the string '0'
    $use_synonyms = '0' !== (string) $request->get_param('synonyms');
    $cache_ttl    = max(0, (int) $request->get_param('cache_ttl'));

    $q = trim(wp_strip_all_tags($q_raw));
    if ('' === $q) {
        return new WP_REST_Response(array('total' => 0, 'hits' => array()), 200);
    }

    // Build a cache key for identical queries
    $cache_key = 'crs_' . md5(serialize(array($q, $page, $per_page, $use_synonyms)));
    if ($cache_ttl > 0) {
        $cached = get_transient($cache_key);
        if ($cached !== false) {
            return rest_ensure_response($cached);
        }
    }

    // Tokenize and optionally expand synonyms
    $tokens = crs_tokenize_query($q);
    if (empty($tokens)) {
        // Nothing searchable is left after normalization (e.g. punctuation only)
        return new WP_REST_Response(array('total' => 0, 'hits' => array()), 200);
    }
    if ($use_synonyms) {
        $expanded = crs_expand_synonyms($tokens);
    } else {
        $expanded = $tokens;
    }

    // Build boolean-mode fulltext search string for MATCH...AGAINST
    // Use the + operator for required tokens; a synonym group becomes
    // +(term synonym ...), which requires at least one word of the group
    $boolean_parts = array();
    foreach ($expanded as $group) {
        if (is_array($group) && count($group) > 1) {
            // group: original term first, followed by its synonyms
            $escaped_terms   = array_map('crs_mysql_escape_boolean_term', $group);
            $boolean_parts[] = '+(' . implode(' ', $escaped_terms) . ')';
        } else {
            $term            = is_array($group) ? $group[0] : $group;
            $boolean_parts[] = '+' . crs_mysql_escape_boolean_term($term);
        }
    }
    $boolean_query = implode(' ', $boolean_parts);

    // Build MATCH() AGAINST() SQL snippet
    // We'll select the relevance score and order by it
    $relevance_filter = function ($clauses) use ($boolean_query, $wpdb) {
        // Use boolean mode to respect the + operator; MATCH on title + content.
        // prepare() quotes and escapes the search string for us.
        $match_sql = $wpdb->prepare(
            "MATCH({$wpdb->posts}.post_title, {$wpdb->posts}.post_content) AGAINST(%s IN BOOLEAN MODE)",
            $boolean_query
        );
        // Ensure the SELECT contains our score
        if (false === stripos($clauses['fields'], 'crs_relevance')) {
            $clauses['fields'] .= ", ({$match_sql}) AS crs_relevance";
        }
        // Add a WHERE constraint to only consider posts with some match
        $clauses['where'] .= " AND ({$match_sql})";
        // Order by our relevance score descending, fallback to post_date
        $clauses['orderby'] = "crs_relevance DESC, {$wpdb->posts}.post_date DESC";
        return $clauses;
    };
    add_filter('posts_clauses', $relevance_filter);

    // Build WP_Query args
    $offset = ($page - 1) * $per_page;
    $args = array(
        'post_type'      => 'any',
        'post_status'    => 'publish',
        'posts_per_page' => $per_page,
        'offset'         => $offset,
        // 's' is deliberately not set: the core LIKE search would require the
        // literal query words and exclude posts that match only a synonym.
        'no_found_rows'  => false,
    );

    $query = new WP_Query($args);

    // Remove only our own filter so other queries and plugins are unaffected
    remove_filter('posts_clauses', $relevance_filter);

    // Collect results with score from query objects
    $hits = array();
    foreach ($query->posts as $post) {
        // The query added crs_relevance as a selected column; read it from $post->crs_relevance
        $score = isset($post->crs_relevance) ? floatval($post->crs_relevance) : 0;
        $hits[] = array(
            'id'        => (int) $post->ID,
            'title'     => get_the_title($post),
            'excerpt'   => crs_get_excerpt_for_post($post),
            'score'     => $score,
            'permalink' => get_permalink($post),
        );
    }

    // Total results: use found_posts
    $total = (int) $query->found_posts;

    $result = array(
        'total'    => $total,
        'page'     => $page,
        'per_page' => $per_page,
        'hits'     => $hits,
    );

    if ($cache_ttl > 0) {
        set_transient($cache_key, $result, $cache_ttl);
    }

    return rest_ensure_response($result);
}

/**
 * Tokenize a query into words, normalizing common punctuation and lowercasing.
 * Returns array of tokens.
 */
function crs_tokenize_query($q) {
    $q = mb_strtolower($q, 'UTF-8');
    // Replace punctuation with spaces
    $q = preg_replace('/[^\p{L}\p{N}]+/u', ' ', $q);
    $parts = preg_split('/\s+/u', trim($q));
    $tokens = array();
    foreach ($parts as $p) {
        if (mb_strlen($p) >= 2) { // ignore very short tokens (configurable)
            $tokens[] = $p;
        }
    }
    return array_values(array_unique($tokens));
}

/**
 * Expand tokens into synonyms groups.
 * Returns array where each element is either a string token or an array of synonyms for the token.
 */
function crs_expand_synonyms($tokens) {
    // Example synonyms definition. For production store in option or file.
    $synonyms = array(
        'car'   => array('automobile', 'auto', 'vehicle'),
        'phone' => array('telephone', 'mobile', 'cellphone'),
        'tv'    => array('television'),
    );

    $expanded = array();
    foreach ($tokens as $t) {
        if (isset($synonyms[$t])) {
            // put the original term first so the group keeps a stable order
            $group = array_merge(array($t), $synonyms[$t]);
            $expanded[] = $group;
        } else {
            $expanded[] = $t;
        }
    }
    return $expanded;
}

/**
 * Escape terms for boolean fulltext mode.
 */
function crs_mysql_escape_boolean_term($term) {
    // Remove boolean operators and special characters
    $term = str_replace(array('+', '-', '<', '>', '@', '(', ')', '~', '*', '"'), ' ', $term);
    $term = preg_replace('/\s+/', ' ', $term);
    $term = trim($term);
    // Drop quotes and backslashes entirely so the term is safe in a SQL string
    $term = str_replace(array("'", '\\'), '', $term);
    return $term;
}

/**
 * Get an excerpt safely.
 */
function crs_get_excerpt_for_post($post) {
    if (!is_object($post)) {
        $post = get_post($post);
    }
    if (has_excerpt($post)) {
        return get_the_excerpt($post);
    }
    $text = wp_strip_all_tags($post->post_content);
    $excerpt = wp_trim_words($text, 40, '...');
    return $excerpt;
}

Notes on the PHP implementation

  • posts_clauses filter: We inject a MATCH() AGAINST() expression to compute crs_relevance and to filter results. The filter modifies SELECT, WHERE and ORDER BY clauses. This approach allows ranking by the MySQL fulltext score.
  • Boolean mode: Using IN BOOLEAN MODE lets us use the + and - operators and group synonyms with parentheses. In a WHERE clause MATCH() acts as a filter (any nonzero value counts as a match), while the same expression in SELECT returns the numeric relevance score.
  • Escaping: Any user input placed in a SQL fragment must be escaped. Stripping boolean operators from each term is only the first step; the search string itself should reach the query through $wpdb->prepare() with a %s placeholder rather than by string interpolation.
  • Performance: Fulltext queries are fast with proper indexes. Still cache popular queries using transients or a persistent cache (Redis/Memcached).
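
Position inside a boolean-mode group does not by itself rank the original term above its synonyms. MySQL's boolean syntax also documents a > operator that raises a word's contribution to the score, so one way to express the preference is sketched below. crs_build_boolean_query() is a hypothetical helper that assumes its terms are already sanitized (letters and digits only), and whether the boost changes the ordering noticeably on your data and storage engine is something to verify.

```php
<?php
/**
 * Turn expanded tokens into a boolean-mode search string.
 * Each element of $expanded is either a string or an array whose first
 * item is the original term and the rest are synonyms.
 */
function crs_build_boolean_query(array $expanded): string {
    $parts = array();
    foreach ($expanded as $group) {
        $terms = array_values((array) $group);
        if (count($terms) === 0) {
            continue;
        }
        if (count($terms) === 1) {
            // Single required term
            $parts[] = '+' . $terms[0];
            continue;
        }
        // Required group: at least one word must match, original term boosted
        $original = array_shift($terms);
        $parts[]  = '+(>' . $original . ' ' . implode(' ', $terms) . ')';
    }
    return implode(' ', $parts);
}

echo crs_build_boolean_query(array(array('car', 'automobile', 'auto'), 'used')), PHP_EOL;
// prints: +(>car automobile auto) +used
```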

Example requests

Example using curl. The endpoint expects a q parameter; synonym expansion is on by default and can be disabled with synonyms=0.

curl -G "https://example.com/wp-json/custom-search/v1/search" \
  --data-urlencode "q=used car" \
  --data-urlencode "page=1" \
  --data-urlencode "per_page=10" \
  --data-urlencode "synonyms=1"

Sample JSON response

{
  "total": 42,
  "page": 1,
  "per_page": 10,
  "hits": [
    {
      "id": 123,
      "title": "How to buy a used car",
      "excerpt": "Buying a used automobile requires checking the vehicle history...",
      "score": 3.5423,
      "permalink": "https://example.com/buy-used-car"
    },
    {
      "id": 98,
      "title": "Top 10 autos for commuters",
      "excerpt": "These vehicles are fuel efficient and reliable...",
      "score": 2.1047,
      "permalink": "https://example.com/top-autos"
    }
  ]
}

Alternative relevance tuning

  • Boost title matches: Add a separate MATCH() against post_title multiplied by a factor and sum it with the content MATCH() so title hits are boosted. Each single-column MATCH() needs its own fulltext index on that column:
    (MATCH(post_title) AGAINST(...) * 3) + (MATCH(post_content) AGAINST(...) * 1) AS relevance
  • Exact phrase boosts: Add an extra clause that gives a high score when the full query appears in the title or content (LIKE or IN NATURAL LANGUAGE MODE).
  • Field weights via meta: Use post meta numeric scores (popularity, page views) and combine them with fulltext score to produce a final ranking.
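
The title-boost expression can be generated rather than hand-written, which keeps the weights configurable. This is a sketch only: crs_weighted_match_sql() is a hypothetical helper, the default weights 3 and 1 are the illustrative values from the formula above, and the output assumes separate fulltext indexes exist on post_title and on post_content.

```php
<?php
/**
 * Build a weighted relevance expression with %s placeholders for the
 * search string, suitable for passing to a prepare()-style function.
 */
function crs_weighted_match_sql(string $table, float $title_weight = 3.0, float $content_weight = 1.0): string {
    // Table names cannot be bound as placeholders, so validate them
    if (!preg_match('/^[A-Za-z0-9_]+$/', $table)) {
        throw new InvalidArgumentException('Unsafe table name');
    }
    // Fixed decimal formatting avoids locale-dependent float output
    $title   = number_format($title_weight, 2, '.', '');
    $content = number_format($content_weight, 2, '.', '');

    return "(MATCH({$table}.post_title) AGAINST(%s IN BOOLEAN MODE) * {$title})"
         . " + (MATCH({$table}.post_content) AGAINST(%s IN BOOLEAN MODE) * {$content})";
}

echo crs_weighted_match_sql('wp_posts'), PHP_EOL;
```

Inside the posts_clauses filter the result could be passed to $wpdb->prepare() with the boolean search string supplied twice, once per placeholder, and the prepared expression selected AS crs_relevance.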

Synonym management

For production, manage synonyms via:

  • WordPress options (store an associative array or JSON), editable in a plugin settings page.
  • An external synonyms file that can be reloaded without changing code.
  • Search engine synonym files (Elasticsearch analyzers or Algolia synonyms) when moving to a dedicated search engine.
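
Whichever store is used, the synonym data should be validated before it reaches the query builder. The sketch below parses a JSON document into the same term => synonyms shape that crs_expand_synonyms() uses. crs_parse_synonyms_json() and crs_lower() are hypothetical helpers, and the JSON layout (an object mapping each term to a list of strings) is an assumption, not a WordPress convention.

```php
<?php
/** Lowercase with multibyte support when the mbstring extension is available. */
function crs_lower(string $s): string {
    return function_exists('mb_strtolower') ? mb_strtolower($s, 'UTF-8') : strtolower($s);
}

/**
 * Parse {"term": ["synonym", ...]} JSON into a normalized synonyms map.
 * Invalid input yields an empty map so search degrades to no expansion.
 */
function crs_parse_synonyms_json(string $json): array {
    $decoded = json_decode($json, true);
    if (!is_array($decoded)) {
        return array();
    }
    $map = array();
    foreach ($decoded as $term => $synonyms) {
        if (!is_array($synonyms)) {
            continue; // skip malformed entries
        }
        $key   = crs_lower(trim((string) $term));
        $clean = array();
        foreach ($synonyms as $synonym) {
            if (!is_string($synonym)) {
                continue;
            }
            $value = crs_lower(trim($synonym));
            if ($value !== '' && $value !== $key) {
                $clean[$value] = true; // array keys de-duplicate
            }
        }
        if ($key !== '' && count($clean) > 0) {
            $map[$key] = array_map('strval', array_keys($clean));
        }
    }
    return $map;
}

print_r(crs_parse_synonyms_json('{"Car":["Automobile","auto","AUTO"]}'));
```

In a plugin, the JSON string might come from get_option() or a file read, with the parsed map cached so it is not rebuilt on every request.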

Security and abuse mitigation

  • Permissions: For public content allow anonymous access. For private content require appropriate capability checks in permission_callback.
  • Sanitization: Strip HTML, remove dangerous characters, limit token length and total characters.
  • Rate limiting: Implement per-IP or per-user rate limits using transients or an external rate-limiting service.
  • Cache: Use transients or object-cache to cache identical queries for a short TTL to reduce DB load.
  • Logging: Log slow queries and errors to help debug performance bottlenecks.
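
As an illustration of the rate-limiting point, here is a minimal fixed-window counter written as a pure function so it can be tested without WordPress. crs_rate_limit_step() is a hypothetical helper, and the limit of 30 requests per 60 seconds is an arbitrary example value. In a plugin, the state array would be loaded and saved with get_transient() and set_transient() under a key derived from the client IP or user ID.

```php
<?php
/**
 * Advance a fixed-window rate limiter by one request.
 *
 * @param array|null $state  Previous state (window_start, count) or null.
 * @param int        $now    Current Unix timestamp.
 * @param int        $limit  Maximum requests allowed per window.
 * @param int        $window Window length in seconds.
 * @return array 'allowed' (bool) and the updated 'state' to persist.
 */
function crs_rate_limit_step(?array $state, int $now, int $limit = 30, int $window = 60): array {
    // Start a new window on first use or when the old window has expired
    if ($state === null || ($now - $state['window_start']) >= $window) {
        $state = array('window_start' => $now, 'count' => 0);
    }
    $state['count']++;

    return array(
        'allowed' => $state['count'] <= $limit,
        'state'   => $state,
    );
}

$step = crs_rate_limit_step(null, time());
var_dump($step['allowed']); // first request in a fresh window is allowed
```

When a request is not allowed, the endpoint callback could return a WP_Error carrying an HTTP 429 status instead of running the search.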

Testing and validation

  1. Test with varied queries: single word, multi-word, phrase, non-Latin scripts.
  2. Verify synonyms expand correctly and that original terms are prioritized.
  3. Test performance on real dataset sizes and monitor slow-query logs.
  4. Test pagination and ensure found_posts matches expectations.
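
For the pagination checks, it helps to compute the expected values independently of WP_Query and compare them with the endpoint's response. The helper below is a hypothetical sketch; total_pages, offset and has_more are illustrative fields and are not part of the response shown earlier.

```php
<?php
/**
 * Derive pagination metadata from a total hit count.
 * per_page is clamped to 1..100, matching the endpoint's own limits.
 */
function crs_pagination_meta(int $total, int $page, int $per_page): array {
    $total       = max(0, $total);
    $page        = max(1, $page);
    $per_page    = max(1, min(100, $per_page));
    $total_pages = (int) ceil($total / $per_page);

    return array(
        'total'       => $total,
        'page'        => $page,
        'per_page'    => $per_page,
        'total_pages' => $total_pages,
        'offset'      => ($page - 1) * $per_page,
        'has_more'    => $page < $total_pages,
    );
}

print_r(crs_pagination_meta(42, 1, 10));
```

With 42 hits and 10 per page, a test can assert that page 5 has offset 40 and returns the final 2 hits, and that page 6 returns none.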

Scaling beyond MySQL fulltext

When needing better relevance, fuzzy matching, multi-language analyzers, or large scale performance, move to a search engine:

  • Elasticsearch / OpenSearch: Full-featured scoring, synonym filters, analyzers, phrase suggestions.
  • Algolia: Hosted solution with instant relevance tuning, synonyms UI, typo tolerance.
  • ElasticPress plugin: Bridges WordPress to Elasticsearch with indexing helpers and integration.

Troubleshooting common issues

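  • No results: Check that the fulltext index exists, then check tokenization and the minimum indexed word length. MyISAM uses ft_min_word_len (default 4) and InnoDB uses innodb_ft_min_token_size (default 3), so short tokens such as tv are not indexed under the defaults. Changing either setting requires a server restart and an index rebuild; alternatively use the ngram parser or another tokenization approach.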
  • Low relevance variance: Tune weights or combine multiple signals (title boost, recency, popularity).
  • Slow queries: Ensure indexes exist, monitor query plans, cache results, or introduce an external search engine.
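
When debugging empty result sets, a quick check is whether any query token falls below the server's minimum indexed length, since a required term that is never indexed cannot match. crs_unindexed_tokens() below is a hypothetical helper; the default of 3 assumes InnoDB's default innodb_ft_min_token_size, and the check ignores stopword lists.

```php
<?php
/**
 * Return the tokens that are shorter than the fulltext minimum token size.
 */
function crs_unindexed_tokens(array $tokens, int $min_token_size = 3): array {
    $short = array();
    foreach ($tokens as $token) {
        // Count characters, not bytes, when mbstring is available
        $length = function_exists('mb_strlen') ? mb_strlen($token, 'UTF-8') : strlen($token);
        if ($length < $min_token_size) {
            $short[] = $token;
        }
    }
    return $short;
}

print_r(crs_unindexed_tokens(array('tv', 'used', 'car')));
```

One option is to drop such tokens from the required (+) part of the boolean query, or to log them, so a single short word does not empty the whole result set.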

Summary

This article provided a production-oriented approach to creating a WordPress REST endpoint that supports relevance-based ranking and synonym expansion using MySQL fulltext search. The implementation demonstrates registering a REST route, expanding synonyms, injecting MATCH() AGAINST() for scoring, returning a paginated JSON response, and applying caching and basic security. For larger scale or advanced linguistic features, consider Elasticsearch or Algolia integrations.


