How to index WordPress content in Algolia or Meilisearch with PHP


Overview

This is a comprehensive, production-oriented tutorial on indexing WordPress content into Algolia or Meilisearch with PHP. It covers prerequisites, installation, indexing strategies (real-time and bulk), mapping and sanitizing content, attachments and custom fields, index settings (searchable attributes, facets, ranking, synonyms), partial updates, deletion, reindexing, performance, security and error handling, with PHP examples you can drop into a plugin or mu-plugin.

When to use Algolia vs Meilisearch

Both are excellent search engines. High-level differences and choices:

  • Algolia: hosted SaaS, mature, fast, advanced ranking and personalization features, complex configuration, paid tiers. Great for production apps requiring features like replicas, personalized ranking and advanced typo tolerance.
  • Meilisearch: open source, self-hostable (also available as the managed Meilisearch Cloud), with a simple API and developer-friendly defaults. It handles typical site search needs very well and is easier to run on your own infrastructure; it has fewer advanced features, but tuning is simpler.

Quick comparison

Feature | Algolia | Meilisearch
Hosting | Hosted SaaS (managed by Algolia) | Self-hosted or Meilisearch Cloud
Ranking customization | Advanced (custom ranking, replicas) | Custom ranking rules, simpler configuration
Faceting | Yes | Yes
Synonyms | Yes | Yes
Cost | Paid tiers, broad feature set | Free to self-host; Meilisearch Cloud is paid

Prerequisites

  • WordPress site with access to add plugins or mu-plugins.
  • Composer for PHP dependencies (recommended for robust clients).
  • Algolia account (Application ID and Admin API Key), or Meilisearch server URL and API key.
  • Basic PHP and WordPress development knowledge (hooks, WP_Query).

Install PHP clients

Use Composer to install official clients. From your plugin root or project directory:

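composer require "algolia/algoliasearch-client-php:^3.0" meilisearch/meilisearch-php guzzlehttp/guzzle http-interop/http-factory-guzzle

If you only need one engine, require only its client. The Algolia examples in this tutorial use the v3 client's index-object API (initIndex() followed by index methods such as saveObjects()), which the v4 client replaced with client-level methods, hence the version constraint. meilisearch-php needs a PSR-18 HTTP client, which is why Guzzle and its PSR-17 factory adapter are installed alongside it. If you cannot use Composer, you can call the REST APIs with a lightweight HTTP client, but the official clients spare you subtle API differences.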

Store credentials securely

  • Do not hardcode admin API keys in files that are publicly readable or committed to version control. Store them as wp-config.php constants or environment variables and read them with defined() or getenv().
  • Use a separate search-only API key for front-end search requests. If searches must be restricted (per user, per filter), generate secured API keys, an Algolia feature, on the server.
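// wp-config.php: keep these values out of version control.
define( 'ALGOLIA_APP_ID', 'YourAlgoliaAppId' );
define( 'ALGOLIA_ADMIN_KEY', 'YourAlgoliaAdminAPIKey' );
define( 'MEILI_HOST', 'http://127.0.0.1:7700' );
// For indexing, prefer a Meilisearch admin API key; reserve the master key for managing keys.
define( 'MEILI_MASTER_KEY', 'YourMeiliMasterKey' );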
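The client helpers in the next section fall back from a constant to getenv(), which returns false when neither is set, so a missing key only surfaces later as a confusing client error. A small helper that fails loudly catches misconfiguration early. This is a sketch; search_config() is a hypothetical name, not part of either client:

```php
<?php
/**
 * Read a search-engine setting from a wp-config.php constant, falling back
 * to an environment variable. Throws when neither is set, so a missing key
 * fails immediately instead of reaching the client as `false`.
 * search_config() is a hypothetical helper, not part of either client.
 */
function search_config(string $name): string
{
    if (defined($name)) {
        return (string) constant($name);
    }
    $value = getenv($name);
    if ($value === false || $value === '') {
        throw new RuntimeException("Missing search setting: {$name}");
    }
    return $value;
}

// Usage: $appId = search_config('ALGOLIA_APP_ID');
```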

Initialize clients (PHP examples)

Algolia

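use Algolia\AlgoliaSearch\SearchClient;

// Assumes Composer's autoloader is loaded, e.g. require_once __DIR__ . '/vendor/autoload.php';
function get_algolia_client() {
    static $client = null; // reuse one client per request
    if ( null === $client ) {
        $app_id  = defined( 'ALGOLIA_APP_ID' ) ? ALGOLIA_APP_ID : getenv( 'ALGOLIA_APP_ID' );
        $api_key = defined( 'ALGOLIA_ADMIN_KEY' ) ? ALGOLIA_ADMIN_KEY : getenv( 'ALGOLIA_ADMIN_KEY' );
        $client  = SearchClient::create( $app_id, $api_key );
    }
    return $client;
}

function get_algolia_index( $index_name ) {
    return get_algolia_client()->initIndex( $index_name );
}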

Meilisearch

use Meilisearch\Client as MeiliClient;

function get_meili_client() {
    static $client = null; // reuse one client per request
    if ( null === $client ) {
        $host   = defined( 'MEILI_HOST' ) ? MEILI_HOST : getenv( 'MEILI_HOST' );
        $key    = defined( 'MEILI_MASTER_KEY' ) ? MEILI_MASTER_KEY : getenv( 'MEILI_MASTER_KEY' );
        $client = new MeiliClient( $host, $key );
    }
    return $client;
}

function get_meili_index( $index_name ) {
    return get_meili_client()->index( $index_name );
}

Designing the record (mapping)

A well-designed record improves search quality and faceting. Typical fields to include:

  • objectID / id: unique ID (post ID or custom key).
  • title, content, excerpt.
  • url, slug, post_type, status.
  • date and modified (ISO 8601), plus a numeric Unix timestamp if you sort or filter by date.
  • author, author_id.
  • taxonomies and arrays of term names and IDs (e.g., categories, tags).
  • acf / meta: only include fields you expect to search or facet on.
  • featured_image URL, image alt text.
  • locale or language code for multi-language sites.

Example record builder

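function build_post_record( $post_id ) {
    $post = get_post( $post_id );
    if ( ! $post ) return null;

    // Render blocks/shortcodes, then strip HTML down to plain text.
    $title   = get_the_title( $post_id );
    $content = wp_strip_all_tags( apply_filters( 'the_content', $post->post_content ) );
    $excerpt = get_the_excerpt( $post_id );

    // Taxonomies: term names keyed by taxonomy slug.
    $taxonomies = array();
    foreach ( get_post_taxonomies( $post ) as $tax ) {
        $terms = wp_get_post_terms( $post_id, $tax, array( 'fields' => 'names' ) );
        $taxonomies[ $tax ] = is_wp_error( $terms ) ? array() : $terms;
    }

    // Meta fields (whitelist only). get_post_meta() already unserializes values.
    $meta_whitelist = array( 'price', 'duration' ); // example keys
    $meta = array();
    foreach ( $meta_whitelist as $key ) {
        $value = get_post_meta( $post_id, $key, true );
        if ( '' !== $value ) {
            // Keep numbers numeric so numeric filters work.
            $meta[ $key ] = is_numeric( $value ) ? $value + 0 : $value;
        }
    }

    return array(
        'objectID'       => (string) $post_id, // Algolia expects a string objectID
        'id'             => (int) $post_id,    // primary key for Meilisearch
        'title'          => $title,
        'content'        => $content,
        'excerpt'        => $excerpt,
        'url'            => get_permalink( $post_id ),
        'slug'           => $post->post_name,
        'post_type'      => $post->post_type,
        'status'         => $post->post_status,
        'author_id'      => (int) $post->post_author,
        'author'         => get_the_author_meta( 'display_name', $post->post_author ),
        'date'           => get_post_time( 'c', true, $post_id ),
        'modified'       => get_post_modified_time( 'c', true, $post_id ),
        'date_timestamp' => (int) get_post_time( 'U', true, $post_id ), // numeric, for sorting and range filters
        'taxonomies'     => $taxonomies,
        // Top-level copies so index settings can reference "categories" and "tags" directly.
        'categories'     => isset( $taxonomies['category'] ) ? $taxonomies['category'] : array(),
        'tags'           => isset( $taxonomies['post_tag'] ) ? $taxonomies['post_tag'] : array(),
        'meta'           => $meta,
        'featured_image' => get_the_post_thumbnail_url( $post_id, 'full' ) ?: '', // false when there is none
        'comment_count'  => (int) $post->comment_count, // WP_Post stores it as a string
    );
}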
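Rendered content can be long and noisy. Algolia caps the size of each record (the limit depends on your plan), and large payloads slow down both engines, so trimming content before indexing avoids rejected writes. The sketch below is a plain-PHP stand-in for the wp_strip_all_tags() step; clean_content_for_index() and the 8,000-byte default are assumptions, and it needs the mbstring extension:

```php
<?php
/**
 * Turn rendered post HTML into compact plain text for a search record and
 * cap it at $maxBytes without splitting a multibyte UTF-8 character.
 * clean_content_for_index() and the 8000-byte default are assumptions.
 */
function clean_content_for_index(string $html, int $maxBytes = 8000): string
{
    // Remove <script>/<style> bodies; strip_tags() alone would keep their text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;
    // Add a space before every tag so "a</p><p>b" does not become "ab".
    $text = strip_tags(str_replace('<', ' <', $html));
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace, including the non-breaking spaces that &nbsp; decodes to.
    $text = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $text) ?? $text);
    if (strlen($text) > $maxBytes) {
        // mb_strcut() counts bytes but never cuts inside a character.
        $text = rtrim(mb_strcut($text, 0, $maxBytes, 'UTF-8'));
    }
    return $text;
}

echo clean_content_for_index('<p>Hello&nbsp;<b>world</b></p><script>x()</script><p>Again</p>');
// Hello world Again
```

Cutting by bytes rather than characters matters because engine limits are expressed in bytes, and a naive substr() could leave an invalid UTF-8 sequence at the end.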

Indexing on post save (real-time)

Use the WordPress save_post hook for near real-time indexing. Key considerations:

  • Ignore autosaves, revisions and post types you do not index.
  • When a post is unpublished, trashed or deleted, remove its record from the index.
  • Use background processing (Action Scheduler, WP Cron or a queue) to avoid blocking page loads.
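The checks above reduce to one decision per save_post event: skip, delete, or upsert. Isolating that decision in a pure function keeps the hook thin and makes the rules testable outside WordPress. A sketch; the function name, its return values and the default list of indexed post types are assumptions:

```php
<?php
/**
 * Decide what a save_post event should do to the search index.
 * search_index_action() and its return values are hypothetical.
 *
 * @param string[] $indexedTypes Post types that belong in the index.
 * @return string 'skip' | 'delete' | 'upsert'
 */
function search_index_action(
    string $postType,
    string $status,
    bool $isRevision,
    bool $isAutosave,
    array $indexedTypes = ['post']
): string {
    if ($isAutosave || $isRevision || !in_array($postType, $indexedTypes, true)) {
        return 'skip';
    }
    // Only published content is searchable; any other status
    // (draft, pending, private, trash) removes the record.
    return $status === 'publish' ? 'upsert' : 'delete';
}
```

In the hook you would call it with $post->post_type, $post->post_status, (bool) wp_is_post_revision( $post_id ) and defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE. Deleting a record that was never indexed (a brand-new draft, say) should be a harmless no-op in both engines, so the 'delete' branch does not need to know the previous status.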

Algolia: index on save

add_action( 'save_post', 'algolia_index_post', 10, 3 );

function algolia_index_post( $post_id, $post, $update ) {
    // Skip autosaves, revisions and post types we do not index.
    if ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) return;
    if ( wp_is_post_revision( $post_id ) ) return;
    if ( 'post' !== $post->post_type ) return; // example: index only posts

    // Anything that is not published is removed from the index.
    if ( 'publish' !== $post->post_status ) {
        try {
            get_algolia_index( 'wp_posts' )->deleteObject( (string) $post_id );
        } catch ( Exception $e ) {
            error_log( 'Algolia delete error: ' . $e->getMessage() );
        }
        return;
    }

    $record = build_post_record( $post_id );
    if ( ! $record ) return;

    // Synchronous for clarity; in production, queue this instead of blocking the request.
    try {
        get_algolia_index( 'wp_posts' )->saveObjects( array( $record ) ); // batch of one
    } catch ( Exception $e ) {
        error_log( 'Algolia save error: ' . $e->getMessage() );
    }
}

Meilisearch: index on save

add_action( 'save_post', 'meili_index_post', 10, 3 );

function meili_index_post( $post_id, $post, $update ) {
    if ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) return;
    if ( wp_is_post_revision( $post_id ) ) return;
    if ( 'post' !== $post->post_type ) return;

    if ( 'publish' !== $post->post_status ) {
        try {
            get_meili_index( 'wp_posts' )->deleteDocument( $post_id );
        } catch ( Exception $e ) {
            error_log( 'Meili delete error: ' . $e->getMessage() );
        }
        return;
    }

    $record = build_post_record( $post_id );
    if ( ! $record ) return;

    try {
        // addDocuments() upserts by replacing the whole document with the same primary key.
        // Pass the key explicitly: the record has several "id"-like fields (objectID, id,
        // author_id), so Meilisearch cannot infer which one is the primary key.
        get_meili_index( 'wp_posts' )->addDocuments( array( $record ), 'id' );
    } catch ( Exception $e ) {
        error_log( 'Meili save error: ' . $e->getMessage() );
    }
}

Bulk indexing (initial reindex or re-sync)

For large sites, run bulk reindexing in the background and in chunks. Both Algolia and Meilisearch accept batched writes; a few hundred to 1,000 records per request is a common starting point, adjusted for record size. Use chunking, backoff on rate limits, and background workers or WP-CLI to avoid PHP timeouts.

Batch index example (WP-CLI friendly)

// Usage via WP-CLI: wp eval-file reindex.php
$chunk_size = 500;
$paged      = 1;
$index      = get_algolia_index( 'wp_posts' ); // example for Algolia

while ( true ) {
    $query = new WP_Query( array(
        'post_type'      => 'post',
        'post_status'    => 'publish',
        'posts_per_page' => $chunk_size,
        'paged'          => $paged,
        'fields'         => 'ids',
        'orderby'        => 'ID',  // stable order so pages do not shift between queries
        'order'          => 'ASC',
        'no_found_rows'  => true,  // skip counting total rows; not needed here
    ) );
    if ( empty( $query->posts ) ) break;

    $records = array();
    foreach ( $query->posts as $post_id ) {
        $rec = build_post_record( $post_id );
        if ( $rec ) $records[] = $rec;
    }

    try {
        $index->saveObjects( $records );
        // Meilisearch: get_meili_index( 'wp_posts' )->addDocuments( $records, 'id' );
    } catch ( Exception $e ) {
        error_log( 'Bulk index error (page ' . $paged . '): ' . $e->getMessage() );
        // handle retry/backoff here
    }

    $paged++;
    // Optional pause to respect rate limits: usleep( 200000 ); // 0.2 s
}
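The loop above only holds one chunk of post IDs at a time, but if records come from another source (a custom table, an export) it helps to batch lazily. A generator that yields fixed-size batches from any iterable keeps memory flat; a sketch in which iterate_batches() is hypothetical and the record source is a stand-in for the WP_Query loop:

```php
<?php
/**
 * Yield arrays of at most $size items from any iterable, so only one
 * batch is held in memory at a time. iterate_batches() is hypothetical.
 */
function iterate_batches(iterable $items, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be >= 1');
    }
    $batch = [];
    foreach ($items as $item) {
        $batch[] = $item;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch; // final, possibly smaller batch
    }
}

// Stand-in record source; each batch would go to saveObjects() or addDocuments().
$records = (function () {
    foreach (range(1, 1201) as $id) {
        yield ['objectID' => (string) $id, 'id' => $id];
    }
})();
foreach (iterate_batches($records, 500) as $batch) {
    echo count($batch), PHP_EOL; // 500, 500, then 201
}
```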

Partial updates and upserts

  • Algolia: saveObjects replaces whole records; partialUpdateObjects changes only the attributes you send.
  • Meilisearch: addDocuments replaces the whole document that has the same primary key; updateDocuments merges the fields you send into the existing document and keeps the rest.
// Algolia partial update
$index   = get_algolia_index( 'wp_posts' );
$partial = array( 'objectID' => (string) $post_id, 'title' => 'Updated title only' );
$index->partialUpdateObjects( array( $partial ) );

// Meilisearch partial update (fields not sent are kept)
get_meili_index( 'wp_posts' )->updateDocuments( array( array( 'id' => $post_id, 'title' => 'Updated title only' ) ), 'id' );
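To avoid resending unchanged attributes, you can diff the previously indexed record against the freshly built one and send only the changes plus the key. The payload suits Algolia's partialUpdateObjects() or Meilisearch's updateDocuments(). A sketch; changed_attributes() is hypothetical, and where the previous record comes from (a stored copy or a fetch from the index) is left open:

```php
<?php
/**
 * Return a partial-update payload: the primary key plus every top-level
 * attribute whose value differs between $old and $new. Attributes missing
 * from $new are not removed, since partial updates only add or overwrite.
 * changed_attributes() is a hypothetical helper.
 */
function changed_attributes(array $old, array $new, string $key = 'objectID'): array
{
    $payload = [$key => $new[$key]];
    foreach ($new as $attr => $value) {
        if ($attr === $key) {
            continue;
        }
        // Strict comparison: '5' and 5 count as different, as the engines
        // would store them differently.
        if (!array_key_exists($attr, $old) || $old[$attr] !== $value) {
            $payload[$attr] = $value;
        }
    }
    return $payload;
}
```

If the payload contains only the key, nothing changed and the API call can be skipped. Use 'objectID' as the key for Algolia and 'id' for the Meilisearch records built in this tutorial.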

Index settings (searchableAttributes, facets, ranking)

Set index settings to control searchable attributes, facets, custom ranking, stop words and synonyms.

Algolia example settings

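$index = get_algolia_index( 'wp_posts' );
$index->setSettings( array(
    // Order matters: matches in earlier attributes rank higher.
    'searchableAttributes'  => array( 'title', 'content', 'excerpt', 'meta.price' ),
    'attributesForFaceting' => array( 'filterOnly(post_type)', 'searchable(tags)', 'categories' ),
    // Algolia recommends numeric Unix timestamps for date ranking; if your records
    // carry one (e.g. a date_timestamp field), rank on it instead of "date".
    'customRanking'         => array( 'desc(date)', 'desc(comment_count)' ),
    'attributesToRetrieve'  => array( 'title', 'url', 'excerpt', 'featured_image' ),
    'removeStopWords'       => array( 'en' ),
    'ignorePlurals'         => true,
) );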

Meilisearch example settings

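$index = get_meili_index( 'wp_posts' );
$index->updateSettings( [
    'searchableAttributes' => [ 'title', 'content', 'excerpt', 'meta.price' ],
    'filterableAttributes' => [ 'post_type', 'categories', 'tags' ],
    // Meilisearch v1 built-in rules, in their default order ("wordsPosition" is not a valid v1 rule).
    'rankingRules'         => [ 'words', 'typo', 'proximity', 'attribute', 'sort', 'exactness' ],
    'stopWords'            => [ 'the', 'and', 'a' ],
    // distinctAttribute only helps when several documents share a value (e.g. one post split
    // into chunks); on the primary key it has no effect, since that is already unique.
] );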

Synonyms and stop-words

Both engines allow synonyms and stop-words. Maintain synonyms for domain-specific terms (e.g., tv=>television). Use the client API to push synonyms in JSON or programmatically.
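One way to keep synonyms consistent across engines is to maintain a single list of equivalent terms and derive each engine's format from it. Algolia expects synonym objects (a multi-way synonym has type 'synonym'), while Meilisearch takes a map from each term to its synonyms, one direction per entry. A sketch; build_synonym_payloads() is hypothetical and the mbstring extension is assumed:

```php
<?php
/**
 * Build synonym payloads for both engines from groups of equivalent terms.
 * build_synonym_payloads() is hypothetical. Output shapes:
 *  - 'algolia':     list of multi-way synonym objects
 *  - 'meilisearch': term => [synonyms] map (listed in every direction)
 *
 * @param string[][] $groups Each inner array lists interchangeable terms.
 */
function build_synonym_payloads(array $groups): array
{
    $algolia = [];
    $meili = [];
    foreach ($groups as $terms) {
        $terms = array_values(array_unique(array_map('mb_strtolower', $terms)));
        if (count($terms) < 2) {
            continue; // a single term has nothing to be a synonym of
        }
        $algolia[] = [
            // Stable objectID so re-running replaces instead of duplicating.
            'objectID' => 'syn-' . implode('-', $terms),
            'type'     => 'synonym',
            'synonyms' => $terms,
        ];
        // Meilisearch synonyms are one-directional, so add every direction.
        foreach ($terms as $term) {
            $others = array_values(array_diff($terms, [$term]));
            $meili[$term] = array_values(array_unique(array_merge($meili[$term] ?? [], $others)));
        }
    }
    return ['algolia' => $algolia, 'meilisearch' => $meili];
}

$payloads = build_synonym_payloads([['TV', 'television']]);
```

With the v3 Algolia client, the first payload can be sent with $index->saveSynonyms( $payloads['algolia'], array( 'replaceExistingSynonyms' => true ) ); in Meilisearch, $index->updateSynonyms( $payloads['meilisearch'] ) replaces the index's synonym list.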

Faceting and filtering

Expose attributes you want to facet or filter on (categories, price ranges, authors). For Algolia add to attributesForFaceting. For Meili set filterableAttributes. Keep numeric ranges as numeric fields for efficient numeric filtering.
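On the query side, facet selections from a search form must become a filter expression. In Meilisearch's v1 filter syntax, values for one attribute can be ORed with IN [...] and separate conditions combined with AND. A sketch; build_meili_filter() is hypothetical, the backslash escaping of quotes is an assumption about the filter parser, and every attribute used must already be in filterableAttributes:

```php
<?php
/**
 * Build a Meilisearch filter expression from facet selections and numeric
 * ranges. build_meili_filter() is hypothetical.
 *
 * @param array<string, string[]> $facets attribute => accepted values (ORed)
 * @param array<string, array>    $ranges attribute => [min, max]; null = open end
 */
function build_meili_filter(array $facets, array $ranges = []): string
{
    $quote = function (string $v): string {
        // Escape backslashes and double quotes inside a quoted value.
        return '"' . addcslashes($v, '"\\') . '"';
    };
    $clauses = [];
    foreach ($facets as $attr => $values) {
        if ($values === []) {
            continue; // nothing selected for this facet
        }
        $clauses[] = $attr . ' IN [' . implode(', ', array_map($quote, $values)) . ']';
    }
    foreach ($ranges as $attr => [$min, $max]) {
        if ($min !== null) {
            $clauses[] = $attr . ' >= ' . $min;
        }
        if ($max !== null) {
            $clauses[] = $attr . ' <= ' . $max;
        }
    }
    return implode(' AND ', $clauses);
}

$filter = build_meili_filter(['categories' => ['News', 'Tech']], ['meta.price' => [10, 50]]);
// categories IN ["News", "Tech"] AND meta.price >= 10 AND meta.price <= 50
```

Pass the result as the filter search parameter. Algolia uses a different syntax (for example post_type:post AND categories:News), so it would need its own builder.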

Geo-search

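If you need proximity search, add latitude/longitude coordinates to each record. Algolia reads them from the _geoloc attribute; Meilisearch reads them from the _geo attribute, which must be added to filterableAttributes (or sortableAttributes) before geo filtering or sorting works. Example for Algolia:

$record['_geoloc'] = array( 'lat' => 40.7128, 'lng' => -74.0060 );
get_algolia_index( 'wp_posts' )->saveObjects( array( $record ) );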
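If coordinates live in custom fields, they usually arrive as strings and may be empty or invalid. A sketch that validates them and writes both engines' formats; add_geo_fields() is hypothetical, and a record only needs the attribute for the engine you actually use:

```php
<?php
/**
 * Add coordinates to a record as Algolia's `_geoloc` and Meilisearch's
 * `_geo`, both shaped ['lat' => float, 'lng' => float].
 * add_geo_fields() is hypothetical; raw values may be strings from meta.
 */
function add_geo_fields(array $record, $lat, $lng): array
{
    if (!is_numeric($lat) || !is_numeric($lng)) {
        return $record; // no usable coordinates: leave the record unchanged
    }
    $lat = (float) $lat;
    $lng = (float) $lng;
    if ($lat < -90 || $lat > 90 || $lng < -180 || $lng > 180) {
        return $record; // out of range: skip rather than index bad data
    }
    $point = ['lat' => $lat, 'lng' => $lng];
    $record['_geoloc'] = $point; // Algolia
    $record['_geo'] = $point;    // Meilisearch
    return $record;
}
```

At query time, Meilisearch filters with _geoRadius(lat, lng, distanceInMeters), and Algolia takes aroundLatLng (optionally with aroundRadius) as a search parameter.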

Handling attachments, images and media

  • Store image URLs and alt text in the record. Do not index binary blob data.
  • For performance, only index media metadata you will search/filter on.

Multi-language support

  • Option A: create per-locale indices (wp_posts_en, wp_posts_fr). This gives language-specific analyzers and settings.
  • Option B: single index with locale field and use filters to restrict search to a locale.

Algolia has features like ignorePlurals and language-specific tokenization. Meilisearch works well with per-index languages or per-document locale fields.
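With per-locale indices (Option A above), every read and write needs the right index name. A sketch mapping a WordPress locale such as fr_FR, the format get_locale() returns, to an index name with a fallback language; index_for_locale() and the wp_posts_<lang> naming are assumptions:

```php
<?php
/**
 * Map a WordPress locale to a per-language index name, e.g.
 * "fr_FR" -> "wp_posts_fr", falling back to $default when the site has
 * no index for that language. index_for_locale() is hypothetical.
 *
 * @param string[] $languages Language codes that have their own index.
 */
function index_for_locale(string $locale, array $languages, string $default = 'en', string $base = 'wp_posts'): string
{
    // "fr_FR" -> "fr"; also accepts bare "fr" or "fr-FR".
    $lang = strtolower(preg_split('/[_-]/', $locale)[0]);
    if (!in_array($lang, $languages, true)) {
        $lang = $default;
    }
    return $base . '_' . $lang;
}
```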

Error handling, retries and idempotency

  • Wrap API calls in try/catch and log failures to error_log or a custom logging system.
  • Make indexing operations idempotent by using stable keys (post ID or slug) so retries do not duplicate data.
  • Implement exponential backoff for transient HTTP or rate-limit errors.
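// Usage: send_with_retry( function () use ( $index, $records ) { return $index->saveObjects( $records ); } );
function send_with_retry( callable $closure, $retries = 3 ) {
    $attempt = 0;
    $backoff = 200; // ms
    while ( $attempt <= $retries ) {
        try {
            return $closure();
        } catch ( Exception $e ) {
            $attempt++;
            if ( $attempt > $retries ) {
                error_log( 'Indexing final failure: ' . $e->getMessage() );
                return false;
            }
            usleep( $backoff * 1000 ); // milliseconds -> microseconds
            $backoff *= 2;             // exponential backoff: 200, 400, 800 ms
        }
    }
    return false;
}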
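Stable keys make retries safe; they also make it cheap to skip writes that would change nothing. A sketch that fingerprints a record independently of associative key order, so an unchanged post yields the same hash; record_fingerprint() is hypothetical, and storing the previous hash (for example in post meta) is an assumption:

```php
<?php
/**
 * Fingerprint a record so unchanged posts can skip the API call.
 * Associative arrays are key-sorted recursively, so the same data built in
 * a different key order hashes identically; lists keep their order.
 * record_fingerprint() is a hypothetical helper.
 */
function record_fingerprint(array $record): string
{
    $normalize = function ($value) use (&$normalize) {
        if (!is_array($value)) {
            return $value;
        }
        $isList = $value === [] || array_keys($value) === range(0, count($value) - 1);
        if (!$isList) {
            ksort($value);
        }
        return array_map($normalize, $value); // array_map keeps string keys
    };
    return sha1(json_encode($normalize($record), JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
}
```

Compare the new fingerprint with the stored one, skip the API call when they match, and store the new value only after the write succeeds.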

Background processing recommendations

  • Do not perform heavy indexing on the request thread. Use Action Scheduler, WP-Background-Processing library, a job queue, or WP Cron with small batches.
  • Enqueue single-post updates on save_post and let a background worker flush to the search engine.
  • For bulk reindexing, run WP-CLI offline or use a scheduled background job.
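save_post can fire more than once for a single edit (for example when another plugin calls wp_update_post() from its own save_post handler), so enqueuing naively can send duplicate writes. A sketch of a per-request buffer in which the last operation recorded for each post wins; SearchIndexBuffer and its flush callbacks are hypothetical:

```php
<?php
/**
 * Collect index operations during a request and flush them as at most two
 * batch calls. The last operation per post wins, so saving a post three
 * times in one request produces a single write. Hypothetical class.
 */
final class SearchIndexBuffer
{
    /** @var array<int, string> post ID => 'upsert' | 'delete' */
    private $ops = [];

    public function upsert(int $postId): void
    {
        $this->ops[$postId] = 'upsert';
    }

    public function delete(int $postId): void
    {
        $this->ops[$postId] = 'delete';
    }

    /**
     * @param callable $saveMany   receives int[] of post IDs to (re)index
     * @param callable $deleteMany receives int[] of post IDs to remove
     */
    public function flush(callable $saveMany, callable $deleteMany): void
    {
        // Take the pending operations and clear the buffer before calling out.
        $ops = $this->ops;
        $this->ops = [];
        $upserts = array_keys(array_filter($ops, fn ($op) => $op === 'upsert'));
        $deletes = array_keys(array_filter($ops, fn ($op) => $op === 'delete'));
        if ($upserts !== []) {
            $saveMany($upserts);
        }
        if ($deletes !== []) {
            $deleteMany($deletes);
        }
    }
}
```

In WordPress, the save_post handler would call $buffer->upsert( $post_id ) or $buffer->delete( $post_id ), and a shutdown action or background job would call flush() with closures that build records and call saveObjects()/addDocuments() or the delete methods. Operations still pending when the process dies are lost, so a persistent queue such as Action Scheduler is safer for critical data.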

Removing, de-duplicating and deleting records

Always delete the record from the index when a post is deleted and, depending on your needs, when its status changes to private or draft.

add_action( 'before_delete_post', 'search_remove_post' );

function search_remove_post( $post_id ) {
    try {
        $algolia = get_algolia_index( 'wp_posts' );
        $algolia->deleteObject( (string) $post_id );
    } catch ( Exception $e ) {
        error_log( 'Algolia remove error: ' . $e->getMessage() );
    }

    try {
        $meili = get_meili_index( 'wp_posts' );
        $meili->deleteDocument( $post_id );
    } catch ( Exception $e ) {
        error_log( 'Meili remove error: ' . $e->getMessage() );
    }
}
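Deletion hooks can still be missed (direct database deletes, a request that failed mid-way), leaving orphaned records. A periodic reconciliation compares the IDs in the index with the IDs that should be there. A sketch; reconcile_ids() is hypothetical, and fetching the ID lists is left out (for example browseObjects() in the v3 Algolia client, paging through Meilisearch documents, and a WP_Query with 'fields' => 'ids'):

```php
<?php
/**
 * Compare IDs present in the search index with IDs that should be there
 * (e.g. published posts) and report what to delete and what to add.
 * reconcile_ids() is a hypothetical helper.
 */
function reconcile_ids(array $indexedIds, array $expectedIds): array
{
    // Normalize to strings: Algolia objectIDs are strings, WordPress IDs are ints.
    $indexed = array_map('strval', $indexedIds);
    $expected = array_map('strval', $expectedIds);
    return [
        'delete' => array_values(array_diff($indexed, $expected)), // orphans
        'add'    => array_values(array_diff($expected, $indexed)), // missing
    ];
}
```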

Security best practices

  • Do not expose Admin API keys to the client. Generate search-only keys for front-end usage.
  • Algolia: use the Admin API Key server-side only. If you need restricted searches, generate secured API keys on the server and deliver them to the client when needed.
  • Meilisearch: for front-end use, create a dedicated API key limited to the search action (and, if needed, to specific indexes); never ship the master or admin key.
  • Store keys in environment variables or wp-config.php constants and avoid committing them to Git.
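An Algolia secured API key is derived locally from a search-only key, without an API call. To our understanding, the restrictions are URL-encoded, signed with HMAC-SHA256 using the parent key, and the hex signature followed by the restrictions is base64-encoded. The official PHP client exposes this as SearchClient::generateSecuredApiKey() (v3), which you should prefer; the sketch below only illustrates the scheme, supports scalar restriction values only, and sketch_secured_api_key() is hypothetical. filters and validUntil are standard Algolia restriction parameters:

```php
<?php
/**
 * Illustrative sketch of deriving an Algolia secured API key:
 * base64( hex HMAC-SHA256(restrictions, parentKey) . restrictions ).
 * Prefer the official client helper in production. Scalar restriction
 * values only. The parent key must never be sent to the browser.
 */
function sketch_secured_api_key(string $parentSearchKey, array $restrictions): string
{
    $query = http_build_query($restrictions);
    $signature = hash_hmac('sha256', $query, $parentSearchKey); // 64 hex chars
    return base64_encode($signature . $query);
}

// Example: a key limited to published records that expires in one hour.
$key = sketch_secured_api_key('SEARCH_ONLY_KEY', [
    'filters'    => 'status:publish',
    'validUntil' => time() + 3600,
]);
```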

Index tuning checklist

  • Set searchable attributes in order of importance.
  • Set attributesForFaceting / filterableAttributes for filterable fields.
  • Set custom ranking to promote recent/popular content.
  • Enable or disable typo tolerance and advanced syntax as needed.
  • Configure synonyms and stop words for domain-specific vocabulary.
  • For multi-lingual sites prefer per-locale indices.

Troubleshooting common issues

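  • Records not appearing: confirm the index name and API keys, and that your saveObjects/addDocuments calls succeed. Both engines apply writes asynchronously, so wait for the task to finish before checking: with the v3 Algolia client, call ->wait() on the returned response; in Meilisearch, check the task returned by addDocuments() (for example with waitForTask()).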
  • Slow indexing: Reduce chunk size, use background processing, avoid heavy serialization, and remove non-critical fields from records.
  • Rate-limited: Implement a backoff and respect provider rate limits. Algolia provides rate-limit headers.
  • Duplicate records: Ensure objectID/primary key is stable and unique per record.

Example: Full minimal plugin skeleton

Drop this into an mu-plugin or plugin file and adapt to your needs (this example assumes composer autoloading and constants defined).


Monitoring and metrics

  • Log indexing durations and failures.
  • Monitor index size and duplicate document counts.
  • Watch API usage and plan limits (Algolia) or host resource usage (Meilisearch).

Advanced features to explore

  • Incremental updates for large content stores — save last-indexed timestamps and only update changed posts.
  • Personalization and user-based ranking (Algolia built-ins).
  • Search analytics and click tracking (Algolia insights).
  • Multi-index strategies for separating pages, posts, products.
  • Synonym management UI and automated synonyms from analytics.


Summary checklist before going to production

  1. Keep admin API keys server-side only and create search-only keys for clients.
  2. Implement background processing for bulk operations and save_post actions.
  3. Design records carefully: index only searchable/filterable fields.
  4. Test index settings (searchableAttributes, facets, ranking) in staging before production.
  5. Monitor and handle rate limits and failures with retries and logging.

