Overview
This is a comprehensive, production-ready tutorial on how to index WordPress content into Algolia or Meilisearch using PHP. It covers prerequisites, installation, indexing strategies (real-time and bulk), mapping and sanitizing content, attachments and custom fields, index settings (searchable attributes, facets, ranking, synonyms), partial updates, deletion, reindexing, performance considerations, security, error handling and examples you can drop into a plugin or mu-plugin. Example code blocks are provided and labeled with the correct language for easy copy-paste.
When to use Algolia vs Meilisearch
Both are excellent search engines. High-level differences and choices:
- Algolia: hosted SaaS, mature, fast, advanced ranking and personalization features, complex configuration, paid tiers. Great for production apps requiring features like replicas, personalized ranking and advanced typo tolerance.
- Meilisearch: open-source, self-hostable (also available as managed cloud), simple API, developer-friendly defaults, very good for typical site search needs, easier to run on your own infrastructure. Sometimes fewer advanced features but simpler tuning.
Quick comparison
Feature | Algolia | Meilisearch |
---|---|---|
Hosting | Hosted SaaS (managed by Algolia) | Self-host or Meilisearch Cloud |
Ranking customization | Advanced (custom ranking, replicas) | Custom ranking, simpler rules |
Faceting | Yes | Yes |
Synonyms | Yes | Yes |
Cost | Paid tiers with a broad feature set | Free to self-host; Meilisearch Cloud is paid |
Prerequisites
- WordPress site with access to add plugins or mu-plugins.
- Composer for PHP dependencies (recommended for robust clients).
- Algolia account (Application ID and Admin API Key), or Meilisearch server URL and API key.
- Basic PHP and WordPress development knowledge (hooks, WP_Query).
Install PHP clients
Use Composer to install official clients. From your plugin root or project directory:
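    composer require "algolia/algoliasearch-client-php:^3" meilisearch/meilisearch-php

The Algolia examples in this tutorial use the v3 client API (initIndex); other major versions expose a different interface, so adjust the calls if you install one. If you need only one engine, require only that client. If you cannot use Composer, drop the client code in vendor or use a lightweight HTTP client to call the APIs directly, but the official clients shield you from subtle API differences.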
Store credentials securely
- Do not hardcode Admin API keys in public files. Store them in wp-config.php or environment variables and read via getenv() or defined constants.
- Provide a separate Search-only API key for front-end search requests. Use secured (signed) keys if you need restricted search (Algolia-specific).
    // wp-config.php
    define( 'ALGOLIA_APP_ID', 'YourAlgoliaAppId' );
    define( 'ALGOLIA_ADMIN_KEY', 'YourAlgoliaAdminAPIKey' );
    define( 'MEILI_HOST', 'http://127.0.0.1:7700' );
    define( 'MEILI_MASTER_KEY', 'YourMeiliMasterKey' );
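The constant-or-environment lookup described above can be wrapped in one small helper. This is a minimal sketch; the function name `search_get_setting` is hypothetical and not part of WordPress or either client library.

```php
<?php
/**
 * Hypothetical helper: read a credential from a PHP constant first,
 * then from the environment, and fall back to a default otherwise.
 */
function search_get_setting( string $name, ?string $default = null ): ?string {
    if ( defined( $name ) ) {
        return (string) constant( $name );
    }
    $value = getenv( $name );
    // getenv() returns false when the variable is not set.
    if ( $value !== false && $value !== '' ) {
        return $value;
    }
    return $default;
}
```

With this in place, a call such as `search_get_setting( 'MEILI_HOST', 'http://127.0.0.1:7700' )` works the same whether the value lives in wp-config.php or in the server environment.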
Initialize clients (PHP examples)
Algolia
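    use Algolia\AlgoliaSearch\SearchClient;

    // Build an Algolia client from constants, falling back to environment variables.
    function get_algolia_client() {
        $app_id  = defined( 'ALGOLIA_APP_ID' ) ? ALGOLIA_APP_ID : getenv( 'ALGOLIA_APP_ID' );
        $api_key = defined( 'ALGOLIA_ADMIN_KEY' ) ? ALGOLIA_ADMIN_KEY : getenv( 'ALGOLIA_ADMIN_KEY' );
        return SearchClient::create( $app_id, $api_key );
    }

    // Return a handle for one index (initIndex is part of the v3 PHP client).
    function get_algolia_index( $index_name ) {
        $client = get_algolia_client();
        return $client->initIndex( $index_name );
    }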
Meilisearch
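    use Meilisearch\Client as MeiliClient; // older client releases use the MeiliSearch\Client namespace

    // Build a Meilisearch client from constants, falling back to environment variables.
    function get_meili_client() {
        $host = defined( 'MEILI_HOST' ) ? MEILI_HOST : getenv( 'MEILI_HOST' );
        $key  = defined( 'MEILI_MASTER_KEY' ) ? MEILI_MASTER_KEY : getenv( 'MEILI_MASTER_KEY' );
        return new MeiliClient( $host, $key );
    }

    // Return a handle for one index.
    function get_meili_index( $index_name ) {
        $client = get_meili_client();
        return $client->index( $index_name );
    }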
Designing the record (mapping)
A well-designed record improves search quality and faceting. Typical fields to include:
- objectID / id: unique ID (post ID or custom key).
- title, content, excerpt.
- url, slug, post_type, status.
- date (ISO 8601), modified.
- author, author_id.
- taxonomies and arrays of term names and IDs (e.g., categories, tags).
- acf / meta: only include fields you expect to search or facet on.
- featured_image URL, image alt text.
- locale or language code for multi-language sites.
Example record builder
    function build_post_record( $post_id ) {
        $post = get_post( $post_id );
        if ( ! $post ) {
            return null;
        }

        // Render the content through WordPress filters, then strip all markup.
        $title   = get_the_title( $post_id );
        $content = apply_filters( 'the_content', $post->post_content );
        $content = wp_strip_all_tags( $content );
        $excerpt = get_the_excerpt( $post_id );

        // Taxonomies: term names keyed by taxonomy slug.
        $taxonomies = array();
        $registered = get_post_taxonomies( $post );
        foreach ( $registered as $tax ) {
            $terms = wp_get_post_terms( $post_id, $tax, array( 'fields' => 'names' ) );
            $taxonomies[ $tax ] = is_wp_error( $terms ) ? array() : $terms;
        }

        // Meta fields (whitelist only).
        $meta_whitelist = array( 'price', 'duration' ); // example keys
        $meta           = array();
        foreach ( $meta_whitelist as $key ) {
            $value = get_post_meta( $post_id, $key, true );
            if ( $value !== '' ) {
                $meta[ $key ] = maybe_unserialize( $value );
            }
        }

        // Featured image URL (false when the post has no thumbnail).
        $featured = get_the_post_thumbnail_url( $post_id, 'full' );

        $record = array(
            'objectID'       => (string) $post_id, // Algolia identifies records by objectID
            'id'             => $post_id,          // primary key for Meilisearch
            'title'          => $title,
            'content'        => $content,
            'excerpt'        => $excerpt,
            'url'            => get_permalink( $post_id ),
            'slug'           => $post->post_name,
            'post_type'      => $post->post_type,
            'status'         => $post->post_status,
            'author_id'      => (int) $post->post_author,
            'author'         => get_the_author_meta( 'display_name', $post->post_author ),
            'date'           => get_post_time( 'c', true, $post_id ),
            'modified'       => get_post_modified_time( 'c', true, $post_id ),
            'taxonomies'     => $taxonomies,
            'meta'           => $meta,
            'featured_image' => $featured,
            'comment_count'  => (int) $post->comment_count,
        );

        return $record;
    }
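Long posts can produce very large records, and hosted engines commonly cap record size, so it can help to cap the plain-text content before sending it. The following is a rough sketch that does not depend on WordPress; the function name `search_plain_text` and the 5000-character default are assumptions chosen for illustration, not values taken from either engine.

```php
<?php
/**
 * Hypothetical helper: reduce HTML to whitespace-normalized plain text
 * and cap its length so a single record stays reasonably small.
 */
function search_plain_text( string $html, int $max_chars = 5000 ): string {
    // Drop script and style blocks entirely, including their contents.
    $text = (string) preg_replace( '#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html );
    // Replace remaining tags with a space so adjacent words do not run together.
    $text = (string) preg_replace( '#<[^>]+>#', ' ', $text );
    // Decode entities such as &amp; into their characters.
    $text = html_entity_decode( $text, ENT_QUOTES | ENT_HTML5, 'UTF-8' );
    // Collapse runs of whitespace into single spaces.
    $text = trim( (string) preg_replace( '/\s+/', ' ', $text ) );
    // Prefer the multibyte-safe cut when the mbstring extension is available.
    return function_exists( 'mb_substr' )
        ? mb_substr( $text, 0, $max_chars )
        : substr( $text, 0, $max_chars );
}
```

Inside build_post_record the result of wp_strip_all_tags could be passed through a cap like this one before it is assigned to the content field.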
Indexing on post save (real-time)
Use the WordPress save_post hook for near real-time indexing. Key considerations:
- Ignore autosaves, revisions and post types you do not index.
- On unpublish/trashed/delete remove the record from the index.
- Use background processing (Action Scheduler, WP Cron or a queue) to avoid blocking page loads.
Algolia: index on save
    add_action( 'save_post', 'algolia_index_post', 10, 3 );

    function algolia_index_post( $post_id, $post, $update ) {
        // Skip autosaves, revisions and post types we do not index.
        if ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) {
            return;
        }
        if ( wp_is_post_revision( $post_id ) ) {
            return;
        }
        if ( $post->post_type !== 'post' ) {
            return; // this example indexes posts only
        }

        // If the post is not published, remove it from the index.
        if ( $post->post_status !== 'publish' ) {
            try {
                $index = get_algolia_index( 'wp_posts' );
                $index->deleteObject( (string) $post_id );
            } catch ( Exception $e ) {
                error_log( 'Algolia delete error: ' . $e->getMessage() );
            }
            return;
        }

        $record = build_post_record( $post_id );
        if ( ! $record ) {
            return;
        }

        // Send to Algolia. In production, queue this instead of blocking the request.
        try {
            $index = get_algolia_index( 'wp_posts' );
            $index->saveObjects( array( $record ) ); // batch of one
        } catch ( Exception $e ) {
            error_log( 'Algolia save error: ' . $e->getMessage() );
        }
    }
Meilisearch: index on save
    add_action( 'save_post', 'meili_index_post', 10, 3 );

    function meili_index_post( $post_id, $post, $update ) {
        // Skip autosaves, revisions and post types we do not index.
        if ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) {
            return;
        }
        if ( wp_is_post_revision( $post_id ) ) {
            return;
        }
        if ( $post->post_type !== 'post' ) {
            return;
        }

        // If the post is not published, remove it from the index.
        if ( $post->post_status !== 'publish' ) {
            try {
                $index = get_meili_index( 'wp_posts' );
                $index->deleteDocument( $post_id );
            } catch ( Exception $e ) {
                error_log( 'Meili delete error: ' . $e->getMessage() );
            }
            return;
        }

        $record = build_post_record( $post_id );
        if ( ! $record ) {
            return;
        }

        try {
            $index = get_meili_index( 'wp_posts' );
            // Upsert. The record contains both objectID and id, so name the primary key explicitly.
            $index->addDocuments( array( $record ), 'id' );
        } catch ( Exception $e ) {
            error_log( 'Meili save error: ' . $e->getMessage() );
        }
    }
Bulk indexing (initial reindex or re-sync)
For large sites, perform bulk reindexing in the background and in chunks. Both Algolia and Meilisearch support batch adds (up to 1000 per request in typical scenarios). Use chunking, rate-limit/backoff, and background workers to avoid timeouts.
Batch index example (WP-CLI friendly)
    <?php
    // Usage via WP-CLI: wp eval-file reindex.php
    $chunk_size = 500;
    $paged      = 1;

    while ( true ) {
        $query = new WP_Query( array(
            'post_type'      => 'post',
            'post_status'    => 'publish',
            'posts_per_page' => $chunk_size,
            'paged'          => $paged,
            'fields'         => 'ids',
            'orderby'        => 'ID',  // stable order so pages do not shift
            'order'          => 'ASC',
        ) );

        if ( empty( $query->posts ) ) {
            break;
        }

        $records = array();
        foreach ( $query->posts as $post_id ) {
            $rec = build_post_record( $post_id );
            if ( $rec ) {
                $records[] = $rec;
            }
        }

        // Send to Algolia or Meilisearch (this example uses Algolia).
        if ( $records ) {
            try {
                $index = get_algolia_index( 'wp_posts' );
                $index->saveObjects( $records );
            } catch ( Exception $e ) {
                error_log( 'Bulk index error: ' . $e->getMessage() );
                // handle retry/backoff here
            }
        }

        $paged++;

        // Optional pause to respect rate limits.
        usleep( 200000 ); // 0.2 s
    }
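When the records are already collected in memory, the chunking step can be separated from the engine call. The sketch below assumes a hypothetical helper name, `search_send_in_batches`, and takes the engine call as a callable so that the same loop can serve either client.

```php
<?php
/**
 * Hypothetical helper: send records in fixed-size batches.
 * $send receives one batch (an array of records) and may throw on failure.
 * Returns the number of records that were sent without an exception.
 */
function search_send_in_batches( array $records, callable $send, int $batch_size = 500 ): int {
    $sent = 0;
    foreach ( array_chunk( $records, max( 1, $batch_size ) ) as $batch ) {
        try {
            $send( $batch );
            $sent += count( $batch );
        } catch ( Exception $e ) {
            // A failed batch is logged and skipped; later batches still run.
            error_log( 'Batch failed: ' . $e->getMessage() );
        }
    }
    return $sent;
}
```

For Algolia the callable could be `function ( $batch ) use ( $index ) { $index->saveObjects( $batch ); }`, and for Meilisearch the same shape with addDocuments.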
Partial updates and upserts
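- Algolia: saveObjects replaces the whole record, while partialUpdateObjects changes only the attributes you send.
- Meilisearch: addDocuments replaces the whole document that has the same primary key, while updateDocuments merges the fields you send into the existing document, so a partial update does not require fetching and re-adding the full document.

    // Algolia partial update: only the title changes
    $index   = get_algolia_index( 'wp_posts' );
    $partial = array( 'objectID' => (string) $post_id, 'title' => 'Updated title only' );
    $index->partialUpdateObjects( array( $partial ) );

    // Meilisearch partial update: only the title changes
    $meili = get_meili_index( 'wp_posts' );
    $meili->updateDocuments( array( array( 'id' => $post_id, 'title' => 'Updated title only' ) ) );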
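To decide what a partial update should contain, one option is to compare the freshly built record with the previously indexed one and send only what changed. This is a sketch under the assumption that you keep a copy of the last indexed record somewhere (for example in post meta); the helper name `search_changed_attributes` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: build a partial-update payload that contains the
 * identifier plus only the attributes whose values differ from $old.
 * Returns null when nothing changed.
 */
function search_changed_attributes( array $old, array $new, string $id_key = 'objectID' ): ?array {
    $partial = array();
    foreach ( $new as $key => $value ) {
        if ( $key === $id_key ) {
            continue; // the identifier is always added below
        }
        if ( ! array_key_exists( $key, $old ) || $old[ $key ] !== $value ) {
            $partial[ $key ] = $value;
        }
    }
    if ( empty( $partial ) ) {
        return null;
    }
    return array( $id_key => $new[ $id_key ] ) + $partial;
}
```

Pass `'id'` as the third argument when the payload is meant for Meilisearch.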
Index settings (searchableAttributes, facets, ranking)
Set index settings to control searchable attributes, facets, custom ranking, stop words and synonyms.
Algolia example settings
    $index = get_algolia_index( 'wp_posts' );
    $index->setSettings( array(
        'searchableAttributes'  => array( 'title', 'content', 'excerpt', 'meta.price' ),
        // The record builder nests terms under taxonomies.<taxonomy slug>.
        'attributesForFaceting' => array(
            'filterOnly(post_type)',
            'searchable(taxonomies.post_tag)',
            'taxonomies.category',
        ),
        'customRanking'         => array( 'desc(date)', 'desc(comment_count)' ),
        'attributesToRetrieve'  => array( 'title', 'url', 'excerpt', 'featured_image' ),
        'removeStopWords'       => array( 'en' ),
        'ignorePlurals'         => true,
    ) );
Meilisearch example settings
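    $index = get_meili_index( 'wp_posts' );
    $index->updateSettings( [
        'searchableAttributes' => [ 'title', 'content', 'excerpt', 'meta.price' ],
        // The record builder nests terms under taxonomies.<taxonomy slug>.
        'filterableAttributes' => [ 'post_type', 'taxonomies.category', 'taxonomies.post_tag' ],
        // Current ranking rule names; wordsPosition only existed in early releases.
        'rankingRules'         => [ 'words', 'typo', 'proximity', 'attribute', 'sort', 'exactness' ],
        'stopWords'            => [ 'the', 'and', 'a' ],
        'distinctAttribute'    => 'id', // if you want deduplication
    ] );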
Synonyms and stop-words
Both engines allow synonyms and stop-words. Maintain synonyms for domain-specific terms (e.g., tv=>television). Use the client API to push synonyms in JSON or programmatically.
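The two engines expect synonyms in different shapes: Algolia takes a list of synonym objects, while Meilisearch takes a map from each word to its equivalents and needs both directions declared for a two-way synonym. The sketch below builds both payloads from one list of word groups; the helper name `search_build_synonyms` and the `syn-` identifier prefix are hypothetical.

```php
<?php
/**
 * Hypothetical helper: turn groups of equivalent words into the payload
 * shapes used by Algolia (synonym objects) and Meilisearch (word map).
 */
function search_build_synonyms( array $groups ): array {
    $algolia = array();
    $meili   = array();
    foreach ( $groups as $group ) {
        $group = array_values( array_unique( $group ) );
        if ( count( $group ) < 2 ) {
            continue; // a synonym needs at least two words
        }
        $algolia[] = array(
            'objectID' => 'syn-' . md5( implode( '|', $group ) ),
            'type'     => 'synonym',
            'synonyms' => $group,
        );
        // Meilisearch: every word points at all the other words in its group.
        foreach ( $group as $word ) {
            $others         = array_values( array_diff( $group, array( $word ) ) );
            $existing       = $meili[ $word ] ?? array();
            $meili[ $word ] = array_values( array_unique( array_merge( $existing, $others ) ) );
        }
    }
    return array( 'algolia' => $algolia, 'meilisearch' => $meili );
}
```

The results could then be pushed with `$index->saveSynonyms( $payload['algolia'] )` on the Algolia v3 client and `$index->updateSynonyms( $payload['meilisearch'] )` on the Meilisearch client.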
Faceting and filtering
Expose attributes you want to facet or filter on (categories, price ranges, authors). For Algolia add to attributesForFaceting. For Meili set filterableAttributes. Keep numeric ranges as numeric fields for efficient numeric filtering.
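At query time the filter expression differs slightly between the engines. As a rough illustration, the sketch below builds a post-type plus price-range filter string for each one. The helper name `search_build_filter` is hypothetical, the attribute names follow the record builder earlier in this tutorial, and for Meilisearch it assumes that meta.price has been added to filterableAttributes.

```php
<?php
/**
 * Hypothetical helper: build a filter string that restricts results to one
 * post type and a numeric price range, for either engine.
 */
function search_build_filter( string $engine, string $post_type, float $min_price, float $max_price ): string {
    // Allow only simple slug characters so the value cannot break the syntax.
    $post_type = (string) preg_replace( '/[^a-z0-9_\-]/', '', strtolower( $post_type ) );
    $range     = sprintf( 'meta.price >= %s AND meta.price <= %s', $min_price, $max_price );

    if ( $engine === 'algolia' ) {
        // Algolia facet filters use attribute:value.
        return sprintf( 'post_type:"%s" AND %s', $post_type, $range );
    }
    // Meilisearch uses attribute = "value".
    return sprintf( 'post_type = "%s" AND %s', $post_type, $range );
}
```

The string goes into the `filters` search parameter for Algolia and the `filter` search parameter for Meilisearch. Prices must be stored as numbers in the record for the range comparison to behave numerically.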
Geo-search
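If you need proximity search, include latitude and longitude in the record. Algolia reads them from the _geoloc attribute. Meilisearch reads them from the _geo attribute, which must also be listed in filterableAttributes or sortableAttributes before geo filtering or sorting works. Example for Algolia:

    $record['_geoloc'] = array( 'lat' => 40.7128, 'lng' => -74.0060 );
    $index->saveObjects( array( $record ) );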
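If the same record feeds both engines, the coordinates can be written under both attribute names at once. A minimal sketch, with `search_add_geo` as a hypothetical helper name:

```php
<?php
/**
 * Hypothetical helper: attach coordinates under the attribute names that
 * Algolia (_geoloc) and Meilisearch (_geo) each look for.
 * Out-of-range coordinates leave the record unchanged.
 */
function search_add_geo( array $record, float $lat, float $lng ): array {
    if ( $lat < -90 || $lat > 90 || $lng < -180 || $lng > 180 ) {
        return $record;
    }
    $point              = array( 'lat' => $lat, 'lng' => $lng );
    $record['_geoloc']  = $point; // Algolia
    $record['_geo']     = $point; // Meilisearch
    return $record;
}
```

If only one engine is in use, dropping the other attribute keeps the record smaller.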
Handling attachments, images and media
- Store image URLs and alt text in the record. Do not index binary blob data.
- For performance, only index media metadata you will search/filter on.
Multi-language support
- Option A: create per-locale indices (wp_posts_en, wp_posts_fr). This gives language-specific analyzers and settings.
- Option B: single index with locale field and use filters to restrict search to a locale.
Algolia has features like ignorePlurals and language-specific tokenization. Meilisearch works well with per-index languages or per-document locale fields.
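For Option A, the index name has to be derived from the locale on both the indexing and the search side. The following sketch shows one way to do that; the helper name `search_index_for_locale`, the supported-language list and the fallback language are assumptions for illustration.

```php
<?php
/**
 * Hypothetical helper: map a WordPress-style locale (for example fr_FR)
 * to a per-language index name such as wp_posts_fr.
 */
function search_index_for_locale(
    string $base,
    string $locale,
    array $supported = array( 'en', 'fr' ),
    string $fallback = 'en'
): string {
    // Keep only the two-letter language part of the locale.
    $lang = strtolower( substr( $locale, 0, 2 ) );
    if ( ! in_array( $lang, $supported, true ) ) {
        $lang = $fallback;
    }
    return $base . '_' . $lang;
}
```

The returned name can be passed directly to get_algolia_index or get_meili_index.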
Error handling, retries and idempotency
- Wrap API calls in try/catch and log failures to error_log or a custom logging system.
- Make indexing operations idempotent by using stable keys (post ID or slug) so retries do not duplicate data.
- Implement exponential backoff for transient HTTP or rate-limit errors.
    function send_with_retry( callable $closure, $retries = 3 ) {
        $attempt = 0;
        $backoff = 200; // milliseconds

        while ( $attempt <= $retries ) {
            try {
                return $closure();
            } catch ( Exception $e ) {
                $attempt++;
                if ( $attempt > $retries ) {
                    error_log( 'Indexing final failure: ' . $e->getMessage() );
                    return false;
                }
                usleep( $backoff * 1000 ); // usleep() takes microseconds
                $backoff *= 2;             // double the wait on each retry
            }
        }

        return false;
    }
Background processing recommendations
- Do not perform heavy indexing on the request thread. Use Action Scheduler, WP-Background-Processing library, a job queue, or WP Cron with small batches.
- Enqueue single-post updates on save_post and let a background worker flush to the search engine.
- For bulk reindexing, run a WP-CLI command outside of web requests or use a scheduled background job.
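The enqueue-then-flush idea above can be sketched without committing to a particular queue library. The class below is a hypothetical in-memory queue: it remembers only the latest action per post, so several saves of the same post in one request result in a single call to the engine.

```php
<?php
/**
 * Hypothetical in-memory queue: collect post IDs during a request and
 * flush them to the search engine in two batched calls.
 */
final class Search_Index_Queue {
    /** @var array<int,string> post ID => 'save' or 'delete' */
    private $pending = array();

    /** Record the latest intended action for a post. */
    public function enqueue( int $post_id, string $action = 'save' ): void {
        $this->pending[ $post_id ] = ( $action === 'delete' ) ? 'delete' : 'save';
    }

    /**
     * Hand the collected IDs to the given callables and empty the queue.
     * Returns the number of posts that were flushed.
     */
    public function flush( callable $save, callable $delete ): int {
        $to_save       = array_keys( $this->pending, 'save', true );
        $to_delete     = array_keys( $this->pending, 'delete', true );
        $this->pending = array();

        if ( $to_save ) {
            $save( $to_save );
        }
        if ( $to_delete ) {
            $delete( $to_delete );
        }
        return count( $to_save ) + count( $to_delete );
    }
}
```

In WordPress, enqueue could be called from save_post and flush from the shutdown hook or from a scheduled job; a persistent queue such as Action Scheduler is the sturdier choice when a request can die before flushing.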
Removing, de-duplicating and deleting records
Always delete the record from the index when a post is deleted or, depending on your needs, when its status changes to private or draft.

    add_action( 'before_delete_post', 'search_remove_post' );

    function search_remove_post( $post_id ) {
        // Remove from Algolia.
        try {
            $algolia = get_algolia_index( 'wp_posts' );
            $algolia->deleteObject( (string) $post_id );
        } catch ( Exception $e ) {
            error_log( 'Algolia remove error: ' . $e->getMessage() );
        }

        // Remove from Meilisearch.
        try {
            $meili = get_meili_index( 'wp_posts' );
            $meili->deleteDocument( $post_id );
        } catch ( Exception $e ) {
            error_log( 'Meili remove error: ' . $e->getMessage() );
        }
    }
Security best practices
- Do not expose Admin API keys to the client. Generate search-only keys for front-end usage.
- Algolia: use the Admin API Key server-side only. If you need restricted searches, generate secured API keys server-side and deliver them to the client when needed.
- Meilisearch: create a dedicated API key with limited scope for search if your Meili instance supports scoped keys.
- Store keys in environment variables or wp-config.php constants and avoid committing them to Git.
Index tuning checklist
- Set searchable attributes in order of importance.
- Set attributesForFaceting / filterableAttributes for filterable fields.
- Set custom ranking to promote recent/popular content.
- Enable or disable typo tolerance and advanced syntax as needed.
- Configure synonyms and stop words for domain-specific vocabulary.
- For multi-lingual sites prefer per-locale indices.
Troubleshooting common issues
- Records not appearing: Confirm the index name, API keys, and that you call saveObjects/addDocuments successfully. Check asynchronous tasks and wait for task completion if applicable.
- Slow indexing: Reduce chunk size, use background processing, avoid heavy serialization, and remove non-critical fields from records.
- Rate-limited: Implement a backoff and respect provider rate limits. Algolia provides rate-limit headers.
- Duplicate records: Ensure objectID/primary key is stable and unique per record.
Example: Full minimal plugin skeleton
Drop this into an mu-plugin or plugin file and adapt to your needs (this example assumes composer autoloading and constants defined).
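What follows is a hedged sketch of such a skeleton, not a definitive implementation. It targets Algolia only and relies on get_algolia_index and build_post_record as defined earlier in this tutorial. The names `WPSEARCH_INDEX`, `wpsearch_action_for`, `wpsearch_on_save` and `wpsearch_on_delete` are hypothetical, and the hooks are registered only when WordPress is loaded.

```php
<?php
/**
 * Plugin Name: Search Indexer (sketch)
 * Description: Minimal save/delete wiring for the functions in this tutorial.
 */

const WPSEARCH_INDEX = 'wp_posts'; // hypothetical constant for the index name

// Load Composer dependencies when they are installed next to this file.
if ( is_readable( __DIR__ . '/vendor/autoload.php' ) ) {
    require_once __DIR__ . '/vendor/autoload.php';
}

/** Decide what to do with a post: 'save', 'delete' or 'skip'. */
function wpsearch_action_for( string $post_type, string $post_status, bool $is_revision ): string {
    if ( $is_revision || $post_type !== 'post' ) {
        return 'skip';
    }
    return $post_status === 'publish' ? 'save' : 'delete';
}

/** save_post callback: index published posts, remove everything else. */
function wpsearch_on_save( $post_id, $post ) {
    if ( defined( 'DOING_AUTOSAVE' ) && DOING_AUTOSAVE ) {
        return;
    }
    $action = wpsearch_action_for( $post->post_type, $post->post_status, (bool) wp_is_post_revision( $post_id ) );
    if ( $action === 'skip' ) {
        return;
    }
    try {
        $index = get_algolia_index( WPSEARCH_INDEX ); // defined earlier in this tutorial
        if ( $action === 'save' ) {
            $record = build_post_record( $post_id );  // defined earlier in this tutorial
            if ( $record ) {
                $index->saveObjects( array( $record ) );
            }
        } else {
            $index->deleteObject( (string) $post_id );
        }
    } catch ( Exception $e ) {
        error_log( 'Search index error: ' . $e->getMessage() );
    }
}

/** before_delete_post callback: remove the record from the index. */
function wpsearch_on_delete( $post_id ) {
    try {
        get_algolia_index( WPSEARCH_INDEX )->deleteObject( (string) $post_id );
    } catch ( Exception $e ) {
        error_log( 'Search delete error: ' . $e->getMessage() );
    }
}

// Register hooks only inside WordPress.
if ( function_exists( 'add_action' ) ) {
    add_action( 'save_post', 'wpsearch_on_save', 10, 2 );
    add_action( 'before_delete_post', 'wpsearch_on_delete' );
}
```

If you use this skeleton, do not also register the standalone save_post and before_delete_post callbacks shown earlier, or each post will be sent twice.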
Monitoring and metrics
- Log indexing durations and failures.
- Monitor index size and duplicate document counts.
- Watch API usage and plan limits (Algolia) or host resource usage (Meilisearch).
Advanced features to explore
- Incremental updates for large content stores — save last-indexed timestamps and only update changed posts.
- Personalization and user-based ranking (Algolia built-ins).
- Search analytics and click tracking (Algolia insights).
- Multi-index strategies for separating pages, posts, products.
- Synonym management UI and automated synonyms from analytics.
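The incremental-update idea in the first item of this list comes down to comparing each post's modification time with the time of the last successful indexing run. A minimal sketch, assuming you already have a map of post ID to UTC modification time and have stored the last run's UTC timestamp somewhere (for example in an option); the helper name `search_ids_changed_since` is hypothetical.

```php
<?php
/**
 * Hypothetical helper: return the IDs of posts modified after the last
 * indexing run. Both arguments use 'Y-m-d H:i:s' strings in UTC.
 */
function search_ids_changed_since( array $modified_by_id, string $last_indexed_utc ): array {
    $cutoff = strtotime( $last_indexed_utc . ' UTC' );
    $ids    = array();
    foreach ( $modified_by_id as $id => $modified_utc ) {
        if ( strtotime( $modified_utc . ' UTC' ) > $cutoff ) {
            $ids[] = (int) $id;
        }
    }
    return $ids;
}
```

On a real site the same selection is usually better pushed into the database query, with this comparison kept as a safety check.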
Summary checklist before going to production
- Keep admin API keys server-side only and create search-only keys for clients.
- Implement background processing for bulk operations and save_post actions.
- Design records carefully: index only searchable/filterable fields.
- Test index settings (searchableAttributes, facets, ranking) in staging before production.
- Monitor and handle rate limits and failures with retries and logging.