How to create a JSON importer with validation in PHP for WordPress

Introduction

This article is a complete, detailed tutorial on how to build a robust WordPress importer that reads data from JSON, validates it, and turns it into posts (or custom post types), taxonomies, attachments and metadata using best practices in security, performance and error handling. The article includes multiple implementation approaches, a production-ready importer class, a REST endpoint, a WP-CLI command example, strategies for validation (manual and JSON Schema via a library), and code samples you can drop into a plugin.

Goals and features

  • Input: JSON file or JSON payload (REST or CLI).
  • Validation: Validate JSON shape and field values (types, required, enumerations, date formats, URLs).
  • Sanitization: Sanitize all inputs before inserting/updating.
  • Mapping: Map JSON keys to post fields, taxonomies and meta keys.
  • Attachments: Download remote images and attach as featured images or gallery items.
  • Transactions and Rollback: Graceful rollback (best-effort) and reporting.
  • Interfaces: REST endpoint with capability checks and a WP-CLI command for bulk imports.
  • Extensibility: Hooks and filters for custom mapping/validation.

When to use this and security considerations

  • Use the importer for batch creation/updating of content where data is supplied via JSON from external systems.
  • Do not accept large or untrusted JSON without strict validation and user capability checks.
  • Only users with appropriate capabilities should trigger the importer (e.g., manage_options, edit_posts, or a custom cap).
  • When downloading remote files, use the WordPress HTTP and media helpers (wp_remote_get, download_url, media_handle_sideload) rather than raw file functions, and limit accepted MIME types and file sizes.

Design and data format

Define a clear JSON input contract. Example JSON for a list of posts:

{
  "source": "external-cms",
  "imported_at": "2025-09-26T12:00:00Z",
  "posts": [
    {
      "external_id": "abc123",
      "post_type": "post",
      "title": "Sample Post",
      "content": "<p>HTML content</p>",
      "excerpt": "Short excerpt",
      "status": "publish",
      "date": "2025-09-20T08:00:00Z",
      "author_external_id": "user_1",
      "categories": ["news", "world"],
      "tags": ["import", "json"],
      "featured_image": "https://example.com/image.jpg",
      "meta": {
        "source_rating": 4.5,
        "external_flag": true
      }
    }
  ]
}

Recommended JSON Schema (concept)

Use a JSON Schema to describe allowed fields, types, required fields and formats. The schema below is simplified; production schemas usually have additional constraints and enumerations.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "required": ["posts"],
  "properties": {
    "source": { "type": "string" },
    "imported_at": { "type": "string", "format": "date-time" },
    "posts": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["external_id", "title"],
        "properties": {
          "external_id": { "type": "string" },
          "post_type": { "type": "string" },
          "title": { "type": "string" },
          "content": { "type": "string" },
          "excerpt": { "type": "string" },
          "status": { "type": "string", "enum": ["publish", "draft", "private", "pending"] },
          "date": { "type": ["string", "null"], "format": "date-time" },
          "author_external_id": { "type": ["string", "null"] },
          "categories": { "type": "array", "items": { "type": "string" } },
          "tags": { "type": "array", "items": { "type": "string" } },
          "featured_image": { "type": ["string", "null"], "format": "uri" },
          "meta": { "type": "object" }
        }
      }
    }
  }
}

Validation strategies

1) Manual validation

Fast and dependency-free: decode the JSON with json_decode and run checks for required keys, types and value bounds. Manual validation is easiest to deploy inside WordPress without Composer dependencies. Advantages: full control, simple to debug. Disadvantages: you must implement every rule yourself.
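
As a rough illustration of the manual approach, the sketch below checks one decoded item for required keys, the status enumeration, an RFC 3339 style date and an http(s) image URL. The function name myimp_validate_item and the error message wording are hypothetical, not part of WordPress or of the importer class in this article; the field names and allowed statuses come from the example payload and schema above.

```php
<?php
/**
 * Hypothetical helper: validate one decoded item from the "posts" array.
 * Returns a list of error strings; an empty list means the item passed.
 */
function myimp_validate_item( $item, int $index ): array {
    if ( ! is_array( $item ) ) {
        return array( sprintf( 'posts[%d]: must be an object', $index ) );
    }
    $errors = array();

    // Required, non-empty strings.
    foreach ( array( 'external_id', 'title' ) as $key ) {
        if ( ! isset( $item[ $key ] ) || ! is_string( $item[ $key ] ) || '' === trim( $item[ $key ] ) ) {
            $errors[] = sprintf( 'posts[%d].%s: required non-empty string', $index, $key );
        }
    }

    // Enumeration check.
    $statuses = array( 'publish', 'draft', 'private', 'pending' );
    if ( isset( $item['status'] ) && ! in_array( $item['status'], $statuses, true ) ) {
        $errors[] = sprintf( 'posts[%d].status: unsupported value', $index );
    }

    // Date shape (e.g. 2025-09-20T08:00:00Z) plus a real calendar date.
    if ( isset( $item['date'] ) ) {
        $pattern = '/^(\d{4})-(\d{2})-(\d{2})T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?(Z|[+-]\d{2}:\d{2})$/';
        $ok      = is_string( $item['date'] )
            && 1 === preg_match( $pattern, $item['date'], $m )
            && checkdate( (int) $m[2], (int) $m[3], (int) $m[1] );
        if ( ! $ok ) {
            $errors[] = sprintf( 'posts[%d].date: invalid date-time', $index );
        }
    }

    // Term lists must be arrays of strings.
    foreach ( array( 'categories', 'tags' ) as $key ) {
        if ( isset( $item[ $key ] )
            && ( ! is_array( $item[ $key ] ) || array_filter( $item[ $key ], 'is_string' ) !== $item[ $key ] ) ) {
            $errors[] = sprintf( 'posts[%d].%s: must be an array of strings', $index, $key );
        }
    }

    // Only http(s) URLs are accepted for remote images.
    if ( isset( $item['featured_image'] ) ) {
        $url    = $item['featured_image'];
        $scheme = is_string( $url ) ? parse_url( $url, PHP_URL_SCHEME ) : null;
        if ( ! is_string( $url )
            || false === filter_var( $url, FILTER_VALIDATE_URL )
            || ! in_array( $scheme, array( 'http', 'https' ), true ) ) {
            $errors[] = sprintf( 'posts[%d].featured_image: must be an http(s) URL', $index );
        }
    }

    return $errors;
}
```

Collecting every error instead of stopping at the first one lets the caller report all problems with an item in a single pass.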

2) JSON Schema validator libraries

Use a library like opis/json-schema or justinrainbow/json-schema if you prefer declarative schema validation. These libraries give you automated validation and rich error messages. To use them, install via Composer and include vendor/autoload.php in your plugin.

  • Composer install example: composer require opis/json-schema
  • Then validate decoded JSON against schema and respond with structured errors.

Which to choose?

  • Small imports: manual validation works fine.
  • Complex schemas with many constraints: prefer a JSON Schema library.

Core importer implementation (production-ready)

The following example shows a self-contained importer class that:

  • Accepts JSON (string or array).
  • Performs manual validation (example approach).
  • Creates/updates posts, taxonomies and attachments.
  • Reports successes and errors and attempts rollback on failure.

<?php
/**
 * Usage:
 *   $importer = new WP_JSON_Importer();
 *   $result   = $importer->import_from_json_string( $json_string );
 *   // $result contains arrays: created, updated, skipped, errors
 */
class WP_JSON_Importer {

    protected $mapping = array(); // future extension for field mapping
    protected $created_ids = array(); // track created post/attachment IDs for rollback
    protected $report = array(
        'created' => array(),
        'updated' => array(),
        'skipped' => array(),
        'errors'  => array(),
    );

    protected $max_image_size = 5 * 1024 * 1024; // 5MB max image

    public function import_from_json_string( $json_string ) {
        $data = json_decode( $json_string, true );
        // Reject syntax errors and valid JSON whose top level is not an object (e.g. a bare string).
        if ( ! is_array( $data ) ) {
            $reason = JSON_ERROR_NONE === json_last_error() ? 'top-level value must be an object' : json_last_error_msg();
            $this->report['errors'][] = array( 'message' => 'Invalid JSON: ' . $reason );
            return $this->report;
        }
        return $this->import_from_array( $data );
    }

    public function import_from_array( array $data ) {
        // Basic top-level validation
        if ( ! isset( $data['posts'] ) || ! is_array( $data['posts'] ) ) {
            $this->report['errors'][] = array( 'message' => 'Missing or invalid posts array.' );
            return $this->report;
        }

        // Loop through posts
        foreach ( $data['posts'] as $index => $item ) {
            // process_item() is type-hinted, so reject non-object items up front.
            if ( ! is_array( $item ) ) {
                $this->report['errors'][] = array( 'index' => $index, 'message' => 'Item must be an object.' );
                continue;
            }
            try {
                $this->process_item( $item, $index );
            } catch ( Exception $e ) {
                $this->report['errors'][] = array(
                    'index'   => $index,
                    'message' => 'Exception: ' . $e->getMessage(),
                );
                // continue with next items (do not throw further by default)
            }
        }

        return $this->report;
    }

    protected function process_item( array $item, $index = 0 ) {
        // Basic item validation
        if ( empty( $item['external_id'] ) ) {
            $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'Missing external_id' );
            return;
        }
        if ( empty( $item['title'] ) ) {
            $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'Missing title' );
            return;
        }

        // Determine post_type
        $post_type = ! empty( $item['post_type'] ) ? sanitize_key( $item['post_type'] ) : 'post';
        if ( ! post_type_exists( $post_type ) ) {
            $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'Invalid post_type: ' . $post_type );
            return;
        }

        // Map inputs into postarr
        $postarr = array(
            'post_title'   => wp_strip_all_tags( $item['title'] ),
            'post_content' => isset( $item['content'] ) ? wp_kses_post( $item['content'] ) : '',
            'post_excerpt' => isset( $item['excerpt'] ) ? wp_strip_all_tags( $item['excerpt'] ) : '',
            'post_status'  => isset( $item['status'] ) ? sanitize_key( $item['status'] ) : 'draft',
            'post_type'    => $post_type,
        );

        if ( ! empty( $item['date'] ) ) {
            $date = date_create( $item['date'] );
            if ( $date ) {
                // Normalize to UTC, then derive the site-local date from it.
                $gmt = $date->setTimezone( new DateTimeZone( 'UTC' ) )->format( 'Y-m-d H:i:s' );
                $postarr['post_date_gmt'] = $gmt;
                $postarr['post_date']     = get_date_from_gmt( $gmt );
            } else {
                $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'Invalid date format' );
            }
        }

        // Check if a post already exists with this external_id
        $existing_post_id = $this->find_post_by_external_id( $item['external_id'], $post_type );

        // wp_insert_post() and wp_update_post() expect slashed data, so slash before saving.
        if ( $existing_post_id ) {
            // Update path
            $postarr['ID'] = $existing_post_id;
            $updated_id    = wp_update_post( wp_slash( $postarr ), true );
            if ( is_wp_error( $updated_id ) ) {
                $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'WP update error: ' . $updated_id->get_error_message() );
                return;
            }
            $this->report['updated'][] = $updated_id;
            $post_id = $updated_id;
        } else {
            // Create path
            $new_id = wp_insert_post( wp_slash( $postarr ), true );
            if ( is_wp_error( $new_id ) ) {
                $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'WP insert error: ' . $new_id->get_error_message() );
                return;
            }
            $this->report['created'][] = $new_id;
            $this->created_ids[] = $new_id;
            $post_id = $new_id;

            // Store external_id mapping, using postmeta (or a custom table)
            update_post_meta( $post_id, '_import_external_id', sanitize_text_field( $item['external_id'] ) );
        }

        // Handle author mapping (optional)
        if ( ! empty( $item['author_external_id'] ) ) {
            $author_id = $this->map_author_by_external_id( $item['author_external_id'] );
            if ( $author_id ) {
                wp_update_post( array( 'ID' => $post_id, 'post_author' => $author_id ) );
            }
        }

        // Handle taxonomies
        if ( ! empty( $item['categories'] ) && is_array( $item['categories'] ) ) {
            $this->set_terms_safe( $post_id, $item['categories'], 'category' );
        }
        if ( ! empty( $item['tags'] ) && is_array( $item['tags'] ) ) {
            $this->set_terms_safe( $post_id, $item['tags'], 'post_tag' );
        }

        // Handle meta
        if ( ! empty( $item['meta'] ) && is_array( $item['meta'] ) ) {
            foreach ( $item['meta'] as $meta_key => $meta_value ) {
                $this->save_meta( $post_id, $meta_key, $meta_value );
            }
        }

        // Handle featured image (download & attach)
        if ( ! empty( $item['featured_image'] ) ) {
            $attachment_id = $this->attach_remote_image( esc_url_raw( $item['featured_image'] ), $post_id );
            if ( is_wp_error( $attachment_id ) ) {
                $this->report['errors'][] = compact( 'index' ) + array( 'message' => 'Image error: ' . $attachment_id->get_error_message() );
            } elseif ( $attachment_id ) {
                set_post_thumbnail( $post_id, $attachment_id );
            }
        }

        // Additional hooks for extensibility
        do_action( 'wp_json_importer_after_item', $post_id, $item );
    }

    protected function find_post_by_external_id( $external_id, $post_type = 'post' ) {
        $args = array(
            'post_type'      => $post_type,
            // get_posts() defaults to published posts only; include drafts, private, etc.
            // so that re-imports do not create duplicates.
            'post_status'    => 'any',
            'meta_key'       => '_import_external_id',
            'meta_value'     => sanitize_text_field( $external_id ),
            'fields'         => 'ids',
            'posts_per_page' => 1,
        );
        $posts = get_posts( $args );
        if ( ! empty( $posts ) ) {
            return (int) $posts[0];
        }
        return 0;
    }

    protected function map_author_by_external_id( $external_id ) {
        // Example: map to existing user by meta key external_id
        $user_query = new WP_User_Query( array(
            'meta_key'   => 'external_id',
            'meta_value' => sanitize_text_field( $external_id ),
            'number'     => 1,
            'fields'     => 'ID',
        ) );
        $users = $user_query->get_results();
        if ( ! empty( $users ) ) {
            return (int) $users[0];
        }
        return 0;
    }

    protected function set_terms_safe( $post_id, $terms, $taxonomy ) {
        if ( ! taxonomy_exists( $taxonomy ) ) {
            return;
        }
        $sanitized = array();
        foreach ( $terms as $t ) {
            $sanitized[] = wp_strip_all_tags( $t );
        }
        // wp_set_object_terms will create terms if they do not exist
        wp_set_object_terms( $post_id, $sanitized, $taxonomy, false );
    }

    protected function save_meta( $post_id, $meta_key, $meta_value ) {
        // Allow a whitelist of meta keys if required; otherwise sanitize generically
        $meta_key = sanitize_key( $meta_key );
        // For arrays or objects, serialize or JSON-encode, depending on your use
        if ( is_array( $meta_value ) || is_object( $meta_value ) ) {
            // update_post_meta() unslashes its input, so slash the JSON to keep escaped characters intact.
            update_post_meta( $post_id, $meta_key, wp_slash( wp_json_encode( $meta_value ) ) );
        } else {
            update_post_meta( $post_id, $meta_key, sanitize_text_field( (string) $meta_value ) );
        }
    }

    protected function attach_remote_image( $image_url, $post_id ) {
        // Basic check
        if ( filter_var( $image_url, FILTER_VALIDATE_URL ) === false ) {
            return new WP_Error( 'invalid_url', 'Invalid image URL' );
        }
        if ( ! function_exists( 'media_handle_sideload' ) ) {
            require_once ABSPATH . 'wp-admin/includes/media.php';
            require_once ABSPATH . 'wp-admin/includes/file.php';
            require_once ABSPATH . 'wp-admin/includes/image.php';
        }

        // Download file to a temporary location
        $tmp = download_url( $image_url );
        if ( is_wp_error( $tmp ) ) {
            return $tmp;
        }

        // Check file size
        $size = filesize( $tmp );
        if ( $size > $this->max_image_size ) {
            @unlink( $tmp );
            return new WP_Error( 'image_too_large', 'Image exceeds max size.' );
        }

        // Build file array
        $file_array = array();
        $file_array['name']     = wp_basename( (string) wp_parse_url( $image_url, PHP_URL_PATH ) );
        $file_array['tmp_name'] = $tmp;

        // Let WP handle the upload and attachment creation
        $attachment_id = media_handle_sideload( $file_array, $post_id );

        if ( is_wp_error( $attachment_id ) ) {
            @unlink( $tmp );
            return $attachment_id;
        }

        // Track created attachments for rollback
        $this->created_ids[] = $attachment_id;

        return $attachment_id;
    }

    /**
     * Optional rollback routine: attempt to remove created posts and attachments.
     * This is best-effort: some changes (e.g., external API calls) cannot be reversed.
     */
    public function rollback() {
        foreach ( $this->created_ids as $id ) {
            // Remove attachment specially to delete files
            if ( wp_attachment_is_image( $id ) || get_post_type( $id ) === 'attachment' ) {
                wp_delete_attachment( $id, true );
            } else {
                wp_delete_post( $id, true );
            }
        }
        $this->created_ids = array();
        $this->report['errors'][] = array( 'message' => 'Rollback executed' );
    }
}
?>

Notes about the importer class

  • It is intentionally conservative: it validates top-level structure and critical fields, sanitizes values and uses core WP functions.
  • Attachments are created using media_handle_sideload which takes care of file moving and metadata.
  • find_post_by_external_id uses postmeta; for large imports, use a custom indexed table for better performance.
  • Rollback is best-effort. WP core functions are not transactional, so complete atomic rollback is hard. For custom tables you can use database transactions via wpdb.
  • Use hooks like do_action( 'wp_json_importer_after_item', $post_id, $item ) to add custom behavior.

REST API endpoint

Expose the importer via a REST route to accept JSON payloads. Ensure you enforce capability checks and nonce protections for AJAX requests from the admin UI.

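<?php
add_action( 'rest_api_init', function () {
    // 'my-importer/v1' and '/import' are placeholder values; choose your own namespace and route.
    register_rest_route( 'my-importer/v1', '/import', array(
        'methods'             => 'POST',
        'callback'            => 'wp_json_importer_rest_callback',
        'permission_callback' => function ( $request ) {
            // Example: only authenticated users with edit_posts
            return current_user_can( 'edit_posts' );
        },
    ) );
} );

function wp_json_importer_rest_callback( WP_REST_Request $request ) {
    $body     = $request->get_body();
    $importer = new WP_JSON_Importer();

    $result = $importer->import_from_json_string( $body );
    // Return structured REST response
    return rest_ensure_response( $result );
}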
?>

Security checklist for REST

  • Require authentication (cookies or application passwords or OAuth).
  • Check current_user_can in permission_callback.
  • Limit payload size with server settings (e.g., post_max_size in PHP or the web server's request body limit) and code-level checks.
  • Log imports and who triggered them.
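
A code-level size check can be as small as the sketch below. The function name myimp_payload_error, the 2 MB default and the returned array shape are assumptions made for illustration; inside a REST callback the result would typically be turned into a WP_Error carrying the same HTTP status.

```php
<?php
/**
 * Hypothetical helper: return null when the raw request body is acceptable,
 * or an array describing the problem and a suggested HTTP status code.
 */
function myimp_payload_error( string $body, int $max_bytes = 2097152 ): ?array {
    $bytes = strlen( $body ); // strlen() counts bytes, which is what matters here.

    if ( 0 === $bytes ) {
        return array( 'status' => 400, 'message' => 'Empty request body.' );
    }
    if ( $bytes > $max_bytes ) {
        return array(
            'status'  => 413,
            'message' => sprintf( 'Payload is %d bytes; the limit is %d.', $bytes, $max_bytes ),
        );
    }
    return null;
}
```

Running this before json_decode avoids spending memory on decoding a body that will be rejected anyway.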

WP-CLI command for bulk import

WP-CLI is ideal for large imports because it avoids web server timeouts and has more resources. Create a simple command that reads a file path and runs the importer.

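<?php
if ( defined( 'WP_CLI' ) && WP_CLI ) {
    // 'json-import' is a placeholder command name; choose your own.
    WP_CLI::add_command( 'json-import', function ( $args ) {
        if ( empty( $args[0] ) || ! is_readable( $args[0] ) ) {
            WP_CLI::error( 'Usage: wp json-import <path-to-json-file>' );
        }

        $json     = file_get_contents( $args[0] );
        $importer = new WP_JSON_Importer();
        $result   = $importer->import_from_json_string( $json );

        WP_CLI::success( sprintf( 'Created: %d, Updated: %d, Errors: %d',
            count( $result['created'] ),
            count( $result['updated'] ),
            count( $result['errors'] )
        ) );

        if ( ! empty( $result['errors'] ) ) {
            WP_CLI::warning( 'Errors:' );
            foreach ( $result['errors'] as $e ) {
                WP_CLI::line( '- ' . ( is_array( $e ) ? json_encode( $e ) : (string) $e ) );
            }
        }
    } );
}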
?>

Using a JSON Schema validator library (opis/json-schema) example

Install the library with Composer. Example validation flow:

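composer require opis/json-schema

<?php
// Sketch assuming opis/json-schema 2.x; $json is the payload and $schema_json the schema text.
use Opis\JsonSchema\Validator;
use Opis\JsonSchema\Errors\ErrorFormatter;

require_once __DIR__ . '/vendor/autoload.php';

// The validator works on decoded objects, so do not pass true (associative) here.
$validator = new Validator();
$result    = $validator->validate( json_decode( $json ), json_decode( $schema_json ) );

if ( ! $result->isValid() ) {
    $formatter = new ErrorFormatter();
    $errors    = $formatter->format( $result->error() );
    // Convert or log errors, return to caller
    return $errors;
}
// proceed with import using the decoded array: json_decode( $json, true )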
?>

Notes about including Composer libraries inside WordPress

  • Bundle vendor/ with your plugin and load vendor/autoload.php from the plugin's main file (or from an mu-plugin).
  • Be mindful of autoload conflicts and namespace collisions.

Error handling, logging and reporting

Implement a structured report with entries for created, updated, skipped and errors. Store import runs in the database (custom post type or custom table) if you need an audit trail. Each run record should include:

  • timestamp
  • initiator (user ID or system)
  • original file or payload hash
  • counts and error details
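
One possible shape for such a run record is sketched below. The function name myimp_build_run_record and the array keys are hypothetical; the report argument is the array returned by the importer class in this article, and how the record is stored (custom post type or table) is left open.

```php
<?php
/**
 * Hypothetical helper: build an audit record for one import run.
 *
 * @param string     $payload   Raw JSON that was imported.
 * @param array      $report    Report with created/updated/skipped/errors lists.
 * @param int|string $initiator User ID, or a label such as "wp-cli".
 */
function myimp_build_run_record( string $payload, array $report, $initiator ): array {
    $counts = array();
    foreach ( array( 'created', 'updated', 'skipped', 'errors' ) as $key ) {
        $counts[ $key ] = ( isset( $report[ $key ] ) && is_array( $report[ $key ] ) )
            ? count( $report[ $key ] )
            : 0;
    }

    return array(
        'timestamp'    => gmdate( 'c' ),              // ISO 8601, always UTC.
        'initiator'    => $initiator,
        'payload_hash' => hash( 'sha256', $payload ), // Identifies the input without storing it.
        'counts'       => $counts,
        'errors'       => isset( $report['errors'] ) ? $report['errors'] : array(),
    );
}
```

Storing a hash rather than the payload keeps the audit table small while still letting you tell whether the same file was imported twice.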

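For severe failures provide a rollback button in the admin UI that calls $importer->rollback(). Because the class tracks created IDs only in memory for the current request, persist them with the run record if the rollback may be triggered later. Rollback is best-effort and might not revert everything. For deterministic rollback use your own DB tables and transactions.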

Performance tips

  • For very large imports, use WP-CLI and batch the work into chunks.
  • Use wp_defer_term_counting(true) and wp_defer_comment_counting(true) around bulk operations to reduce overhead, and reset them afterwards.
  • Temporarily disable object cache invalidation if your persistent cache supports it, or use wp_suspend_cache_invalidation( true ) where available.
  • When many taxonomy terms will be created, pre-create terms in batches instead of creating them inside the loop to reduce repeated DB queries.
  • Use a custom indexed table for external_id lookups if you will run millions of items; postmeta lookups scale poorly.
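
The batching and deferred-counting tips can be combined roughly as follows. The function name myimp_run_in_batches is hypothetical, and the callback stands in for whatever processes a chunk (for example a call to import_from_array with that chunk as the posts list). The WordPress calls are guarded with function_exists so the sketch also runs outside WordPress.

```php
<?php
/**
 * Hypothetical helper: process items in fixed-size chunks and return the
 * number of chunks handled. Term and comment counting is deferred while
 * the batches run and restored afterwards, even if a batch throws.
 */
function myimp_run_in_batches( array $items, int $batch_size, callable $process_batch ): int {
    $in_wp = function_exists( 'wp_defer_term_counting' ) && function_exists( 'wp_defer_comment_counting' );
    if ( $in_wp ) {
        wp_defer_term_counting( true );
        wp_defer_comment_counting( true );
    }

    $batches = 0;
    try {
        foreach ( array_chunk( $items, max( 1, $batch_size ) ) as $batch ) {
            $process_batch( $batch );
            $batches++;
        }
    } finally {
        if ( $in_wp ) {
            wp_defer_term_counting( false );
            wp_defer_comment_counting( false );
        }
    }

    return $batches;
}
```

The finally block matters: if counting stays deferred after an exception, term counts remain stale until something else resets them.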

Attachment and file handling best practices

  • Restrict downloaded file types to image mime types you accept (image/jpeg, image/png, image/gif, image/webp).
  • Limit file size and timeouts on downloads.
  • Use media_handle_sideload which integrates with WP media handling and security checks.
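  • Run wp_check_filetype_and_ext on downloaded files before inserting; unlike wp_check_filetype, which looks only at the file name, it also inspects the file contents.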
  • Store any remote origin URL as meta to trace provenance.
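
To make the type and size restrictions concrete, here is a minimal sketch in plain PHP that inspects a downloaded temporary file before it is handed to media_handle_sideload. The function name myimp_is_allowed_image is hypothetical, and using getimagesize for detection is a choice made for this example rather than what the importer class above does.

```php
<?php
/**
 * Hypothetical helper: accept a downloaded file only if it is a real image
 * of an allowed MIME type and no larger than $max_bytes.
 */
function myimp_is_allowed_image( string $path, int $max_bytes = 5242880 ): bool {
    $allowed = array( 'image/jpeg', 'image/png', 'image/gif', 'image/webp' );

    if ( ! is_file( $path ) ) {
        return false;
    }

    $size = filesize( $path );
    if ( false === $size || $size > $max_bytes ) {
        return false;
    }

    // getimagesize() reads the file header, so a renamed non-image is rejected.
    // The @ suppresses the notice PHP may emit for unreadable or truncated files.
    $info = @getimagesize( $path );

    return is_array( $info ) && in_array( $info['mime'], $allowed, true );
}
```

Checking the contents rather than the URL's extension is the point here: a remote server can name any file image.jpg.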

Mapping advanced cases

When your JSON doesn't map 1:1 to WP fields, implement a mapping layer. Example mapping table:

JSON key        | WP destination
----------------+-------------------------------
external_id     | postmeta: _import_external_id
title           | post_title
content         | post_content
featured_image  | attachment -> post_thumbnail
categories      | taxonomy: category
tags            | taxonomy: post_tag
meta.key        | postmeta: key
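
The table above could be expressed as a small declarative map plus one function that applies it. This is only a sketch: myimp_map_item, $myimp_map and the post/meta/terms/attachment grouping are hypothetical names chosen for this example, and the importer class in this article hard-codes the same mapping instead.

```php
<?php
// Each JSON key maps to a destination kind and a destination name.
$myimp_map = array(
    'external_id'    => array( 'meta', '_import_external_id' ),
    'title'          => array( 'post', 'post_title' ),
    'content'        => array( 'post', 'post_content' ),
    'featured_image' => array( 'attachment', 'post_thumbnail' ),
    'categories'     => array( 'terms', 'category' ),
    'tags'           => array( 'terms', 'post_tag' ),
);

/**
 * Hypothetical helper: split one JSON item into WordPress destinations.
 * Values are copied as-is; sanitizing still happens when they are saved.
 */
function myimp_map_item( array $item, array $map ): array {
    $out = array(
        'post'       => array(),
        'meta'       => array(),
        'terms'      => array(),
        'attachment' => array(),
    );

    // meta.key -> postmeta: key (pass-through of the nested meta object).
    if ( isset( $item['meta'] ) && is_array( $item['meta'] ) ) {
        foreach ( $item['meta'] as $key => $value ) {
            $out['meta'][ $key ] = $value;
        }
    }

    // Explicit mappings run last so they win over pass-through meta keys.
    foreach ( $map as $json_key => $dest ) {
        if ( array_key_exists( $json_key, $item ) ) {
            list( $kind, $name )   = $dest;
            $out[ $kind ][ $name ] = $item[ $json_key ];
        }
    }

    return $out;
}
```

Keeping the map as data means a filter can modify it without touching the importer's control flow.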

Internationalization and encoding

  • Ensure JSON is UTF-8 encoded before decoding. Use json_decode( $str, true ) after validating encoding.
  • Sanitize text fields and use wp_kses_post for HTML content.
  • When storing structured meta, prefer JSON-encoded strings or serialized arrays but be consistent.
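
A minimal sketch of the encoding check, in plain PHP: it strips a UTF-8 byte order mark (which json_decode treats as a syntax error), verifies that the bytes are valid UTF-8, and only then decodes. The function name myimp_decode_utf8_json is hypothetical.

```php
<?php
/**
 * Hypothetical helper: decode a JSON document only if it is valid UTF-8.
 * Returns the decoded array, or null when the input is not usable.
 */
function myimp_decode_utf8_json( string $raw ): ?array {
    // Remove a leading UTF-8 BOM, if present.
    if ( 0 === strncmp( $raw, "\xEF\xBB\xBF", 3 ) ) {
        $raw = substr( $raw, 3 );
    }

    // With the /u modifier, preg_match() returns false for invalid UTF-8.
    if ( 1 !== preg_match( '//u', $raw ) ) {
        return null;
    }

    $data = json_decode( $raw, true );

    return is_array( $data ) ? $data : null;
}
```

If a source system sends another encoding, convert it to UTF-8 before calling a helper like this rather than loosening the check.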

Testing and validation checklist

  1. Validate sample JSON with unit tests (PHPUnit) using both valid and invalid payloads.
  2. Test with large files via WP-CLI to find memory/time limits.
  3. Test image downloads for timeouts, large files and invalid MIME types.
  4. Test taxonomy creation collisions and term duplicates.
  5. Test user mapping scenarios (missing user, duplicate mapping).

Example end-to-end scenario

1) Receive JSON payload via REST from a trusted integration.
2) REST permission_callback validates current_user_can( 'edit_posts' ).
3) Importer validates top-level schema manually or by using a schema validator. If invalid, return 400 with errors.
4) For each item, find existing post by external_id. If found, update it; otherwise create a new post and store external_id in postmeta.
5) Download featured image with download_url, validate MIME type and size, insert attachment and set as thumbnail.
6) Set categories/tags with wp_set_object_terms, creating terms as necessary.
7) Update meta with update_post_meta (sanitizing and encoding arrays as JSON).
8) Return a report with counts and errors; persist the report to a custom post type (for example import_run) for auditing.

Extending and customizing

  • Expose filters for field mapping, e.g. apply_filters( 'wp_json_importer_map_title', $title, $item ).
  • Add custom validators via hooks, e.g. add_filter( 'wp_json_importer_validate_item', 'my_validator', 10, 2 ).
  • Use asynchronous processing for super-large imports: enqueue tasks with Action Scheduler or a custom queue worker and process items in smaller jobs.
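
A custom validator attached to that hook might look like the sketch below. Note the assumptions: the importer class shown earlier does not yet call this filter, so it would need a line such as apply_filters( 'wp_json_importer_validate_item', array(), $item ) before processing each item; the ( $errors, $item ) argument order and the 0 to 5 rating range are invented for illustration.

```php
<?php
/**
 * Hypothetical validator callback: receives the errors collected so far and
 * the raw item, and returns the (possibly extended) error list.
 */
function my_validator( array $errors, array $item ): array {
    if ( isset( $item['meta']['source_rating'] ) ) {
        $rating = $item['meta']['source_rating'];
        // Assumed business rule: ratings are numbers from 0 to 5.
        if ( ! is_numeric( $rating ) || $rating < 0 || $rating > 5 ) {
            $errors[] = 'meta.source_rating must be a number between 0 and 5';
        }
    }
    return $errors;
}

// Register only when WordPress is loaded, so the file can also be unit-tested alone.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wp_json_importer_validate_item', 'my_validator', 10, 2 );
}
```

The importer would then skip any item for which the filtered error list is not empty and copy those messages into its report.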

Quick checklist before going live

  • Ensure only authorized users or CLI processes can run the import.
  • Limit payload size, validate JSON and ensure UTF-8 encoding.
  • Whitelist image mime types and maximum sizes.
  • Decide rollback policy and implement a testable rollback flow.
  • Log import runs with who/when/results.
  • Test on a staging copy of production data first.

Final implementation considerations

The code samples in this article are a strong starting point. Adapt the importer to your environment, add stricter validation when required, and invest time into testing imports in staging environments before applying them to production. Use WP-CLI for large imports. Use JSON Schema validators for complex contracts. Track runs for auditing. Limit file downloads and sanitize everything.


