phpserialize-rs

Crates.io PyPI License CI

High-performance PHP serialize/unserialize parser written in Rust with Python bindings.

Features

Installation

Python

pip install phpserialize-rs

Rust

[dependencies]
php-deserialize-core = "0.1"

Quick Start

Python

from php_deserialize import loads, loads_json

# Basic usage
data = b'a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}'
result = loads(data)
print(result)  # {'name': 'Alice', 'age': 30}

# Direct JSON conversion (optimized for Databricks)
json_str = loads_json(data)
print(json_str)  # {"name":"Alice","age":30}

# Handle DB-escaped strings automatically
escaped = b'"a:1:{s:4:""key"";s:5:""value"";}"'
result = loads(escaped)  # Auto-unescapes
print(result)  # {'key': 'value'}

# Auto-fallback for encoding mismatches (no option needed!)
# Handles data serialized with EUC-KR but stored as UTF-8
mismatch = b's:4:"\xed\x95\x9c\xea\xb8\x80";'  # "한글" with wrong length
result = loads(mismatch)  # Automatically recovers
print(result)  # '한글'

# Strict mode (disable auto-fallback)
result = loads(data, strict=True)  # Fails on length mismatch

# Error handling options
result = loads(data, errors="replace")  # Replace invalid UTF-8
result = loads(data, errors="bytes")    # Return bytes for invalid UTF-8

PySpark / Databricks

from php_deserialize.spark import php_to_json
from pyspark.sql.functions import get_json_object

# Convert PHP serialize to JSON (Arrow-optimized UDF)
df = spark.table("bronze.my_table")
df = df.withColumn("data_json", php_to_json("serialized_column"))

# Extract fields from JSON
df = df.withColumn("name", get_json_object("data_json", "$.name"))
df = df.withColumn("age", get_json_object("data_json", "$.age"))

df.display()

For Databricks installation:

%pip install phpserialize-rs

Rust

use php_deserialize_core::{from_bytes, PhpValue};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let data = br#"a:2:{s:4:"name";s:5:"Alice";s:3:"age";i:30;}"#;
    let value = from_bytes(data)?;

    if let PhpValue::Array(items) = value {
        for (key, val) in items {
            println!("{:?} => {:?}", key, val);
        }
    }

    Ok(())
}

Supported PHP Types

Type PHP Format Example
Null N; N;
Boolean b:0; / b:1; b:1;
Integer i:<value>; i:42;
Float d:<value>; d:3.14;
String s:<len>:"<data>"; s:5:"hello";
Array a:<count>:{...} a:1:{i:0;s:3:"foo";}
Object O:<len>:"<class>":<count>:{...} Object with properties
Reference R:<index>; / r:<index>; Circular references
Enum (PHP 8.1+) E:<len>:"<Class:Case>"; E:10:"Status:Active";

Performance

Benchmarked on Apple M1 Pro:

Operation Throughput
Simple array ~1.5 GB/s
Nested structure ~800 MB/s
Large string ~2.0 GB/s

Compared to php2json (Python):

Error Handling

The library provides detailed error messages for debugging:

from php_deserialize import loads, PhpDeserializeError

try:
    loads(b"invalid data")
except PhpDeserializeError as e:
    print(f"Parse error at position {e.position}: {e.message}")

DB Escape Handling

When data is exported from databases (MySQL, PostgreSQL), strings may be double-quoted and escaped:

Original: a:1:{s:4:"key";s:5:"value";}
DB Export: "a:1:{s:4:""key"";s:5:""value"";}"

The library automatically detects and handles this format:

# Both work identically
loads(b'a:1:{s:4:"key";s:5:"value";}')
loads(b'"a:1:{s:4:""key"";s:5:""value"";}"')

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

Licensed under either of:

at your option.