Serializable interface, "C" format and existing data #67

@amulet1

We currently have at least 47 classes that directly implement the Serializable interface. Additionally, a few classes extend it instead of implementing it (probably not really correct). And, of course, there are classes that extend those classes.

Data stored via the Serializable interface is in "C" format: C:[class name length]:"ClassName":[data length]:{[data]}.
The "O" format is used by the newer __serialize() magic method: O:[class name length]:"ClassName":[property count]:{[serialized properties]}.
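To make the two layouts concrete, here is a minimal sketch of a helper that tells them apart by their leading marker. The name `payloadFormat()` is hypothetical and not part of Horde or PHP; it only inspects the start of a top-level payload and does not validate the rest of the string.

```php
<?php
// Hypothetical helper: classify a top-level serialized payload by its header.
// It checks only the prefix, not whether the whole payload is well-formed.
function payloadFormat(string $payload): string
{
    // "C" format: C:<class name length>:"<ClassName>":<data length>:{<data>}
    if (preg_match('/^C:\d+:"[^"]+":\d+:\{/', $payload)) {
        return 'C';
    }
    // "O" format: O:<class name length>:"<ClassName>":<property count>:{<properties>}
    if (preg_match('/^O:\d+:"[^"]+":\d+:\{/', $payload)) {
        return 'O';
    }
    return 'other';
}

// Hand-written sample payloads, for illustration only.
var_dump(payloadFormat('C:6:"Legacy":3:{abc}'));
var_dump(payloadFormat('O:6:"Modern":1:{s:1:"a";i:1;}'));
```

A check like this could also be used to estimate how much stored data is still in "C" format before deciding on a migration strategy.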

The interface is deprecated, and we have to do something about it.

| PHP Version | "C" Format Support | "O" Format Support |
| --- | --- | --- |
| 7.4 - 8.0 | Fully supported | Introduced __serialize() |
| 8.1 - 8.4 | Supported (deprecation warning if magic methods are missing) | Preferred |
| 9.0+ | Removed (planned; rejected by unserialize()) | Standard |

The issue is with supporting the "C" format. If we drop the Serializable interface, old data will not be decoded at all.

| Format | Indicator | Requires Serializable? | Used by __serialize()? |
| --- | --- | --- | --- |
| Class | C: | Yes | No |
| Object | O: | No | Yes |

For the time being we have to implement __serialize() and __unserialize() where they are missing, but still keep Serializable (dropping it would break decoding of existing old-format data). This allows "lazy" data migration until PHP 9.0: old "C" data can still be read, and anything saved again is written in "O" format. However, it does not guarantee that all existing old data will be converted or will expire on its own.
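A minimal sketch of such a transitional class is shown below. `LegacyPrefs` and its JSON-based legacy payload are assumptions made for illustration, not code from any Horde package. When both mechanisms are present, serialize() uses __serialize() and produces "O" payloads, while unserialize() still routes existing "C" payloads to Serializable::unserialize().

```php
<?php
// Hypothetical class, for illustration only.
final class LegacyPrefs implements Serializable
{
    private array $data;

    public function __construct(array $data = [])
    {
        $this->data = $data;
    }

    public function getData(): array
    {
        return $this->data;
    }

    // New mechanism: used by serialize() on PHP 7.4+, produces "O" payloads.
    public function __serialize(): array
    {
        return ['data' => $this->data];
    }

    // New mechanism: receives the array that __serialize() returned.
    public function __unserialize(array $payload): void
    {
        $this->data = $payload['data'] ?? [];
    }

    // Old mechanism: no longer used for writing once __serialize() exists.
    public function serialize(): ?string
    {
        $json = json_encode($this->data);
        return $json === false ? null : $json;
    }

    // Old mechanism: kept only so that existing "C" payloads still decode.
    // Assumption: the legacy payload body is JSON.
    public function unserialize($serialized): void
    {
        $decoded = json_decode($serialized, true);
        $this->data = is_array($decoded) ? $decoded : [];
    }
}

// New data is written in "O" format.
$fresh = serialize(new LegacyPrefs(['a' => 1]));

// Hand-written old-format payload: still readable through Serializable.
$old = unserialize('C:11:"LegacyPrefs":7:{{"a":1}}');
var_dump($old->getData());
```

Because the class defines both mechanisms, PHP 8.1+ should not emit the Serializable deprecation notice for it.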

Additionally, at least some of the implementations use json_encode()/json_decode() (and maybe something else) inside their serialize()/unserialize() methods.

A good example is in horde/activesync, but there are many more.

Not only are those implementations incompatible with __serialize()/__unserialize() because of the "C" format, but unserialize() would also not decode the old JSON-encoded data before calling __unserialize().

An additional issue is that existing code does not always guard against, or check for, a false return value when unserialize() is called. This would obviously cause multiple issues if we simply switch to the magic methods: old "C" data will not be decoded, and either false will be used as the retrieved data or an exception will be thrown.
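A guard of the kind that is missing could look like the sketch below. `safeUnserialize()` is a hypothetical helper name, and returning a caller-supplied default is just one possible policy; throwing or logging would work equally well.

```php
<?php
// Hypothetical helper: never lets a failed unserialize() leak false to callers.
function safeUnserialize(string $payload, $default = null)
{
    // A serialized boolean false is valid data, not a failure.
    if ($payload === 'b:0;') {
        return false;
    }
    // Suppress the notice/warning PHP raises for undecodable input.
    $value = @unserialize($payload);

    return $value === false ? $default : $value;
}

var_dump(safeUnserialize('not serialized data', []));
var_dump(safeUnserialize(serialize([1, 2])));
```

Call sites that currently use the result of unserialize() directly could switch to a wrapper like this first, which makes the later format change less risky.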

Some additional steps may be needed:

  1. Locate and convert (or purge, if not needed) old data. This could be done in some simple cases, but it is hard to migrate all data at once and guarantee that the code is updated at the same time. It is nearly impossible for highly customized setups.
  2. Locate affected places and add guards/checks to auto-convert or ignore data in the old format. Possibly redesign to avoid serialize()/unserialize() calls on class objects altogether. A good solution in the long term, but maybe a tedious task.
  3. Implement a polyfill for the Serializable interface if we still have old data and also need to support PHP 9.0.
  4. Implement our own version of unserialize() (e.g. we could call it unserializeC()), which would serve as a polyfill for the unserialize() calls we could not easily resolve with (1) and (2) while we are transitioning. It could also do something custom based on specific class names or their methods (e.g. auto-convert from JSON, then call __unserialize()).

(3) and/or (4) would be needed for universal "C" format support and should be doable with the help of ReflectionClass in PHP (or maybe with on-the-fly conversion to "O" format in some Horde-specific cases).
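As a rough sketch of option (4), the function below handles only a top-level "C" payload whose body is JSON and hands the decoded array to __unserialize(). Everything here is an assumption for illustration: the `unserializeC()` signature, the allowlist parameter, the JSON body, and the `ModernPrefs` class. Nested "C" payloads inside arrays or other objects are not handled.

```php
<?php
// Hypothetical modernized class: magic methods only, no Serializable.
final class ModernPrefs
{
    private array $data = [];

    public function getData(): array
    {
        return $this->data;
    }

    public function __serialize(): array
    {
        return $this->data;
    }

    public function __unserialize(array $payload): void
    {
        $this->data = $payload;
    }
}

// Sketch of a "C"-aware unserialize(); returns false on any failure.
function unserializeC(string $payload, array $allowedClasses)
{
    // Anything that is not a top-level "C" payload goes to the native function.
    if (!preg_match('/^C:(\d+):"([^"]+)":(\d+):\{/', $payload, $m)) {
        return unserialize($payload);
    }
    [$header, $nameLen, $class, $dataLen] = $m;
    $data = substr($payload, strlen($header), (int) $dataLen);

    // Check the declared lengths, the closing brace and the class allowlist.
    $valid = strlen($class) === (int) $nameLen
        && strlen($data) === (int) $dataLen
        && substr($payload, strlen($header) + (int) $dataLen) === '}'
        && in_array($class, $allowedClasses, true)
        && method_exists($class, '__unserialize');
    if (!$valid) {
        return false;
    }

    // Assumption: the legacy body is JSON; other encodings need their own branch.
    $decoded = json_decode($data, true);
    if (!is_array($decoded)) {
        return false;
    }

    // Create the object without running its constructor, as unserialize() does.
    $object = (new ReflectionClass($class))->newInstanceWithoutConstructor();
    $object->__unserialize($decoded);

    return $object;
}

$prefs = unserializeC('C:11:"ModernPrefs":7:{{"a":1}}', ['ModernPrefs']);
var_dump($prefs->getData());
```

The allowlist is there because the class name comes from stored data; without it, a crafted payload could name any loadable class.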

(4) may be the preferred way going forward, as it would allow us to drop the Serializable interface and modernize the code now, while still transparently supporting the loading of old-format data for as long as we want.

Any thoughts?
