Serialization Problems and Solutions

Posted on April 8, 2007

.NET serialization is a very simple concept, yet it often fails to work the way we’d expect. The basic idea is that you’re converting objects from their in-memory format to some other form, and then back again.

Today I’ll walk through some common scenarios that you might encounter when serializing objects to a file with the BinaryFormatter. The complete set of code examples is provided for download here.

The Serializable Attribute

In order to enable serialization, you must annotate the class to be serialized with the Serializable attribute:

[Serializable]
public class CustomerSerializable
{
    private string id;
    private string company;
    ...

To convert an object from its in-memory form and write it to a file, we use a BinaryFormatter:

BinaryFormatter formatter = new BinaryFormatter();

CustomerSerializable customer1 = new CustomerSerializable();
customer1.City = "Bothell";
customer1.Region = "WA";

using (FileStream stream = File.Create("Serialized.dat"))
{
    formatter.Serialize(stream, customer1);
}

Once we have persisted an object to a file using the above code, we can reverse the process by reading the object from the file and rehydrating it in memory:

BinaryFormatter formatter = new BinaryFormatter();
CustomerSerializable customer2 = null;
using (FileStream stream = File.OpenRead("Serialized.dat"))
{
    customer2 = (CustomerSerializable)formatter.Deserialize(stream);
}

Pretty simple, huh? Of course there are complications in the real world, as we shall see.

How Does it Work?

Essentially, in the simple case shown above, the formatter finds all of the fields of your class using Reflection, and reads their values during serialization. During deserialization, the data is read from the file and restored to a new empty object. Interestingly, in the simple case, no constructor of your serialized class is called during deserialization.
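
The constructor behavior is easy to check for yourself. Here is a minimal sketch, assuming the .NET Framework BinaryFormatter; ConstructorProbe and ConstructorDemo are hypothetical names of mine and are not part of the sample download. It counts constructor calls in a static field (statics are not part of the serialized state) and round-trips an object through a MemoryStream:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical probe class used only for this illustration.
[Serializable]
public class ConstructorProbe
{
    // Static, so it is not written to the stream.
    public static int ConstructorCalls;

    public string Name;

    public ConstructorProbe()
    {
        ConstructorCalls++;
        Name = "set by constructor";
    }
}

public static class ConstructorDemo
{
    // Serializes the object to memory and reads it straight back.
    public static ConstructorProbe RoundTrip(ConstructorProbe source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (ConstructorProbe)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        ConstructorProbe original = new ConstructorProbe();
        original.Name = "changed after construction";

        ConstructorProbe copy = RoundTrip(original);

        // The constructor ran once, for 'original' only.
        Console.WriteLine(ConstructorProbe.ConstructorCalls); // 1
        // The field value came from the stream, not from the constructor.
        Console.WriteLine(copy.Name); // changed after construction
    }
}
```

The practical consequence is that any setup work done in a constructor does not happen for the deserialized copy.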

ISerializable

You might have heard about the ISerializable interface. Isn’t implementing that just another way of saying that my class is serializable? Not quite. ISerializable just allows us to serialize and deserialize the data in a customized way. Usually, this means saving only some of the data, or transforming the data before saving it. You must still apply the Serializable attribute to your class. If you don’t, BinaryFormatter.Serialize() will throw a SerializationException, even though your class implements ISerializable.
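
A small sketch of that failure, again assuming the .NET Framework BinaryFormatter. MissingAttribute and AttributeDemo are hypothetical names I made up for the illustration. The class implements ISerializable completely but deliberately leaves off the attribute:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical class: full ISerializable support, but no [Serializable].
public class MissingAttribute : ISerializable
{
    public string Value = "hello";

    public MissingAttribute() { }

    // Deserialization constructor (never reached in this demo).
    protected MissingAttribute(SerializationInfo info, StreamingContext context)
    {
        Value = info.GetString("Value");
    }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Value", Value);
    }
}

public static class AttributeDemo
{
    // Returns true if the formatter refused to serialize the object.
    public static bool SerializeFails(object graph)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            try
            {
                formatter.Serialize(stream, graph);
                return false;
            }
            catch (SerializationException)
            {
                return true;
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(SerializeFails(new MissingAttribute())); // True
    }
}
```

Adding [Serializable] to MissingAttribute is the only change needed to make the call succeed.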

Unserializable Fields and Bases

Adding the Serializable attribute to your class is all well and good if the base class (and its base, etc.) and all of the fields of your class are also serializable. But what if they aren’t? In either case, the Serializable attribute isn’t enough – this is where ISerializable is needed. Let’s suppose the base class isn’t serializable:

public class BaseNotSerializable
{
    private int totalOrders;

    public int OrderTotal
    {
        get { return totalOrders; }
        set { totalOrders = value; }
    }
}

[Serializable]
public class CustomerBaseNotSerializableISerializable : BaseNotSerializable, ISerializable
{
    private string id;
    private string company;
    ...

By declaring support for ISerializable, we’re telling BinaryFormatter to do two things. First, when serializing, our GetObjectData method should be called instead of using the Reflection-based mechanism for reading our object’s data. Second, during deserialization a special constructor should be called instead of rehydrating the object in the normal way. In each case, we’ll be provided with a SerializationInfo object that contains the data being transferred to or from the object. Our job in GetObjectData() is to save the data from our object and from the unserializable base of our object:

void ISerializable.GetObjectData(SerializationInfo info,
                                 StreamingContext context)
{
    // Serialize data from the base.
    info.AddValue("OrderTotal", this.OrderTotal);

    info.AddValue("Address", this.address);
    info.AddValue("City", this.city);
    info.AddValue("Company", this.company);
    info.AddValue("ContactName", this.contactName);
    info.AddValue("ContactTitle", this.contactTitle);
    info.AddValue("Country", this.country);
    info.AddValue("Fax", this.fax);
    info.AddValue("Id", this.id);
    info.AddValue("Phone", this.phone);
    info.AddValue("Region", this.region);
    info.AddValue("ZipCode", this.zipcode);
}

If our class contained a field that wasn’t serializable, we could just skip it in GetObjectData(), or save some other piece of serializable information that we could use later to restore the value.

During deserialization, the special constructor is called, allowing us to read the data back into our object:

public CustomerBaseNotSerializableISerializable(SerializationInfo info,
                                                StreamingContext context)
{
    // Deserialize data to the base.
    this.OrderTotal = info.GetInt32("OrderTotal");

    this.address = info.GetString("Address");
    this.city = info.GetString("City");
    this.company = info.GetString("Company");
    this.contactName = info.GetString("ContactName");
    this.contactTitle = info.GetString("ContactTitle");
    this.country = info.GetString("Country");
    this.fax = info.GetString("Fax");
    this.id = info.GetString("Id");
    this.phone = info.GetString("Phone");
    this.region = info.GetString("Region");
    this.zipcode = info.GetString("ZipCode");
}

Something to note about this constructor is that it is required when ISerializable is implemented, but it is not part of the ISerializable interface, so it is easy to accidentally omit. The compiler won’t complain, because you have met the ISerializable implementation requirements by providing GetObjectData(). A SerializationException will be thrown at run-time if the constructor is not present. As I mentioned earlier, no constructor is called in the simple case where ISerializable is not implemented.
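
Here is a minimal sketch of that trap, assuming the .NET Framework BinaryFormatter. MissingConstructor and ConstructorCheck are hypothetical names used only for this illustration. The class compiles cleanly and serializes without complaint, and the problem only shows up when the data is read back:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical class: satisfies the interface, but has no
// (SerializationInfo, StreamingContext) constructor.
[Serializable]
public class MissingConstructor : ISerializable
{
    public string Value = "hello";

    public MissingConstructor() { }

    void ISerializable.GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Value", Value);
    }
}

public static class ConstructorCheck
{
    // Serializes the object, then reports what happened on the way back.
    public static string RoundTripOutcome(object graph)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            // Serialization only needs GetObjectData, so this succeeds.
            formatter.Serialize(stream, graph);
            stream.Position = 0;

            try
            {
                formatter.Deserialize(stream);
                return "deserialized";
            }
            catch (SerializationException)
            {
                // The formatter could not find the special constructor.
                return "SerializationException";
            }
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripOutcome(new MissingConstructor())); // SerializationException
    }
}
```

Because the failure is deferred until deserialization, a quick round-trip test like this is a cheap way to catch the omission early.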

But it’s not my class!

Implementing ISerializable is a great solution when you’re writing your own class. Sometimes, however, you need to serialize an object provided by a third-party library or the .NET framework itself. If the class wasn’t implemented with the Serializable attribute, you might think that you’re stuck. Fortunately, there’s a way around this obstacle, too. Enter the serialization surrogate.

A serialization surrogate is a class that implements ISerializationSurrogate, and knows how to serialize and deserialize some other class. We inject this knowledge into the serialization process by attaching the surrogate to the SurrogateSelector of the BinaryFormatter:

BinaryFormatter formatter = new BinaryFormatter();

SurrogateSelector selector = new SurrogateSelector();
selector.AddSurrogate(typeof(CustomerNotSerializable),
                    new StreamingContext(StreamingContextStates.All),
                    new CustomerNotSerializableSerializationSurrogate());

formatter.SurrogateSelector = selector;

CustomerNotSerializable customer1 = new CustomerNotSerializable();

using (FileStream stream = File.Create("Serialized.dat"))
{
    formatter.Serialize(stream, customer1);
}

CustomerNotSerializable customer2 = null;
using (FileStream stream = File.OpenRead("Serialized.dat"))
{
    customer2 = (CustomerNotSerializable)formatter.Deserialize(stream);
}

Now when formatter.Serialize() and formatter.Deserialize() are called, the SurrogateSelector is consulted, and methods of the provided CustomerNotSerializableSerializationSurrogate class are called to perform the needed operations:

public class CustomerNotSerializableSerializationSurrogate : ISerializationSurrogate
{
    // GetObjectData is called to serialize the object.
    void ISerializationSurrogate.GetObjectData(object obj,
                                               SerializationInfo info,
                                               StreamingContext context)
    {
        CustomerNotSerializable customer = (CustomerNotSerializable)obj;

        info.AddValue("Address", customer.Address);
        info.AddValue("City", customer.City);
        info.AddValue("Company", customer.Company);
        info.AddValue("ContactName", customer.ContactName);
        info.AddValue("ContactTitle", customer.ContactTitle);
        info.AddValue("Country", customer.Country);
        info.AddValue("Fax", customer.Fax);
        info.AddValue("Id", customer.Id);
        info.AddValue("Phone", customer.Phone);
        info.AddValue("Region", customer.Region);
        info.AddValue("ZipCode", customer.ZipCode);
    }

    // SetObjectData is called to deserialize the object.
    object ISerializationSurrogate.SetObjectData(object obj,
                                         SerializationInfo info,
                                         StreamingContext context,
                                         ISurrogateSelector selector)
    {
        CustomerNotSerializable customer = (CustomerNotSerializable)obj;

        customer.Address = info.GetString("Address");
        customer.City = info.GetString("City");
        customer.Company = info.GetString("Company");
        customer.ContactName = info.GetString("ContactName");
        customer.ContactTitle = info.GetString("ContactTitle");
        customer.Country = info.GetString("Country");
        customer.Fax = info.GetString("Fax");
        customer.Id = info.GetString("Id");
        customer.Phone = info.GetString("Phone");
        customer.Region = info.GetString("Region");
        customer.ZipCode = info.GetString("ZipCode");

        return customer;
    }
}

You’ve probably noticed some similarity between the surrogate class and an implementation of ISerializable. It’s essentially the same thing, with the surrogate being decoupled from the class being serialized.

The Evil Conspiracy of Event Handlers

.NET events are a great architectural feature, but they can pose unexpected problems in serialization. When you’re serializing an object that contains an event, you potentially could be serializing objects of any .NET class. How could this possibly be, you ask?

Events are sneaky. Very sneaky. Think about it for a minute. An event is just a delegate field with some sugary syntax to make it more convenient. A delegate is a list of zero or more methods to call when the event is raised. Each method in the delegate’s list references a specific object (if it is an instance method). If any event handlers have been added to your event, these objects will be serialized when you serialize the object containing the event!

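To see the effect for yourself, here is a minimal sketch, assuming the .NET Framework BinaryFormatter. Publisher, Listener, and EventDemo are hypothetical names of mine and are not part of the sample download. The publisher is serializable on its own, but stops being serializable as soon as an unserializable listener subscribes to its event:

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical publisher with a conventionally declared event.
[Serializable]
public class Publisher
{
    public string Name = "publisher";

    public event EventHandler Changed;

    public void RaiseChanged()
    {
        EventHandler handler = Changed;
        if (handler != null)
            handler(this, EventArgs.Empty);
    }
}

// Hypothetical subscriber, deliberately not marked [Serializable].
public class Listener
{
    public void OnChanged(object sender, EventArgs e) { }
}

public static class EventDemo
{
    // Returns true if the formatter can serialize the object graph.
    public static bool CanSerialize(object graph)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            try
            {
                formatter.Serialize(stream, graph);
                return true;
            }
            catch (SerializationException)
            {
                return false;
            }
        }
    }

    public static void Main()
    {
        Publisher publisher = new Publisher();
        Console.WriteLine(CanSerialize(publisher)); // True: no handlers attached

        Listener listener = new Listener();
        publisher.Changed += listener.OnChanged;

        // The delegate now references 'listener', which joins the graph.
        Console.WriteLine(CanSerialize(publisher)); // False
    }
}
```

Had Listener been marked [Serializable], the second call would have succeeded and quietly written the listener to the stream as well, which is rarely what you want.
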
Typically, you just don’t want to serialize the delegate field of an event. You might also have other fields in your class that aren’t serializable or that you just don’t need to persist. For these cases, the NonSerialized attribute can be applied to the field, and the serialization mechanism will ignore the field during serialization and deserialization. Consider these fields of a class:

private string id;
private string company;
private string contactName;

[NonSerialized]
private PropertyDescriptor lastPropertyChanged;

[NonSerialized]
private PropertyDescriptorCollection properties;

[NonSerialized]
private PropertyChangedEventHandler propChanged;

public event PropertyChangedEventHandler PropertyChanged
{
    add { propChanged =
              (PropertyChangedEventHandler)Delegate.Combine(propChanged,
              value); }
    remove { propChanged =
              (PropertyChangedEventHandler)Delegate.Remove(propChanged,
              value); }
}

Here we have a couple of fields that aren’t serializable; neither PropertyDescriptor nor PropertyDescriptorCollection can be serialized. I’ve annotated these with [NonSerialized] so that the formatter won’t attempt to serialize or deserialize them. To make the PropertyChanged event non-serialized, I’ve broken it up into a declared delegate field and an event with explicit add and remove accessors. The delegate field (propChanged above) is then annotated with [NonSerialized]. If I had declared the event in the conventional way, I couldn’t have applied the NonSerialized attribute to the event itself, because the attribute is only valid on fields. (C# does accept the field: attribute target on a conventional event, written [field: NonSerialized], which applies the attribute to the compiler-generated backing field.) The layout shown above, with an explicit delegate field (propChanged) and event accessors acting on the delegate, mirrors the structure that you would get by specifying the event conventionally:

public event PropertyChangedEventHandler PropertyChanged;

The code above is perfect for ignoring certain fields during serialization and deserialization. However, it may be the case that after deserialization, these fields really do need valid values. There are a few ways to handle this. We could implement ISerializable and restore the values in the special deserializing constructor. Alternatively, we could provide a surrogate and restore the values in the surrogate’s SetObjectData() method.

In the example above, lastPropertyChanged represents the last property that was changed. Even though PropertyDescriptor is not serializable, I can save a memento of the value in ISerializable.GetObjectData(), and then restore the value from the memento in the deserialization constructor:

// Called during serialization (because support for ISerializable
// is declared).
void ISerializable.GetObjectData(SerializationInfo info,
                                 StreamingContext context)
{
    info.AddValue("Address", this.address);
    info.AddValue("City", this.city);
    info.AddValue("Company", this.company);
    ...

    if (lastPropertyChanged != null)
        info.AddValue("LastPropertyChanged", lastPropertyChanged.Name);
}

// Called during deserialization (because support for ISerializable
// is declared).
public CustomerSerializableFieldsNonSerialized(SerializationInfo info,
                                               StreamingContext context)
{
    this.address = info.GetString("Address");
    this.city = info.GetString("City");
    this.company = info.GetString("Company");
    ...

    try
    {
        string propertyName = info.GetString("LastPropertyChanged");
        if (propertyName != null)
            lastPropertyChanged =
                TypeDescriptor.GetProperties(this)[propertyName];
    }
    catch (SerializationException)
    {
        // LastPropertyChanged was not added when the object
        // was serialized (see GetObjectData).
    }
}

The key here is that even though lastPropertyChanged is not serializable, the memento that I save instead (lastPropertyChanged.Name) is serializable.

IDeserializationCallback

Another, simpler way of restoring non-serialized data is to implement IDeserializationCallback on the class being serialized:

// Called after deserialization (because support for
// IDeserializationCallback is declared).
void IDeserializationCallback.OnDeserialization(object sender)
{
    properties = TypeDescriptor.GetProperties(this);
}

There’s just one method to be provided, as shown. It is called after deserialization is complete. In this example, the properties field is a collection of PropertyDescriptors for the type being serialized. This is the kind of data that is a perfect candidate for IDeserializationCallback – something that can be restored without specific knowledge of the object, or that has a reasonable default value.
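
For a complete picture, here is a minimal round-trip sketch, assuming the .NET Framework BinaryFormatter. CustomerWithCallback, its HasPropertyCache property, and CallbackDemo are hypothetical names added for illustration. The non-serialized properties field is normally filled in by the constructor, but since no constructor runs during deserialization, the callback is what restores it:

```csharp
using System;
using System.ComponentModel;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical cut-down customer class for this illustration.
[Serializable]
public class CustomerWithCallback : IDeserializationCallback
{
    private string city;

    // Not written to the stream; rebuilt after deserialization.
    [NonSerialized]
    private PropertyDescriptorCollection properties;

    public CustomerWithCallback()
    {
        properties = TypeDescriptor.GetProperties(this);
    }

    public string City
    {
        get { return city; }
        set { city = value; }
    }

    // Lets the demo observe whether the cache has been restored.
    public bool HasPropertyCache
    {
        get { return properties != null; }
    }

    // Called by the formatter once the object graph has been deserialized.
    void IDeserializationCallback.OnDeserialization(object sender)
    {
        properties = TypeDescriptor.GetProperties(this);
    }
}

public static class CallbackDemo
{
    public static CustomerWithCallback RoundTrip(CustomerWithCallback source)
    {
        BinaryFormatter formatter = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            formatter.Serialize(stream, source);
            stream.Position = 0;
            return (CustomerWithCallback)formatter.Deserialize(stream);
        }
    }

    public static void Main()
    {
        CustomerWithCallback customer = new CustomerWithCallback();
        customer.City = "Bothell";

        CustomerWithCallback copy = RoundTrip(customer);

        Console.WriteLine(copy.City);             // Bothell
        Console.WriteLine(copy.HasPropertyCache); // True
    }
}
```

If the OnDeserialization body were removed, HasPropertyCache would report False on the copy, because nothing else repopulates the field.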

Whew.

Serialization can be a little complicated. It’s easier if you just remember a few things:

    • The Serializable class attribute must always be applied OR a serialization surrogate provided.
    • Unserializable bases or fields can be serialized with an ISerializable implementation OR by a serialization surrogate.
    • Fields and events can be omitted from serialization with the NonSerialized attribute.
    • Non-serialized data can be restored in the ISerializable deserialization constructor OR in the SetObjectData() method of a serialization surrogate OR in IDeserializationCallback.OnDeserialization().

And that’s all I have to say about that. Download the sample code if you’d like to explore more.

