<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xavier Llorà &#187; serialization</title>
	<atom:link href="http://www.xavierllora.net/tag/serialization/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xavierllora.net</link>
	<description>A notebook on data-intensive computing, genetics-based machine learning &#38; more.</description>
	<lastBuildDate>Sun, 08 Jan 2012 19:39:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Efficient serialization for Java (and beyond)</title>
		<link>http://www.xavierllora.net/2009/02/04/efficient-serialization-for-java-and-beyond/</link>
		<comments>http://www.xavierllora.net/2009/02/04/efficient-serialization-for-java-and-beyond/#comments</comments>
		<pubDate>Wed, 04 Feb 2009 15:10:38 +0000</pubDate>
		<dc:creator>Xavier</dc:creator>
				<category><![CDATA[Data-Intensive Computing]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[c]]></category>
		<category><![CDATA[data-intensive flows]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[meandre]]></category>
		<category><![CDATA[protocol buffers]]></category>
		<category><![CDATA[python]]></category>
		<category><![CDATA[serialization]]></category>
		<category><![CDATA[xstream]]></category>

		<guid isPermaLink="false">http://www.xavierllora.net/?p=431</guid>
		<description><![CDATA[I am currently working on the distributed execution of flows as part of the Meandre infrastructure&#8212;as a part of the SEASR project. One of the pieces to explore is how to push data between machines. No, I am not going to talk about network protocols and the like here, but how you can pass the [...]
Related posts:<ol>
<li><a href='http://www.xavierllora.net/2008/05/20/svnkit-or-analyzing-svn-content-in-java/' rel='bookmark' title='SVNKit or analyzing SVN content in Java'>SVNKit or analyzing SVN content in Java</a></li>
<li><a href='http://www.xavierllora.net/2009/09/29/temporary-storage-for-meandres-distribute-flow-execution/' rel='bookmark' title='Temporary storage for Meandre&#8217;s distributed flow execution'>Temporary storage for Meandre&#8217;s distributed flow execution</a></li>
<li><a href='http://www.xavierllora.net/2006/10/19/r-and-java/' rel='bookmark' title='R and Java'>R and Java</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I am currently working on the distributed execution of flows as part of the <a href="http://seasr.org/meandre/">Meandre infrastructure</a>&#8212;as a part of the <a href="http://seasr.org">SEASR project</a>. One of the pieces to explore is how to push data between machines. No, I am not going to talk about network protocols and the like here, but how you can pass the data around. If you have ever programmed <a href="http://www-unix.mcs.anl.gov/mpi/">MPI</a> using C/C++ you remember the tedious efforts that requires passing complex data structures around between processes. Serialization is a way to take those complex structures into a form that can be easily stored/transmitted, and then retrieved/received and regenerate the original complex data structure. Some languages/platforms support this functionality (e.g. Java, Python), allowing to easily use the serialized representation for persistency or transmission purposes.</p>
<p>Last Thursday I was talking to <a href="http://vermaabhishekp.googlepages.com/">Abhishek Verma</a>, and he pointed me to <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Google&#8217;s Protocol Buffers</a>, the company&#8217;s take on data interchange formats. It is not a new idea (<a href="http://en.wikipedia.org/wiki/Interface_definition_language">CORBA&#8217;s IDL</a>, for instance, has been around for a long time), but what caught my eye were two of its claims: (1) efficiency, and (2) multiple language bindings. I was contemplating using <a href="http://xstream.codehaus.org/">XStream</a> for the Meandre distributed flow execution needs, but the heavyweight XML made me quite reluctant to walk down that path. Java native serialization is not a bad choice in terms of efficiency, but it provides neither friendly mechanics for modifying data formats without rendering already serialized objects useless, nor a transparent mechanism for bindings in other languages/platforms. So <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Google&#8217;s Protocol Buffers</a> seemed an option worth trying. Off I went and prepared a simple comparison of the three: (1) Java serialization, (2) <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Google&#8217;s Protocol Buffers</a>, and (3) <a href="http://xstream.codehaus.org/">XStream</a>. Yes, you may guess the outcome, but I was more interested in getting my hands dirty and seeing how <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Google&#8217;s Protocol Buffers</a> performs and how much overhead it imposes on the developer.</p>
<h2>The experiment</h2>
<p>Before getting into the description: this experiment is not meant to be an exhaustive performance evaluation, just an afternoon diversion. That said, the experiment measured the serialization/deserialization time and the space used for a simple data structure containing just one array of integers and one array of strings. All the integers were initialized to zero, and all the strings to <em>&#8220;Dummy text&#8221;</em>. To measure how serialization time grows with the size of the object, the number of integers and strings was increased incrementally. The code below shows the class holding the test data, in its plain (non-<code>Serializable</code>) form.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">org.meandre.tools.serialization.xstream</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> TargetObject <span style="color: #009900;">&#123;</span>
&nbsp;
       <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #003399;">String</span> <span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> sa<span style="color: #339933;">;</span>
       <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">int</span> <span style="color: #009900;">&#91;</span><span style="color: #009900;">&#93;</span> ia<span style="color: #339933;">;</span>
&nbsp;
       <span style="color: #000000; font-weight: bold;">public</span> TargetObject <span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">int</span> iStringElements, <span style="color: #000066; font-weight: bold;">int</span> iIntegerElements <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
             sa <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">String</span><span style="color: #009900;">&#91;</span>iStringElements<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">int</span> i<span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span> <span style="color: #339933;">;</span> i<span style="color: #339933;">&lt;</span>iStringElements <span style="color: #339933;">;</span> i<span style="color: #339933;">++</span> <span style="color: #009900;">&#41;</span>
                  sa<span style="color: #009900;">&#91;</span>i<span style="color: #009900;">&#93;</span> <span style="color: #339933;">=</span> <span style="color: #0000ff;">&quot;Dummy text&quot;</span><span style="color: #339933;">;</span>
             ia <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #000066; font-weight: bold;">int</span><span style="color: #009900;">&#91;</span>iIntegerElements<span style="color: #009900;">&#93;</span><span style="color: #339933;">;</span>
       <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>The experiment consisted of generating objects like the one above containing from 100 to 10,000 elements, in increments of 100. Each object was serialized 50 times, and the average serialization time and the space required (in bytes) were recorded for each object size. Below is the sample code I used to measure Java native serialization/deserialization times.</p>

<div class="wp_syntax"><div class="code"><pre class="java" style="font-family:monospace;"><span style="color: #000000; font-weight: bold;">package</span> <span style="color: #006699;">org.meandre.tools.serialization.java</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.ByteArrayInputStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.ByteArrayOutputStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.IOException</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.ObjectInputStream</span><span style="color: #339933;">;</span>
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">java.io.ObjectOutputStream</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">import</span> <span style="color: #006699;">org.junit.Test</span><span style="color: #339933;">;</span>
&nbsp;
<span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000000; font-weight: bold;">class</span> JavaSerializationTest <span style="color: #009900;">&#123;</span>
&nbsp;
       @Test
       <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> testJavaSerialization <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> 
       <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span> <span style="color: #009900;">&#123;</span>
             <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> MAX_SIZE <span style="color: #339933;">=</span> <span style="color: #cc66cc;">10000</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> REP <span style="color: #339933;">=</span> <span style="color: #cc66cc;">50</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> INC <span style="color: #339933;">=</span> <span style="color: #cc66cc;">100</span><span style="color: #339933;">;</span>
&nbsp;
             <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Java serialization times&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">int</span> i<span style="color: #339933;">=</span>INC <span style="color: #339933;">;</span> i<span style="color: #339933;">&lt;=</span>MAX_SIZE <span style="color: #339933;">;</span> i<span style="color: #339933;">+=</span>INC <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                  TargetObjectSerializable tos <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TargetObjectSerializable<span style="color: #009900;">&#40;</span>i,i<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                  <span style="color: #000066; font-weight: bold;">long</span> lAccTime <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
                  <span style="color: #000066; font-weight: bold;">long</span> lSize <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
                  <span style="color: #000066; font-weight: bold;">long</span> lTmp<span style="color: #339933;">;</span>
                  <span style="color: #003399;">ByteArrayOutputStream</span> baos<span style="color: #339933;">;</span>
                  <span style="color: #003399;">ObjectOutputStream</span> out<span style="color: #339933;">;</span>
                  <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">int</span> j<span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span> <span style="color: #339933;">;</span> j<span style="color: #339933;">&lt;</span>REP <span style="color: #339933;">;</span> j<span style="color: #339933;">++</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                      baos <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ByteArrayOutputStream</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      out <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ObjectOutputStream</span><span style="color: #009900;">&#40;</span>baos<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      lTmp <span style="color: #339933;">=</span> <span style="color: #003399;">System</span>.<span style="color: #006633;">currentTimeMillis</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      out.<span style="color: #006633;">writeObject</span><span style="color: #009900;">&#40;</span>tos<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      lTmp <span style="color: #339933;">-=</span> <span style="color: #003399;">System</span>.<span style="color: #006633;">currentTimeMillis</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> <span style="color: #666666; font-style: italic;">// lTmp = start - end, i.e. minus the elapsed milliseconds</span>
                      out.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      lAccTime <span style="color: #339933;">-=</span> lTmp<span style="color: #339933;">;</span>
                      lSize <span style="color: #339933;">=</span> baos.<span style="color: #006633;">size</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span> 
                  <span style="color: #009900;">&#125;</span>
                  <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">+</span>i<span style="color: #339933;">+</span><span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#41;</span>lAccTime<span style="color: #009900;">&#41;</span><span style="color: #339933;">/</span>REP<span style="color: #009900;">&#41;</span><span style="color: #339933;">+</span><span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span><span style="color: #339933;">+</span>lSize<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
             <span style="color: #009900;">&#125;</span>
       <span style="color: #009900;">&#125;</span>
&nbsp;
&nbsp;
       @Test
       <span style="color: #000000; font-weight: bold;">public</span> <span style="color: #000066; font-weight: bold;">void</span> testJavaDeserialization <span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span> 
       <span style="color: #000000; font-weight: bold;">throws</span> <span style="color: #003399;">IOException</span>, <span style="color: #003399;">ClassNotFoundException</span> <span style="color: #009900;">&#123;</span>
             <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> MAX_SIZE <span style="color: #339933;">=</span> <span style="color: #cc66cc;">10000</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> REP <span style="color: #339933;">=</span> <span style="color: #cc66cc;">50</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">final</span> <span style="color: #000066; font-weight: bold;">int</span> INC <span style="color: #339933;">=</span> <span style="color: #cc66cc;">100</span><span style="color: #339933;">;</span>
&nbsp;
             <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;Java deserialization times&quot;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
             <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">int</span> i<span style="color: #339933;">=</span>INC <span style="color: #339933;">;</span> i<span style="color: #339933;">&lt;=</span>MAX_SIZE <span style="color: #339933;">;</span> i<span style="color: #339933;">+=</span>INC <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                  TargetObjectSerializable tos <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> TargetObjectSerializable<span style="color: #009900;">&#40;</span>i,i<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                  <span style="color: #003399;">ByteArrayOutputStream</span> baos <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ByteArrayOutputStream</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                  <span style="color: #003399;">ObjectOutputStream</span> out <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ObjectOutputStream</span><span style="color: #009900;">&#40;</span>baos<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                  out.<span style="color: #006633;">writeObject</span><span style="color: #009900;">&#40;</span>tos<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                  out.<span style="color: #006633;">close</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                  <span style="color: #003399;">ByteArrayInputStream</span> bais<span style="color: #339933;">;</span>
                  <span style="color: #003399;">ObjectInputStream</span> ois<span style="color: #339933;">;</span>
                  <span style="color: #000066; font-weight: bold;">long</span> lAccTime <span style="color: #339933;">=</span> <span style="color: #cc66cc;">0</span><span style="color: #339933;">;</span>
                  <span style="color: #000066; font-weight: bold;">long</span> lTmp<span style="color: #339933;">;</span>
                  <span style="color: #000000; font-weight: bold;">for</span> <span style="color: #009900;">&#40;</span> <span style="color: #000066; font-weight: bold;">int</span> j<span style="color: #339933;">=</span><span style="color: #cc66cc;">0</span> <span style="color: #339933;">;</span> j<span style="color: #339933;">&lt;</span>REP <span style="color: #339933;">;</span> j<span style="color: #339933;">++</span> <span style="color: #009900;">&#41;</span> <span style="color: #009900;">&#123;</span>
                      bais <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ByteArrayInputStream</span><span style="color: #009900;">&#40;</span>baos.<span style="color: #006633;">toByteArray</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      ois <span style="color: #339933;">=</span> <span style="color: #000000; font-weight: bold;">new</span> <span style="color: #003399;">ObjectInputStream</span><span style="color: #009900;">&#40;</span>bais<span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      lTmp <span style="color: #339933;">=</span> <span style="color: #003399;">System</span>.<span style="color: #006633;">currentTimeMillis</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      ois.<span style="color: #006633;">readObject</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      lTmp <span style="color: #339933;">-=</span> <span style="color: #003399;">System</span>.<span style="color: #006633;">currentTimeMillis</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
                      lAccTime <span style="color: #339933;">-=</span> lTmp<span style="color: #339933;">;</span>
                  <span style="color: #009900;">&#125;</span>
                  <span style="color: #003399;">System</span>.<span style="color: #006633;">out</span>.<span style="color: #006633;">println</span><span style="color: #009900;">&#40;</span><span style="color: #0000ff;">&quot;&quot;</span><span style="color: #339933;">+</span>i<span style="color: #339933;">+</span><span style="color: #0000ff;">&quot;<span style="color: #000099; font-weight: bold;">\t</span>&quot;</span><span style="color: #339933;">+</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #009900;">&#40;</span><span style="color: #000066; font-weight: bold;">double</span><span style="color: #009900;">&#41;</span>lAccTime<span style="color: #009900;">&#41;</span><span style="color: #339933;">/</span>REP<span style="color: #009900;">&#41;</span><span style="color: #009900;">&#41;</span><span style="color: #339933;">;</span>
             <span style="color: #009900;">&#125;</span>
       <span style="color: #009900;">&#125;</span>
<span style="color: #009900;">&#125;</span></pre></div></div>

<p>Equivalent versions of the code shown above were used to measure <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Google&#8217;s Protocol Buffers</a> and <a href="http://xstream.codehaus.org/">XStream</a>. If you are interested in seeing the <a href="http://www.xavierllora.net/wp-content/uploads/2009/02/src-test-serialization.zip">full code, you can download it as is</a> (no guarantees provided). Also, for completeness, below is the <code>proto</code> file used for testing the Java implementation of <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Google&#8217;s Protocol Buffers</a>.</p>

<div class="wp_syntax"><div class="code"><pre class="proto" style="font-family:monospace;">package test;
&nbsp;
option java_package = &quot;org.meandre.tools.serialization.proto&quot;;
option java_outer_classname = &quot;TargetObjectProtoOuter&quot;;
&nbsp;
message TargetObjectProto { 
  repeated int32 ia = 1; 
  repeated string sa = 2;
}</pre></div></div>
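<p>To see why Protocol Buffers carries no fixed header, it helps to look at how a message such as <code>TargetObjectProto</code> is laid out on the wire. The sketch below hand-encodes it following the published wire format: base-128 varints, and a key of <code>(field_number &lt;&lt; 3) | wire_type</code> in front of every value. It assumes proto2 semantics, where a <code>repeated int32</code> field is not packed unless <code>[packed=true]</code> is set. <code>TargetObjectWireSketch</code> is a hypothetical teaching aid, not part of the protobuf library. With the experiment&#8217;s data, each zero integer takes 2 bytes and each <em>&#8220;Dummy text&#8221;</em> string takes 12, so an object with n integers and n strings encodes to 14n bytes, and an empty message encodes to zero bytes.</p>

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, hand-rolled encoder following the Protocol Buffers wire format
 * for TargetObjectProto (proto2: repeated fields are not packed by default).
 * For illustration only; real code should use the classes generated by protoc.
 */
final class TargetObjectWireSketch {

    private TargetObjectWireSketch() {}

    private static final int WIRETYPE_VARINT = 0;
    private static final int WIRETYPE_LENGTH_DELIMITED = 2;

    /** Base-128 varint: 7 bits per byte, lowest group first, high bit set if more bytes follow. */
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    /** Every value is preceded by a key: (field_number << 3) | wire_type. */
    static void writeKey(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, (fieldNumber << 3) | wireType);
    }

    /** Encodes ia (field 1) and then sa (field 2), in field-number order. */
    static byte[] encode(int[] ia, String[] sa) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int v : ia) {
            writeKey(out, 1, WIRETYPE_VARINT);
            writeVarint(out, v); // a negative int32 is sign-extended and takes 10 bytes
        }
        for (String s : sa) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            writeKey(out, 2, WIRETYPE_LENGTH_DELIMITED);
            writeVarint(out, utf8.length);
            out.write(utf8, 0, utf8.length);
        }
        return out.toByteArray(); // no header: an empty message encodes to zero bytes
    }
}
```

<p>With the classes generated by <code>protoc</code> from the file above, the same bytes should come from <code>TargetObjectProtoOuter.TargetObjectProto.newBuilder().addIa(0).addSa("Dummy text").build().toByteArray()</code>, and be read back with <code>TargetObjectProtoOuter.TargetObjectProto.parseFrom(bytes)</code>.</p>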

<p>To run the experiment, besides the <a title="Protocol buffers" href="http://code.google.com/p/protobuf/">Protocol Buffers</a> and <a href="http://xstream.codehaus.org/">XStream</a> libraries, you will also need <a href="http://www.junit.org/" title="JUnit">JUnit</a>.</p>
<h2>The results</h2>
<p>The experiments were run on a first-generation MacBook Pro with 2 GB of RAM, using Apple&#8217;s Java 1.5 virtual machine. The figures below show the size of the serialized output, both absolute and relative to the original data, for each of the three serialization methods compared. Figures and data processing were done using <a href="http://www.r-project.org/" title="R">R</a>.</p>
<p><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/data-size.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/data-size.png" alt="Data size of the serialized object" title="Data size of the serialized object" width="220" height="220" class="aligncenter size-full wp-image-425" /></a><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/data-size-ratio.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/data-size-ratio.png" alt="Serialized/original data size ratio" title="Serialized/original data size ratio" width="220" height="220" class="aligncenter size-full wp-image-424" /></a></p>
<p>The figures confirm the expected bloat of XML-based XStream serialization, up to six times larger than the original data being serialized. Java native serialization, on the other hand, adds only a small overhead to the original data. Google&#8217;s Protocol Buffers requires slightly more space than Java native serialization, but never doubled the original size. Moreover, it does not exhibit the constant initial payload overhead displayed by both XStream and Java native serialization. The next question was how costly the serialization process was. The figures below show the time required to serialize an object.</p>
<p><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/serialization-time.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/serialization-time.png" alt="Serialization time" title="Serialization time" width="220" height="220" class="aligncenter size-full wp-image-430" /></a><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/serialization-time-ratio.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/serialization-time-ratio.png" alt="Serialization time ratio" title="Serialization time ratio" width="220" height="220" class="aligncenter size-full wp-image-429" /></a></p>
<p>Java native serialization was, as expected, the fastest, but Google&#8217;s Protocol Buffers took on average only four times as long as the Java native version. That is peanuts compared to XStream, which was fifty times slower. Deserialization times show the same trends as serialization, as the figures below show.</p>
<p><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/deserialization-time.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/deserialization-time.png" alt="Deserialization time" title="Deserialization time" width="220" height="220" class="aligncenter size-full wp-image-427" /></a><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/deserialization-time-ratio.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/deserialization-time-ratio.png" alt="Deserialization time ratio" title="Deserialization time ratio" width="220" height="220" class="aligncenter size-full wp-image-426" /></a></p>
<p>It is also worth noting that, as the figure below shows, serialization is faster than deserialization (as common sense would suggest). However, Google&#8217;s Protocol Buffers is the method where this difference is most pronounced.</p>
<p><a href="http://www.xavierllora.net/wp-content/uploads/2009/02/serialization-deserialization-ratio.png"><img src="http://www.xavierllora.net/wp-content/uploads/2009/02/serialization-deserialization-ratio.png" alt="Serialization/deserialization ratio" title="Serialization/deserialization ratio" width="220" height="220" class="aligncenter size-full wp-image-428" /></a></p>
<h2>The lessons learned</h2>
<p>As I said, this is far from being an exhaustive or even representative example, just an afternoon of exploration. However, the results show interesting trends. Yes, XStream could be tweaked to make the serialized XML leaner and, with the proper tinkering, could even make it possible to deserialize the object on a different platform/language, but at an enormous cost in both size and time. Java native serialization is by far the fastest and the most size-efficient, but it is made from and for Java. Also, changes to the serialized classes (imagine adding or removing a field) may render already serialized objects unreadable. Google&#8217;s Protocol Buffers, on the other hand, delivers the best of both worlds: (1) compact and relatively fast serialization/deserialization, and (2) serialization/deserialization across different languages and platforms. For these reasons, it seems a very interesting option to keep exploring if you need both.</p>
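<p>Within Java itself, the class-evolution problem can be eased, though not eliminated. If a class does not declare <code>serialVersionUID</code>, the JVM computes one from the class structure, so adding a field changes it and older streams then fail to load with <code>InvalidClassException</code>. Pinning the value, as in the hypothetical <code>TargetObjectV2</code> below, lets compatible changes such as adding a field go through: when an older stream lacks the new field, it is simply left at its default value. This does nothing for cross-language access, which is where Protocol Buffers keeps the edge.</p>

```java
import java.io.Serializable;

/**
 * Hypothetical later version of the test class. An explicit serialVersionUID keeps
 * streams written by earlier versions with the same id readable, as long as the
 * changes made since then are compatible ones.
 */
class TargetObjectV2 implements Serializable {

    // Pinned by hand instead of being computed from the class structure by the JVM.
    private static final long serialVersionUID = 1L;

    public String[] sa;
    public int[] ia;

    // Field added after objects had already been serialized. When an older stream
    // lacks it, deserialization leaves it at its default value (null).
    public String label;
}
```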
]]></content:encoded>
			<wfw:commentRss>http://www.xavierllora.net/2009/02/04/efficient-serialization-for-java-and-beyond/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

