A big part of the data I work with needs to be serialized and cross platform, ha...

crpatino · on July 8, 2015

Sounds like you are mixing up your data's in-memory representation with its storage/transmission representation. This is risky business.

If you have no requirement that says otherwise, you should have explicit marshalling and demarshalling steps that transform your live data objects into opaque BLOBs. It would be highly desirable for your BLOBs to have a header containing metadata used exclusively for marshalling purposes. At the very least, the size of the payload, an object type id and a format version id will save you lots of trouble.
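A minimal sketch of such a header in C, assuming a made-up 12-byte layout with three big-endian 32-bit fields. The names `blob_header`, `blob_marshal` and `blob_unmarshal_header` are hypothetical and only illustrate the idea:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: 12 header bytes, every field big-endian. */
enum { BLOB_HEADER_SIZE = 12 };

typedef struct {
    uint32_t payload_size;   /* number of bytes following the header */
    uint32_t type_id;        /* which object type the payload encodes */
    uint32_t format_version; /* bumped whenever the payload layout changes */
} blob_header;

/* Write a 32-bit value byte by byte, so host endianness never leaks out. */
void put_u32_be(uint8_t *out, uint32_t v) {
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

uint32_t get_u32_be(const uint8_t *in) {
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Writes header + payload into out.
   Returns the number of bytes written, or 0 if out is too small. */
size_t blob_marshal(const blob_header *h, const uint8_t *payload,
                    uint8_t *out, size_t out_cap) {
    assert(h != NULL && out != NULL);
    if (out_cap < BLOB_HEADER_SIZE ||
        h->payload_size > out_cap - BLOB_HEADER_SIZE)
        return 0;
    put_u32_be(out, h->payload_size);
    put_u32_be(out + 4, h->type_id);
    put_u32_be(out + 8, h->format_version);
    if (h->payload_size > 0)
        memcpy(out + BLOB_HEADER_SIZE, payload, h->payload_size);
    return BLOB_HEADER_SIZE + (size_t)h->payload_size;
}

/* Parses the header. Returns 0 on success, -1 if the buffer is truncated. */
int blob_unmarshal_header(const uint8_t *in, size_t in_len, blob_header *h) {
    assert(in != NULL && h != NULL);
    if (in_len < BLOB_HEADER_SIZE)
        return -1;
    h->payload_size = get_u32_be(in);
    h->type_id = get_u32_be(in + 4);
    h->format_version = get_u32_be(in + 8);
    if (h->payload_size > in_len - BLOB_HEADER_SIZE)
        return -1;
    return 0;
}
```

The receiver can reject a truncated or unknown BLOB from the header alone, before it touches the payload.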

Now, what happens if you need high performance and are willing to trade off code complexity for faster execution? You can just copy your native object's bytes into the BLOB payload, as long as you correctly identify the source platform's relevant characteristics in the header. Then, when the target host does the demarshalling step, it can decide whether the native format is compatible with its own platform and, if so, just copy the payload into a zeroed buffer of the correct size. If that is not the case, it will have to perform an extra deferred marshalling step to put the payload in "canonical" format prior to demarshalling proper.
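A rough sketch of that native fast path in C. The `platform_tag` descriptor, the `record` type and the function names are all hypothetical. As a simplification, the slow path below byte-swaps the copied fields in place rather than going through a separate canonical format, and only plain little-endian and big-endian hosts are considered:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ORDER_LITTLE = 1, ORDER_BIG = 2 };

/* Hypothetical platform descriptor the sender puts in the BLOB header. */
typedef struct {
    uint8_t byte_order;  /* ORDER_LITTLE or ORDER_BIG */
    uint8_t record_size; /* sizeof(record) as seen by the sender */
} platform_tag;

/* Example payload type: fixed-width fields only, so the layout is predictable. */
typedef struct {
    uint32_t id;
    uint32_t count;
} record;

static_assert(sizeof(record) == 8, "record is assumed to have no padding");

/* Detect the host byte order by looking at the first byte of a known value. */
uint8_t host_byte_order(void) {
    const uint16_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);
    return first == 1 ? ORDER_LITTLE : ORDER_BIG;
}

platform_tag host_platform_tag(void) {
    platform_tag t = { host_byte_order(), (uint8_t)sizeof(record) };
    return t;
}

uint32_t swap_u32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Returns 0 on success, -1 if the sender's layout cannot be interpreted. */
int record_demarshal(platform_tag src, const uint8_t *payload, record *out) {
    if (src.record_size != sizeof(record))
        return -1;
    if (src.byte_order != ORDER_LITTLE && src.byte_order != ORDER_BIG)
        return -1;
    memset(out, 0, sizeof *out);       /* zeroed buffer of the correct size */
    memcpy(out, payload, sizeof *out); /* fast path: raw copy of native bytes */
    if (src.byte_order != host_byte_order()) {
        /* Slow path: the sender's format is foreign, so fix up each field. */
        out->id = swap_u32(out->id);
        out->count = swap_u32(out->count);
    }
    return 0;
}
```

When both ends share a platform, the receiver pays for one `memcpy` and two comparisons. The conversion cost only shows up when the tags differ.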

You can even make the behavior configurable, so that customers running a heterogeneous environment do not suffer a performance hit for the sake of the customers in homogeneous environments.

chetanahuja · on July 8, 2015

Of course the data in storage or over the wire needs to be marshalled and unmarshalled (whether explicitly standardizing on a particular wire format or with header based hacks or whatnot). That's not the point.

The point is that a lot of the time, the two machines on either end of the wire need to agree on the sizes of the various fields you're sending (say, in protocol headers). And then you want to work with that data internally in the code on either side. You'd better be absolutely sure how many bits you have in each type that you're allocating for these purposes.

And going even beyond that very common use case, a lot of code reads cleaner and lends itself to debuggability when you know the exact sizes of the types you're using. It's not something reserved for just network programming.

crpatino · on July 8, 2015

Sorry, I fail to see the point in your second paragraph. Of course, at the business logic level you need to allocate variables that can hold every possible value in the valid range, but as long as this is the case, why does it matter that you use types that have the same byte size on every possible platform?

In your third paragraph, I agree on the debuggability front (if you are actually reading memory dumps; otherwise, why should it matter?). As for the code reading clearer, I guess this is more a matter of taste.

chetanahuja · on July 13, 2015

It matters because of code readability, debuggability and all sorts of code hygiene reasons. If I'm using size_t for a field in my protocol, with a 32-bit platform on one end and a 64-bit platform on the other, which size wins over the wire? Can that question be answered while in a debugging flow, trying to track down a memory stomping error?
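To make the size_t question concrete, here is a small C sketch, assuming a hypothetical protocol whose length field is fixed at 32 bits. The names `wire_length`, `wire_length_from_size` and `wire_length_to_size` are invented for illustration. The fixed-width type states the answer in the code itself, and the conversion from size_t becomes an explicit, checked step:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol field: always 32 bits, whatever size_t is locally. */
typedef struct {
    uint32_t length;
} wire_length;

static_assert(sizeof(wire_length) == 4, "wire field must be exactly 4 bytes");
static_assert(sizeof(size_t) >= 4, "sketch assumes size_t holds 32 bits");

/* Narrow a host-side size to the wire field.
   Returns 0 on success, -1 if the value does not fit in 32 bits
   (only possible where size_t is wider than 32 bits). */
int wire_length_from_size(size_t n, wire_length *out) {
    if (n > UINT32_MAX)
        return -1;
    out->length = (uint32_t)n;
    return 0;
}

/* Widening back to size_t is lossless under the assumption asserted above. */
size_t wire_length_to_size(wire_length w) {
    return (size_t)w.length;
}
```

With a bare size_t in the protocol struct, the same source would describe a 4-byte field on one end and an 8-byte field on the other, and nothing in the code would say so.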