Endianness: Big- vs Little-Endian and Serialization

What This Concept Is

Endianness is the order in which the bytes of a multi-byte integer are stored in memory.

  • Little-endian (x86, x86-64, most ARM): lowest-address byte is the least significant. The 32-bit value 0x12345678 is stored as 78 56 34 12.
  • Big-endian (network byte order, older PowerPC, some embedded): lowest-address byte is the most significant. 0x12345678 is stored as 12 34 56 78.

Endianness affects only the order of bytes within one multi-byte value. It does not reorder the elements of an array or the fields of a struct, and the bits within each byte mean the same thing on every machine.

Why It Matters Here

Endianness bites you whenever bytes leave the machine:

  • network protocols (TCP, IP, DNS) are defined in network byte order, which is big-endian
  • file formats often fix one endianness so files port across machines
  • hex dumps in a debugger: you need the machine's endianness to read a uint32_t out of the raw bytes
  • SIMD loads and stores lay out each multi-byte lane in memory in the machine's native byte order

Two programs sharing memory or a network stream must agree on endianness, or every multi-byte field is silently scrambled.

Concrete Example

On a little-endian machine:

uint32_t x = 0x12345678;
unsigned char *p = (unsigned char *)&x;
/* p[0]=0x78, p[1]=0x56, p[2]=0x34, p[3]=0x12 */

On a big-endian machine, the same code gives p[0]=0x12, p[1]=0x34, p[2]=0x56, p[3]=0x78. If the first machine writes those 4 bytes to a socket and the second machine reads them into a uint32_t, the second machine sees 0x78563412, not 0x12345678.

The fix is to serialize through a canonical order. POSIX provides helpers:

#include <arpa/inet.h>

uint32_t host = 0x12345678;
uint32_t net = htonl(host); /* host -> network (big endian) */
/* write 4 bytes of `net` to the wire */

uint32_t back = ntohl(net); /* reading back */

On a little-endian host, htonl swaps the bytes; on a big-endian host it is a no-op. Either way, the bytes on the wire are in a fixed big-endian order.

Common Confusion / Misconception

"My machine is little-endian, so all my numbers are little-endian." Only when stored as multi-byte integers in memory. In registers, a number is just a number; endianness is a memory concept.

"memcpy handles endianness." memcpy copies bytes verbatim. If two machines disagree on endianness, memcpy faithfully transfers the wrong interpretation.

"Big-endian is dead." Network protocols still use it, and several embedded systems still ship it. File formats (PNG, Java class files) often specify big-endian. You cannot ignore it.

How To Use It

Whenever a multi-byte value crosses a boundary:

  1. Pick a canonical byte order (big-endian is the convention for network protocols) and document it.
  2. Convert on both ends with htonl, htons, ntohl, ntohs, or explicit shift-and-mask.
  3. Never serialize a whole struct with write(fd, &s, sizeof s); serialize each field explicitly.
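  4. For a double, memcpy the IEEE 754 bits into a uint64_t and serialize that integer like any other. htonl and htons cover only 32 and 16 bits, so the 64-bit value needs explicit shifts or a platform-specific helper.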

Check Yourself

  1. On a little-endian machine, what two bytes (in address order) does uint16_t v = 0xBEEF; memcpy(buf, &v, 2); write?
  2. Why is htonl(htonl(x)) == x on every machine even though htonl may do nothing or swap?
  3. Why can write(fd, &struct_s, sizeof struct_s) be a portability bug even if the two machines have the same endianness?

Mini Drill or Application

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    /* Detect the host byte order: where does the low byte of 1 land? */
    uint16_t v = 1;
    unsigned char *p = (unsigned char *)&v;
    printf("this machine is %s-endian\n",
           (p[0] == 1) ? "little" : "big");

    /* Dump the in-memory bytes of x in address order. */
    uint32_t x = 0xAABBCCDD;
    unsigned char b[4];
    memcpy(b, &x, 4);
    printf("bytes: %02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);

    /* Reverse the four bytes with shift-and-mask. */
    uint32_t swapped = ((x & 0x000000FFu) << 24) |
                       ((x & 0x0000FF00u) << 8)  |
                       ((x & 0x00FF0000u) >> 8)  |
                       ((x & 0xFF000000u) >> 24);
    printf("swapped: 0x%08x\n", swapped);
    return 0;
}

Build with gcc -Wall -Wextra -o endian endian.c and predict every line before you run it. On Linux x86-64 you will get "this machine is little-endian", bytes: dd cc bb aa, and swapped: 0xddccbbaa (%x prints lowercase hex digits).

Read This Only If Stuck