Endianness: Big- vs Little-Endian and Serialization
What This Concept Is
Endianness is the order in which the bytes of a multi-byte integer are stored in memory.
- Little-endian (x86, x86-64, most ARM): lowest-address byte is the least significant. The 32-bit value 0x12345678 is stored as 78 56 34 12.
- Big-endian (network byte order, older PowerPC, some embedded): lowest-address byte is the most significant. 0x12345678 is stored as 12 34 56 78.
Endianness only affects byte ordering within a single multi-byte word. The bits within each byte are the same on every machine.
Why It Matters Here
Endianness bites you whenever bytes leave the machine:
- network protocols (TCP, IP, DNS) are defined in network byte order, which is big-endian
- file formats often fix one endianness so files port across machines
- debugging hex dumps: knowing the machine's endianness is how you read a uint32_t
- SIMD lane ordering and CPU feature flags usually follow the machine's native endianness
Two programs sharing memory or a network stream must agree on endianness, or every multi-byte field is silently scrambled.
Concrete Example
On a little-endian machine:
uint32_t x = 0x12345678;
unsigned char *p = (unsigned char *)&x;
/* p[0]=0x78, p[1]=0x56, p[2]=0x34, p[3]=0x12 */
On a big-endian machine, the same code gives p[0]=0x12, p[1]=0x34, p[2]=0x56, p[3]=0x78. If the first machine writes those 4 bytes to a socket and the second machine reads them into a uint32_t, the second machine sees 0x78563412, not 0x12345678.
The fix is to serialize through a canonical order. POSIX provides helpers:
#include <arpa/inet.h>
uint32_t host = 0x12345678;
uint32_t net = htonl(host); /* host -> network (big endian) */
/* write 4 bytes of `net` to the wire */
uint32_t back = ntohl(net); /* reading back */
On a little-endian host, htonl swaps; on a big-endian host it is a no-op. Either way, the wire bytes are a fixed big-endian order.
Common Confusion / Misconception
"My machine is little-endian, so all my numbers are little-endian." Only when stored as multi-byte integers in memory. In registers, a number is just a number; endianness is a memory concept.
"memcpy handles endianness." memcpy copies bytes verbatim. If two machines disagree on endianness, memcpy faithfully transfers the wrong interpretation.
"Big-endian is dead." Network protocols still use it, and several embedded systems still ship it. File formats (PNG, Java class files) often specify big-endian. You cannot ignore it.
How To Use It
Whenever a multi-byte value crosses a boundary:
- Pick a canonical byte order (almost always big-endian for wire formats).
- Convert on both ends with htonl, htons, ntohl, ntohs, or explicit shift-and-mask.
- Never serialize a whole struct with write(fd, &s, sizeof s); serialize each field explicitly.
- For a double, send the IEEE bits through memcpy into a uint64_t and byte-swap that integer.
Check Yourself
- On a little-endian machine, what two bytes (in address order) does uint16_t v = 0xBEEF; memcpy(buf, &v, 2); write?
- Why is htonl(htonl(x)) == x on every machine even though htonl may do nothing or swap?
- Why can write(fd, &struct_s, sizeof struct_s) be a portability bug even if the two machines have the same endianness?
Mini Drill or Application
#include <stdio.h>
#include <stdint.h>
#include <string.h>
int main(void) {
    /* Detect endianness: inspect the lowest-address byte of the value 1. */
    uint16_t v = 1;
    unsigned char *p = (unsigned char *)&v;
    printf("this machine is %s-endian\n",
           (p[0] == 1) ? "little" : "big");
    /* Dump the bytes of x in address order. */
    uint32_t x = 0xAABBCCDD;
    unsigned char b[4];
    memcpy(b, &x, 4);
    printf("bytes: %02x %02x %02x %02x\n", b[0], b[1], b[2], b[3]);
    /* Reverse the byte order by value, using shift-and-mask. */
    uint32_t swapped = ((x & 0x000000FFu) << 24) |
                       ((x & 0x0000FF00u) << 8)  |
                       ((x & 0x00FF0000u) >> 8)  |
                       ((x & 0xFF000000u) >> 24);
    printf("swapped: 0x%08x\n", swapped);
    return 0;
}
Build: gcc -Wall -Wextra -o endian endian.c. Predict every line of output before you run it. On Linux x86-64 you will get "little-endian", bytes dd cc bb aa, and swapped 0xddccbbaa (the %02x and %08x conversions print lowercase hex).