Aside The Segmentation Fault

This page is a generated reference surface for selective reading. It exists to keep the learner apps guide-first while still preserving source access.

Learning objectives

  • Explain the main ideas and vocabulary in Aside The Segmentation Fault.
  • Work through the source examples for Aside The Segmentation Fault without depending on raw chunk order.
  • Use Aside The Segmentation Fault as selective reference when learner modules point back to Ostep.

Prerequisites

  • None curated yet.

Module targets

  • module-02-memory-management-virtual-memory

Source-of-truth note

This unit is anchored to Ostep and the source chapter "Aside The Segmentation Fault". Use external resources only to clarify, extend, or modernize details without replacing the chapter's conceptual spine.

External enrichment

No chapter-specific enrichment resources are curated yet. Add them in the unit manifest when a source clearly improves learning.

Source provenance

  • Primary source: Ostep
  • Source chapter: Aside The Segmentation Fault
  • Raw source file: 071-aside-thesegmentationfault.md

Merged source

Aside The Segmentation Fault

ASIDE: THE SEGMENTATION FAULT

The term segmentation fault or violation arises from a memory access on a segmented machine to an illegal address. Humorously, the term persists, even on machines with no support for segmentation at all. Or not so humorously, if you can't figure out why your code keeps faulting.

Now let's look at an address in the heap, virtual address 4200 (again refer to Figure 16.1). If we just add the virtual address 4200 to the base of the heap (34KB), we get a physical address of 39016, which is not the correct physical address. What we need to do first is extract the offset into the heap, i.e., which byte(s) in this segment the address refers to. Because the heap starts at virtual address 4KB (4096), the offset of 4200 is actually 4200 minus 4096, or 104. We then take this offset (104) and add it to the base register physical address (34KB) to get the desired result: 34920.
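The arithmetic above can be sketched in C. This is only an illustration of the calculation: the macro and function names (HEAP_VSTART, HEAP_BASE, heap_translate, heap_translate_naive) are hypothetical, and the values are the ones from the running example.

```c
#include <stdint.h>

/* Values from the running example; all names here are hypothetical. */
#define KB          1024u
#define HEAP_VSTART (4u * KB)   /* heap begins at virtual address 4KB    */
#define HEAP_BASE   (34u * KB)  /* heap base register: physical 34KB     */

/* Incorrect: adds the whole virtual address to the base. */
uint32_t heap_translate_naive(uint32_t vaddr) {
    return HEAP_BASE + vaddr;               /* 34816 + 4200 = 39016 */
}

/* Correct: extract the offset into the segment first, then add the base. */
uint32_t heap_translate(uint32_t vaddr) {
    uint32_t offset = vaddr - HEAP_VSTART;  /* 4200 - 4096 = 104    */
    return HEAP_BASE + offset;              /* 34816 + 104 = 34920  */
}
```

Calling heap_translate(4200) yields 34920, while heap_translate_naive(4200) yields the incorrect 39016.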

What if we tried to refer to an illegal address, such as 7KB, which is beyond the end of the heap? You can imagine what will happen: the hardware detects that the address is out of bounds and traps into the OS, likely leading to the termination of the offending process. And now you know the origin of the famous term that all C programmers learn to dread: the segmentation violation or segmentation fault.
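A minimal sketch of that out-of-bounds detection follows. It assumes a heap size of 2KB (the size listed for the heap in the chapter's segment-register figure), and it models the trap into the OS as a false return value. The names heap_access and HEAP_SIZE are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define KB          1024u
#define HEAP_VSTART (4u * KB)   /* heap begins at virtual address 4KB  */
#define HEAP_BASE   (34u * KB)  /* heap base register: physical 34KB   */
#define HEAP_SIZE   (2u * KB)   /* assumed bounds value for the heap   */

/*
 * Returns true and stores the physical address in *paddr when the access
 * is legal. Returns false where real hardware would trap into the OS.
 */
bool heap_access(uint32_t vaddr, uint32_t *paddr) {
    if (vaddr < HEAP_VSTART) {
        return false;                       /* below the heap segment   */
    }
    uint32_t offset = vaddr - HEAP_VSTART;
    if (offset >= HEAP_SIZE) {
        return false;                       /* beyond the end: fault    */
    }
    *paddr = HEAP_BASE + offset;
    return true;
}
```

Under these assumptions, an access to virtual address 4200 succeeds, while an access to 7KB (offset 3KB, which is not less than the 2KB bounds) is rejected.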

16.2 Which Segment Are We Referring To?

The hardware uses segment registers during translation. How does it know the offset into a segment, and to which segment an address refers?

One common approach, sometimes referred to as an explicit approach, is to chop up the address space into segments based on the top few bits of the virtual address; this technique was used in the VAX/VMS system [LL82]. In our example above, we have three segments; thus we need two bits to accomplish our task. If we use the top two bits of our 14-bit virtual address to select the segment, our virtual address looks like this:

13 12 | 11 10 9 8 7 6 5 4 3 2 1 0
Segment (bits 13-12) | Offset (bits 11-0)

In our example, then, if the top two bits are 00, the hardware knows the virtual address is in the code segment, and thus uses the code base and bounds pair to relocate the address to the correct physical location.

If the top two bits are 01, the hardware knows the address is in the heap, and thus uses the heap base and bounds. Let's take our example heap virtual address from above (4200) and translate it, just to make sure this is clear. The virtual address 4200, in binary form, can be seen here:

Bit:    13 12 | 11 10  9  8  7  6  5  4  3  2  1  0
Value:   0  1 |  0  0  0  0  0  1  1  0  1  0  0  0
        Segment | Offset

As you can see from the picture, the top two bits (01) tell the hardware which segment we are referring to. The bottom 12 bits are the offset into the segment: 0000 0110 1000, or hex 0x068, or 104 in decimal. Thus, the hardware simply takes the first two bits to determine which segment register to use, and then takes the next 12 bits as the offset into the segment.

By adding the base register to the offset, the hardware arrives at the final physical address. Note the offset eases the bounds check too: we can simply check if the offset is less than the bounds; if not, the address is illegal. Thus, if base and bounds were arrays (with one entry per segment), the hardware would be doing something like this to obtain the desired physical address:

// get top 2 bits of 14-bit VA
Segment = (VirtualAddress & SEG_MASK) >> SEG_SHIFT
// now get offset
Offset = VirtualAddress & OFFSET_MASK
if (Offset >= Bounds[Segment])
    RaiseException(PROTECTION_FAULT)
else
    PhysAddr = Base[Segment] + Offset
    Register = AccessMemory(PhysAddr)

In our running example, we can fill in values for the constants above. Specifically, SEG_MASK would be set to 0x3000, SEG_SHIFT to 12, and OFFSET_MASK to 0xFFF.
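With those constants filled in, the hardware logic can be approximated by a small C function. This is a sketch rather than a description of any real machine: the exception is modeled as a false return value, the base and bounds values are assumed from the chapter's segment-register figure, and only positive-growth segments are handled, so the stack entry is deliberately left empty here.

```c
#include <stdbool.h>
#include <stdint.h>

#define KB          1024u
#define SEG_MASK    0x3000u   /* top 2 bits of the 14-bit address */
#define SEG_SHIFT   12
#define OFFSET_MASK 0x0FFFu   /* bottom 12 bits                   */

/* Indexed by segment bits: 00 code, 01 heap, 10 unused, 11 stack.
 * The stack is left with bounds 0 because it grows negatively and
 * needs different handling than this sketch provides. */
static const uint32_t Base[4]   = { 32u * KB, 34u * KB, 0u, 0u };
static const uint32_t Bounds[4] = {  2u * KB,  2u * KB, 0u, 0u };

/* Returns false where the hardware would raise a protection fault. */
bool translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t segment = (vaddr & SEG_MASK) >> SEG_SHIFT;
    uint32_t offset  = vaddr & OFFSET_MASK;
    if (offset >= Bounds[segment]) {
        return false;
    }
    *paddr = Base[segment] + offset;
    return true;
}
```

For virtual address 4200 (0x1068), the segment bits are 01 and the offset is 0x068, so the function produces 34KB + 104 = 34920.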

You may also have noticed that when we use the top two bits, and we only have three segments (code, heap, stack), one segment of the address space goes unused. To avoid this waste, some systems put code in the same segment as the heap and use only one bit to select which segment to use [LL82].
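For the 14-bit address in the running example, a one-bit selector would leave 13 bits of offset. The constants below are derived from that observation and are an assumption for illustration, not values given in the chapter. The names SEG1_MASK, SEG1_SHIFT, OFFSET1_MASK, segment_of, and offset_of are hypothetical.

```c
#include <stdint.h>

/* Hypothetical constants for a 14-bit address with a 1-bit selector. */
#define SEG1_MASK    0x2000u  /* bit 13 only   */
#define SEG1_SHIFT   13
#define OFFSET1_MASK 0x1FFFu  /* bits 12 to 0  */

/* 0 selects the combined code+heap segment, 1 selects the stack. */
uint32_t segment_of(uint32_t vaddr) {
    return (vaddr & SEG1_MASK) >> SEG1_SHIFT;
}

/* Offset within the selected segment (13 bits wide). */
uint32_t offset_of(uint32_t vaddr) {
    return vaddr & OFFSET1_MASK;
}
```

With this split, virtual address 4200 falls in segment 0 with offset 4200, and virtual address 15KB falls in segment 1.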

There are other ways for the hardware to determine which segment a particular address is in. In the implicit approach, the hardware determines the segment by noticing how the address was formed. If, for example, the address was generated from the program counter (i.e., it was an instruction fetch), then the address is within the code segment; if the address is based off of the stack or base pointer, it must be in the stack segment; any other address must be in the heap.
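The implicit rule amounts to a mapping from how an address was formed to a segment. A rough sketch, with hypothetical enum and function names that do not come from the chapter or from any real hardware interface:

```c
/* How the address was generated (hypothetical classification). */
typedef enum {
    ACCESS_INSTRUCTION_FETCH,  /* address came from the program counter */
    ACCESS_STACK_RELATIVE,     /* address based on stack/base pointer   */
    ACCESS_OTHER               /* any other memory reference            */
} access_kind_t;

typedef enum { SEG_CODE, SEG_HEAP, SEG_STACK } segment_t;

/* Choose the segment from the kind of access, not from address bits. */
segment_t segment_for(access_kind_t kind) {
    switch (kind) {
    case ACCESS_INSTRUCTION_FETCH:
        return SEG_CODE;
    case ACCESS_STACK_RELATIVE:
        return SEG_STACK;
    default:
        return SEG_HEAP;
    }
}
```

Because the segment is inferred from the access itself, no bits of the virtual address need to be reserved as a selector in this scheme.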

16.3 What About The Stack?

Thus far, we've left out one important component of the address space: the stack. The stack has been relocated to physical address 28KB in the diagram above, but with one critical difference: it grows backwards. In physical memory, it starts at 28KB and grows back to 26KB, corresponding to virtual addresses 16KB to 14KB; translation must proceed differently.

The first thing we need is a little extra hardware support. Instead of just base and bounds values, the hardware also needs to know which way the segment grows (a bit, for example, that is set to 1 when the segment grows in the positive direction, and 0 for negative). Our updated view of what the hardware tracks is seen in Figure 16.4.

Segment   Base   Size   Grows Positive?
Code      32K    2K     1
Heap      34K    2K     1
Stack     28K    2K     0

Figure 16.4: Segment Registers (With Negative-Growth Support)

With the hardware understanding that segments can grow in the negative direction, the hardware must now translate such virtual addresses slightly differently. Let's take an example stack virtual address and translate it to understand the process.

In this example, assume we wish to access virtual address 15KB, which should map to physical address 27KB. Our virtual address, in binary form, thus looks like this: 11 1100 0000 0000 (hex 0x3C00). The hardware uses the top two bits (11) to designate the segment, but then we are left with an offset of 3KB. To obtain the correct negative offset, we must subtract the maximum segment size from 3KB: in this example, a segment can be 4KB, and thus the correct negative offset is 3KB minus 4KB, which equals -1KB. We simply add the negative offset (-1KB) to the base (28KB) to arrive at the correct physical address: 27KB. The bounds check can be calculated by ensuring the absolute value of the negative offset is less than or equal to the segment's current size (2KB here), so that the lowest stack address, virtual 14KB, remains legal.
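The negative-growth translation can be sketched as follows. The struct layout, the names segment_reg_t, Segs, and translate, and the false return value standing in for a fault are all assumptions for illustration. The register values come from Figure 16.4, with the code segment assumed to grow positively. The stack bounds check uses "less than or equal" so that virtual address 14KB, which maps to physical 26KB, is accepted.

```c
#include <stdbool.h>
#include <stdint.h>

#define KB           1024
#define SEG_MASK     0x3000
#define SEG_SHIFT    12
#define OFFSET_MASK  0x0FFF
#define MAX_SEG_SIZE (4 * KB)   /* 12 offset bits give a 4KB maximum */

typedef struct {
    int32_t base;
    int32_t size;
    bool    grows_positive;
} segment_reg_t;

/* Indexed by segment bits: 00 code, 01 heap, 10 unused, 11 stack. */
static const segment_reg_t Segs[4] = {
    { 32 * KB, 2 * KB, true  },
    { 34 * KB, 2 * KB, true  },
    { 0,       0,      true  },
    { 28 * KB, 2 * KB, false },
};

/* Returns false where the hardware would raise a fault. */
bool translate(int32_t vaddr, int32_t *paddr) {
    const segment_reg_t *s = &Segs[(vaddr & SEG_MASK) >> SEG_SHIFT];
    int32_t offset = vaddr & OFFSET_MASK;
    if (s->grows_positive) {
        if (offset >= s->size) {
            return false;
        }
    } else {
        offset -= MAX_SEG_SIZE;        /* e.g. 3KB - 4KB = -1KB */
        if (-offset > s->size) {
            return false;              /* past the stack's size */
        }
    }
    *paddr = s->base + offset;         /* e.g. 28KB - 1KB = 27KB */
    return true;
}
```

Under these assumptions, virtual address 15KB translates to 27KB and 14KB translates to 26KB, while 13KB lies beyond the stack's 2KB size and is rejected.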