Skip to main content

Chapter 8: Greedy Algorithms

This page is a generated reference surface for selective reading. It exists to keep the learner apps guide-first while still preserving source access.

Learning objectives

  • Explain the main ideas and vocabulary in Greedy Algorithms.
  • Work through the source examples for Greedy Algorithms without depending on raw chunk order.
  • Use Greedy Algorithms as selective reference when learner modules point back to Grokking Algorithms.

Prerequisites

  • Earlier prerequisite concepts leading into Chapter 8: Greedy Algorithms.

Module targets

  • module-01-algorithm-analysis-design
  • module-01-algorithm-intuition
  • module-04-dynamic-programming

AI companion modes

  • Explain simply
  • Socratic tutor
  • Quiz me
  • Challenge my understanding
  • Diagnose my confusion
  • Generate extra practice
  • Revision mode
  • Connect forward / backward

Source-of-truth note

This unit is anchored to Grokking Algorithms and the source chapter "Chapter 8: Greedy Algorithms". Use external resources only to clarify, extend, or modernize details without replacing the chapter's conceptual spine.

External enrichment

No chapter-specific enrichment resources are curated yet. Add them in the unit manifest when a source clearly improves learning.

Source provenance

  • Primary source: Grokking Algorithms
  • Source chapter 08: Chapter 8: Greedy Algorithms
  • Raw source file: 027-08-8-greedy-algorithms.md
  • Raw source file: 028-08-sets.md
  • Raw source file: 029-08-back-to-the-code.md

Merged source

Greedy Algorithms

8 Greedy Algorithms

8 Greedy Algorithms

In this chapter

You learn how to tackle the impossible: problems that have no fast algorithmic solution (NP-complete problems).

You learn how to identify such problems when you see them, so you don't waste time trying to find a fast algorithm for them.

You learn about approximation algorithms, which you can use to find an approximate solution to an NP-complete problem quickly.

You learn about the greedy strategy, a very simple problem-solving strategy.

The classroom scheduling problem

Suppose you have a classroom and want to hold as many classes here as possible. You get a list of classes.

You can't hold all of these classes in there, because some of them overlap.

You want to hold as many classes as possible in this classroom. How do you pick what set of classes to hold, so that you get the biggest set of classes possible?

Sounds like a hard problem, right? Actually, the algorithm is so easy, it might surprise you. Here's how it works:

  • Pick the class that ends the soonest. This is the first class you'll hold in this classroom.
  1. Now, you have to pick a class that starts after the first class. Again, pick the class that ends the soonest. This is the second class you'll hold.

Keep doing this, and you'll end up with the answer! Let's try it out. Art ends the soonest, at 10:00 a.m., so that's one of the classes you pick.

Now you need the next class that starts after 10:00 a.m. and ends the soonest.

English is out because it conflicts with Art, but Math works. Finally, CS conflicts with Math, but Music works.

So these are the three classes you'll hold in this classroom.

A lot of people tell me that this algorithm seems easy. It's too obvious, so it must be wrong. But that's the beauty of greedy algorithms: they're easy! A greedy algorithm is simple: at each step, pick the optimal move. In this case, each time you pick a class, you pick the class that ends the soonest. In technical terms: at each step you pick the locally optimal solution, and in the end you're left with the globally optimal solution. Believe it or not, this simple algorithm finds the optimal solution to this scheduling problem!

Obviously, greedy algorithms don't always work. But they're simple to write! Let's look at another example.

The knapsack problem

Suppose you're a greedy thief. You're in a store with a knapsack, and there are all these items you can steal. But you can only take what you can fit in your knapsack. The knapsack can hold 35 pounds.

You're trying to maximize the value of the items you put in your knapsack. What algorithm do you use?

Again, the greedy strategy is pretty simple:

  • Pick the most expensive thing that will fit in your knapsack.

  • Pick the next most expensive thing that will fit in your knapsack. And so on.

Except this time, it doesn't work! For example, suppose there are three items you can steal.

Your knapsack can hold 35 pounds of items. The stereo system is the most expensive, so you steal that. Now you don't have space for anything else.

You got $3,000 worth of goods. But wait! If you'd picked the laptop and the guitar instead, you could have had $3,500 worth of loot!

Clearly, the greedy strategy doesn't give you the optimal solution here. But it gets you pretty close. In the next chapter, I'll explain how to calculate the correct solution. But if you're a thief in a shopping center, you don't care about perfect. "Pretty good" is good enough.

Here's the takeaway from this second example: sometimes, perfect is the enemy of good. Sometimes all you need is an algorithm that solves the problem pretty well. And that's where greedy algorithms shine, because they're simple to write and usually get pretty close.


Sets

Sets

Exercises

8.1 You work for a furniture company, and you have to ship furniture all over the country. You need to pack your truck with boxes. All the boxes are of different sizes, and you're trying to maximize the space you use in each truck. How would you pick boxes to maximize space? Come up with a greedy strategy. Will that give you the optimal solution?

  • You're going to Europe, and you have seven days to see everything you can. You assign a point value to each item (how much you want

to see it) and estimate how long it takes. How can you maximize the point total (seeing all the things you really want to see) during your stay? Come up with a greedy strategy. Will that give you the optimal solution?

Let's look at one last example. This is an example where greedy algorithms are absolutely necessary.

The set-covering problem

Suppose you're starting a radio show. You want to reach listeners in all 50 states. You have to decide what stations to play on to reach all those listeners. It costs money to be on each station, so you're trying to minimize the number of stations you play on. You have a list of stations.

Each station covers a region, and there's overlap. How do you figure out the smallest set of stations you can play on to cover all 50 states? Sounds easy, doesn't it? Turns out it's extremely hard. Here's how to do it:

  1. List every possible subset of stations. This is called the power set. There are 2^n possible subsets.
  • From these, pick the set with the smallest number of stations that covers all 50 states.

The problem is, it takes a long time to calculate every possible subset of stations. It takes O(2^n) time, because there are 2^n stations. It's possible to do if you have a small set of 5 to 10 stations. But with all the examples here, think about what will happen if you have a lot of items. It takes much longer if you have more stations. Suppose you can calculate 10 subsets per second.

There's no algorithm that solves it fast enough! What can you do?

Approximation algorithms

Greedy algorithms to the rescue! Here's a greedy algorithm that comes pretty close: 1. Pick the station that covers the most states that haven't been covered

yet. It's OK if the station covers some states that have been covered already. 2. Repeat until all the states are covered.

This is called an approximation algorithm. When calculating the exact solution will take too much time, an approximation algorithm will work. Approximation algorithms are judged by

How fast they are

How close they are to the optimal solution

Greedy algorithms are a good choice because not only are they simple to come up with, but that simplicity means they usually run fast, too. In this case, the greedy algorithm runs in O(n^2) time, where n is the number of radio stations.

Let's see how this problem looks in code.

Code for setup

For this example, I'm going to use a subset of the states and the stations to keep things simple.

First, make a list of the states you want to cover:

states_needed = set(["mt", "wa", "or", "id", "nv", "ut",

"ca", "az"]) You pass an array in, and it gets converted to a set.

I used a set for this. A set is like a list, except that each item can show up only once in a set. Sets can't have duplicates. For example, suppose you had this list:

arr = [1, 2, 2, 3, 3, 3]

And you converted it to a set:

>>> set(arr)
set([1, 2, 3])

1, 2, and 3 all show up just once in a set.

You also need the list of stations that you're choosing from. I chose to use a hash for this:

stations = {}
stations["kone"] = set(["id", "nv", "ut"])
stations["ktwo"] = set(["wa", "id", "mt"])
stations["kthree"] = set(["or", "nv", "ca"])
stations["kfour"] = set(["nv", "ut"])
stations["kfive"] = set(["ca", "az"])

The keys are station names, and the values are the states they cover. So in this example, the kone station covers Idaho, Nevada, and Utah. All the values are sets, too. Making everything a set will make your life easier, as you'll see soon.

Finally, you need something to hold the final set of stations you'll use:

final_stations = set()

Calculating the answer

Now you need to calculate what stations you'll use. Take a look at the image at right, and see if you can predict what stations you should use.

There can be more than one correct solution. You need to go through every station and pick the one that covers the most uncovered states. I'll call this best_station:

best_station = None
states_covered = set()
for station, states_for_station in stations.items():

states_covered is a set of all the states this station covers that haven't been covered yet. The for loop allows you to loop over every station to see which one is the best station. Let's look at the body of the for loop:

covered = states_needed & states_for_station
if len(covered) > len(states_covered):

New syntax! This is called a set intersection.

best_station = station
states_covered = covered

There's a funny-looking line here:

covered = states_needed & states_for_station

What's going on?

Sets

Suppose you have a set of fruits.

You also have a set of vegetables. When you have two sets, you can do some fun things with them.

Here are some things you can do with sets.

A set union means "combine both sets."

A set intersection means "find the items that show up in both sets" (in this case, just the tomato).

A set difference means "subtract the items in one set from the items in the other set."

For example:

fruits = set(["avocado", "tomato", "banana"])

vegetables = set(["beets", "carrots", "tomato"])

fruits | vegetables This is a set union.

set(["avocado", "beets", "carrots", "tomato", "banana"])

fruits & vegetables This is a set intersection.

set(["tomato"])

fruits vegetables This is a set difference.

set(["avocado", "banana"])

vegetables fruits What do you think this will do?

To recap:

Sets are like lists, except sets can't have duplicates.

You can do some interesting operations on sets, like union, intersection, and difference.


Back To The Code

Back to the code

Back to the code

Let's get back to the original example. This is a set intersection:

covered = states_needed & states_for_station

covered is a set of states that were in both states_needed and states_for_station. So covered is the set of uncovered states that this station covers! Next you check whether this station covers more states than the current best_station:

if len(covered) > len(states_covered):
best_station = station
states_covered = covered

If so, this station is the new best_station. Finally, after the for loop is over, you add best_station to the final list of stations:

final_stations.add(best_station)

You also need to update states_needed. Because this station covers some states, those states aren't needed anymore:

states_needed -= states_covered

And you loop until states_needed is empty. Here's the full code for the loop:

while states_needed:
best_station = None
states_covered = set()
for station, states in stations.items():
covered = states_needed & states
if len(covered) > len(states_covered):
best_station = station
states_covered = covered
states_needed -= states_covered
final_stations.add(best_station)

Finally, you can print final_stations, and you should see this:

>>> print final_stations
set([`ktwo', `kthree', `kone', `kfive'])

Is that what you expected? Instead of stations 1, 2, 3, and 5, you could have chosen stations 2, 3, 4, and 5. Let's compare the run time of the greedy algorithm to the exact algorithm.

  • EXERCISES For each of these algorithms, say whether it's a greedy algorithm or not.
  • Quicksort
  • Breadth-first search
  • Dijkstra's algorithm

NP-complete problems

To solve the set-covering problem, you had to calculate every possible set.

Maybe you were reminded of the traveling salesperson problem from chapter 1. In this problem, a salesperson has to visit five different cities.

And he's trying to figure out the shortest route that will take him to all five cities. To find the shortest route, you first have to calculate every possible route.

How many routes do you have to calculate for five cities?

Traveling salesperson, step by step

Let's start small. Suppose you only have two cities. There are two routes to choose from.

Same route or different?

You may think this should be the same route. After all, isn't SF > Marin the same distance as Marin > SF? Not necessarily. Some cities (like San Francisco) have a lot of one-way streets, so you can't go back the way you came. You might also have to go 1 or 2 miles out of the way to find an onramp to a highway. So these two routes aren't necessarily the same.

You may be wondering, "In the traveling salesperson problem, is there a specific city you need to start from?" For example, let's say I'm the traveling salesperson. I live in San Francisco, and I need to go to four other cities. San Francisco would be my start city.

But sometimes the start city isn't set. Suppose you're FedEx, trying to deliver a package to the Bay Area. The package is being flown in from Chicago to one of 50 FedEx locations in the Bay Area. Then that package will go on a truck that will travel to different locations delivering packages. Which location should it get flown to? Here the start location is unknown. It's up to you to compute the optimal path and start location for the traveling salesperson.

The running time for both versions is the same. But it's an easier example if there's no defined start city, so I'll go with that version.

Two cities = two possible routes.