Cache Explained

A cache, in computer terms, is a small amount of fast memory that stores frequently used information, so that if the same information is required again shortly afterwards it can be retrieved more quickly, improving the performance of the system.

This page gives a very brief overview of caches in PCs throughout the 80s and 90s, without going into much technical detail. There are numerous resources online that better explain how and why caches are designed the way they are.

There are often several caches found in a PC, as follows:-

Level 1 (L1) cache - inside the CPU die.
Level 2 (L2) cache - on the motherboard.

For DOS PCs, caching became more important as time went on. The early PCs (up to and including the 80386) had no cache at all. All memory operations went from the CPU out to main memory and back again.

L1 Cache

Starting with the launch of the Intel 486DX in 1989, Intel embedded a very small cache within the CPU itself. It was 8 KB in size, and in later generations would be referred to as the "level 1" cache to distinguish it from other caches in the computer. The CPU would keep frequently used information in this cache. Because the cache sat so close to the CPU core, it could be accessed in fewer clock cycles than anything reached over an external memory bus on the motherboard, so if the same information was needed again, the CPU would retrieve it from the cache instead of going all the way out to main memory, which was a much slower operation.
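The check-the-cache-first behaviour described above can be sketched in a few lines of Python. This is purely illustrative: real hardware works on address tags and cache lines, not dictionaries, and the capacity and addresses below are made up.

```python
# Toy model of a CPU read that consults a small L1 cache before RAM.
# All names and sizes here are illustrative, not real hardware detail.

MAIN_MEMORY = {addr: addr * 2 for addr in range(1024)}  # pretend RAM

l1_cache = {}          # address -> value
L1_CAPACITY = 8        # keep the toy cache tiny

def cpu_read(addr):
    """Return (value, 'hit' or 'miss') for a read of the given address."""
    if addr in l1_cache:                 # fast path: data already cached
        return l1_cache[addr], "hit"
    value = MAIN_MEMORY[addr]            # slow path: go out to main memory
    if len(l1_cache) >= L1_CAPACITY:     # evict an entry when full
        l1_cache.pop(next(iter(l1_cache)))
    l1_cache[addr] = value               # keep a copy for next time
    return value, "miss"

print(cpu_read(42))   # first access: miss, fetched from main memory
print(cpu_read(42))   # repeated access: hit, served from the cache
```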

L2 Cache

As motherboard manufacturers adopted the 486 CPU, they began to include a secondary cache on the motherboard, which they called the "level 2" (L2) cache. This was still quicker to access than main memory, primarily because the cache chips were specified to be faster: these SRAM chips would have access times of around 60 or 70 ns (nanoseconds), rather than the 120 ns that main memory typically ran at. The L2 cache was also larger in capacity than the L1 cache.
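Those nanosecond figures can be turned into clock cycles. The 33 MHz clock below is an assumed example of a typical 486 board; only the 70 ns and 120 ns access times come from the text.

```python
# Rough cycle arithmetic for the SRAM vs DRAM timings quoted above.
# The 33 MHz clock is an assumed example, not a figure from the text.
import math

clock_mhz = 33
cycle_ns = 1000 / clock_mhz               # ~30.3 ns per clock cycle

for name, access_ns in [("L2 cache SRAM", 70), ("main memory DRAM", 120)]:
    cycles = math.ceil(access_ns / cycle_ns)   # whole cycles the CPU must wait
    print(f"{name}: {access_ns} ns = about {cycles} clock cycles")
```

At these example speeds the L2 SRAM answers in about 3 cycles where DRAM needs 4, which is where the saving comes from.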

In some cases, a motherboard would have a dedicated cache expansion slot (called a "COAST" slot, for "Cache On A STick") that takes an add-on card which contains the cache chips on it. Some motherboards have both cache RAM chips on the board and a cache expansion slot to further expand the size of the L2 cache.

The Size vs Cost Trade-Off

L1 caches are typically small because the silicon on which the CPU and its L1 cache are made is very expensive. The larger the cache, the more area on the silicon it takes up (more transistors are needed to store the information), which directly translates into a higher cost to build the CPU.

L2 caches can be larger simply because they are stored in cheaper ICs (chips) that are soldered or socketed on the motherboard, where physical space is not so much of an issue. During the 486 era, motherboards typically came with anywhere from 0 to 512 KB of level 2 cache. Often they were sold with only half the L2 cache sockets populated, leaving the buyer the option of purchasing more later when funds permitted.

Technical Explanation

Note: The following text is a subset of a response given on the forum by user Hennes, and all credit goes to him for this detailed but readable answer. I have only made alterations to correct grammar and spelling:-

To understand caches you need to know a few things:

A CPU has registers - values held in these can be used directly. Nothing is faster. However, we cannot add infinite registers to a chip: they take up space, and if we make the chip bigger it gets more expensive. Part of that is because we need a larger chip (more silicon), but also because the number of chips with defects increases.

(Take an imaginary [silicon] wafer of 500 cm². I cut 10 chips from it, each chip 50 cm² in size.
One of them is broken, so I discard it and am left with 9 working chips.
Now take the same wafer and cut 100 chips from it, each a tenth of the size.
One of them is broken. I discard the broken chip and am left with 99 working chips.
That is a fraction of the loss I would otherwise have had. To compensate for the larger chips I would need to charge higher prices - more than just the price of the extra silicon.)
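The wafer arithmetic above can be checked in code. Assuming exactly one defect per wafer is an idealisation; real yields are modelled statistically.

```python
# The wafer-yield arithmetic from the worked example above.
# One defect per wafer is an idealised assumption for illustration.
wafer_area = 500              # cm²
for chip_area in (50, 5):     # large chips, then chips a tenth the size
    chips = wafer_area // chip_area
    good = chips - 1          # one chip lands on the defect and is discarded
    loss = 1 / chips          # fraction of the wafer thrown away
    print(f"{chip_area} cm² chips: {good} of {chips} good ({loss:.0%} lost)")
```

Large chips lose 10% of the wafer to the same single defect that costs the small chips only 1%.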

This is one of the reasons why we want small, affordable chips. However, the closer the cache is to the CPU, the faster it can be accessed.

This is also easy to explain: electrical signals travel at near light speed. That is fast, but still a finite speed. Modern CPUs work with GHz clocks, which is also fast. If I take a 4 GHz CPU, then an electrical signal can travel about 7.5 cm per clock tick. That is 7.5 cm in a straight line (connections in chips are anything but straight). In practice you will need significantly less than those 7.5 cm, since that allows no time for the chips to present the requested data and for the signal to travel back.
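That 7.5 cm figure is simple arithmetic, shown below. Real signals in copper travel somewhat slower than light, which makes the budget even tighter.

```python
# Distance an electrical signal covers in one clock tick, assuming
# it travels at the speed of light (an optimistic upper bound).
speed_of_light = 3.0e8        # metres per second
clock_hz = 4.0e9              # a 4 GHz CPU, as in the text

distance_m = speed_of_light / clock_hz
print(f"{distance_m * 100:.1f} cm per clock tick")   # prints 7.5 cm
```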

Bottom line: we want the cache physically as close as possible to the CPU, which means large chips.

These two need to be balanced (performance vs. cost).

Where exactly are the L1, L2 and L3 Caches located in a computer?

Assuming PC-style only hardware (mainframes are quite different, including in the performance vs. cost balance):-

IBM XT (1981)
The original 4.77 MHz one: no cache. The CPU accesses the memory directly. A read from memory follows this pattern:

  • The CPU puts the address it wants to read on the memory bus and asserts the read flag.
  • Memory puts the data on the data bus.
  • The CPU copies the data from the data bus to its internal registers.

80286 (1982)
Still no cache. Memory access was not a big problem for the lower-speed versions (6 MHz), but the faster models ran at up to 20 MHz and often needed to delay (using "wait states") when accessing memory.

You then get a scenario like this:

  • The CPU puts the address it wants to read on the memory bus and asserts the read flag.
  • Memory starts to put the data on the data bus.
  • The CPU waits.
  • Memory finishes getting the data and it is now stable on the data bus.
  • The CPU copies the data from the data bus to its internal registers.

That is an extra step spent waiting for the memory. On a modern system that can easily be 12 steps, which is why we have cache.
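The cost of those wait states can be sketched as a toy model. The cycle counts here are made up purely for illustration; real bus timing is more involved.

```python
# Toy model of wait states: the CPU idles each cycle until the memory
# has stable data on the bus. Cycle counts are illustrative only.
def read_with_wait_states(memory_latency_cycles):
    """Return (total cycles for one read, wait states inserted)."""
    cycles = 1                                      # put address on bus, assert read
    wait_states = max(0, memory_latency_cycles - 1)
    cycles += wait_states                           # CPU waits for the memory
    cycles += 1                                     # copy data from bus to register
    return cycles, wait_states

print(read_with_wait_states(1))  # fast memory: no wait states needed
print(read_with_wait_states(4))  # slow memory: three cycles wasted waiting
```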

80386 (1985)
The CPUs get faster. Both per clock (more efficient), and by running at higher clock speeds.
RAM gets faster, but not at the same rate as the CPU.
As a result more wait states are needed. Some motherboards work around this by adding cache (that would be 1st level cache) on the motherboard.

A read from memory now starts with a check of whether the data is already in the cache. If it is, the data is read from the much faster cache; if not, the same procedure as described for the 80286 is followed.

80486 (1989)
This is the first CPU for PCs which has some cache on the CPU.
It is an 8 KB "unified" cache which means it is used for both data and instructions.

Around this time it gets common to put 256 KB of fast static memory on the motherboard as 2nd level cache. Thus 1st level cache is on the CPU, 2nd level cache is on the motherboard.

Picture of a 486 motherboard with the CPU location and 2nd level cache marked

Pentium (1993)
The 586, or Pentium 1, uses a split level 1 cache: 8 KB each for data and instructions. The cache was split so that the data and instruction caches could be individually tuned for their specific use. You still have a small yet very fast 1st level cache near the CPU, and a larger but slower 2nd level cache on the motherboard (at a larger physical distance).

In the same Pentium 1 era Intel produced the Pentium Pro. Depending on the model, this chip had a 256 KB, 512 KB or 1 MB on-board cache. It was also much more expensive, which is easy to explain with the following picture.

Picture of a Pentium Pro CPU, 256 KB cache model

Notice that half the space in the chip is used by the cache, and this is for the 256 KB model. More cache was technically possible, and some models were produced with 512 KB and 1 MB caches. The market price for these was high.

Also notice that this chip contains two dies. One with the actual CPU and 1st cache, and a second die with 256 KB 2nd cache.

Pentium II (1997)

The Pentium II has a Pentium Pro core. For economy reasons the 2nd cache is not in the CPU die. Instead, what is sold as a "CPU" is a PCB [printed circuit board] with separate chips for the CPU (and 1st cache) and the 2nd cache.

As technology progressed and we started to create chips with smaller components, it became financially possible to put the 2nd cache back into the actual CPU die. However, there is still a split: a very fast 1st cache snuggled up to the CPU, one per CPU core, and a larger but slower 2nd cache next to the cores.

Picture of a Pentium II 'CPU' (both with and without cover)


Pentium 3 & 4
This does not change for the Pentium 3 or the Pentium 4.

Around this time we reached a practical limit on how fast we can clock CPUs. An 8086 or an 80286 did not need cooling, but a Pentium 4 running at 3.0 GHz produces so much heat and uses so much power that it became more practical to put two separate CPUs on the motherboard rather than one fast one (two 2.0 GHz CPUs would use less power than a single 3.0 GHz CPU, yet could do more work).
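A back-of-the-envelope check of that claim: a common rule of thumb is that dynamic power grows roughly with the cube of clock speed (frequency times voltage squared, with voltage scaling alongside frequency). The cubic exponent is an assumption for illustration, not a law.

```python
# Rough power comparison using the assumed rule of thumb P ~ f^3.
def relative_power(ghz, exponent=3):
    """Relative dynamic power of a core at the given clock (arbitrary units)."""
    return ghz ** exponent

one_fast = relative_power(3.0)       # one 3.0 GHz core
two_slow = 2 * relative_power(2.0)   # two 2.0 GHz cores
print(one_fast, two_slow)            # the two slower cores draw less
```

Under this assumption the two 2.0 GHz cores provide 4.0 GHz of combined clock for noticeably less power than the single 3.0 GHz core.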

This could be solved in three ways:

  1. Make the CPUs more efficient, so they do more work at the same speed.
  2. Use multiple CPUs.
  3. Use multiple CPUs in the same 'chip'.

1) Is an ongoing process. It is not new and it will not stop.

2) Was done early on (e.g. with dual Pentium 1 motherboards and the NX chipset). Before multi-core chips, that was the only option for building a faster PC.

3) Requires CPUs where multiple 'CPU cores' are built into a single chip. We then called such a CPU a dual core CPU, to increase the confusion. Thank you marketing :) These days each individual processing unit is just referred to as a 'core' to avoid confusion.

You now get chips like the Pentium D (duo), which is basically two Pentium 4 cores on the same chip.

Early Pentium D (two P4 cores)

Remember the picture of the old Pentium Pro? With the huge cache size? See the two large areas in this picture?

It turns out that we can share that 2nd cache between both CPU cores. Speed drops slightly, but a 512 KB shared 2nd cache is often faster than two independent 2nd level caches of half the size. It means that if you read something from one CPU core and later try to read it from another core which shares the same cache, you will get a cache hit; memory does not need to be accessed.
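That cross-core hit can be illustrated schematically. Nothing about real cache geometry is modelled here; the addresses and core numbers are made up.

```python
# Toy illustration of a 2nd level cache shared between two cores:
# data loaded by one core is a hit for the other. Schematic only.
shared_l2 = set()   # addresses currently held in the shared L2

def core_read(core, addr):
    """Return 'hit' or 'miss' for a read issued by the given core."""
    if addr in shared_l2:       # either core may have loaded it earlier
        return "hit"
    shared_l2.add(addr)         # fetched from RAM, now cached for all cores
    return "miss"

print(core_read(0, 0x1000))   # core 0 reads first: miss, goes to RAM
print(core_read(1, 0x1000))   # core 1 reads the same address: hit
```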

Since programs do migrate between cores depending on the load, the number of cores and the scheduler, you can gain additional performance by pinning programs which use the same data to the same core (cache hits on L1) or to cores which share an L2 cache (and thus get misses on L1, but hits on L2 reads). Thus on later models you will see shared level 2 caches.