I am trying to understand how a CPU cache operates. Let's say we have this configuration (as an example).
- Cache size 1024 bytes
- Cache line 32 bytes
- 1024/32 = 32 cache lines all together.
- Singel cache line can store 32/4 = 8 ints.
1) According to this configuration, the length of the tag should be 32-5=27 bits, and the size of the index 5 bits (2^5 = 32 addresses, one for each byte in a cache line).
If the total cache size is 1024 bytes and there are 32 cache lines, where are the tags and indexes stored? (That's another 4*32 = 128 bytes.) Does it mean that the actual size of the cache is 1024+128 = 1152 bytes?
2) If a cache line is 32 bytes in this example, this means that 32 bytes get copied into the cache whenever the CPU needs to fetch a new byte from RAM. Am I right to assume that the cache line position of the requested byte will be determined by its address?
This is what I mean: if the CPU requests the byte at
[FF FF 00 08], then the available cache line will be filled with bytes from
[FF FF 00 00] to
[FF FF 00 1F], and our requested single byte will be at position [08] within that line.
3) If the previous statement is correct, does it mean that the 5 bits used for the index are technically not needed, since all 32 bytes are in the cache line anyway?
Please let me know if I got something wrong.
A cache consists of data and tag RAM, arranged as a compromise between access time, efficiency, and physical layout. You're missing an important parameter: the associativity, i.e. the number of ways per set. You rarely see 1-way (direct-mapped) caches, because they perform pathologically badly with simple access patterns. Anyway:
1) Yes, tags take extra space. This is part of the design compromise – you don't want them to be a large fraction of the total area, which is also why the line size isn't just 1 byte or 1 word. Also, all tags for an index are accessed simultaneously, and that can affect efficiency and layout if there's a large number of ways. The size is slightly bigger than your estimate. There are usually also a few extra bits to mark validity and sometimes hints. More ways and smaller lines need a larger fraction taken up by tags, so generally lines are large (32+ bytes) and the number of ways is small (4-16).
2) Yes. Some caches also do a "critical word first" fetch, where they start with the word that caused the line fill, then fetch the rest. This reduces the number of cycles the CPU spends waiting for the data it actually asked for. Some caches will "write thru" and not allocate a line if you miss on a write, which avoids having to read the entire cache line before writing to it (this isn't always a win).
3) The tags don't store the lower 5 bits because they aren't needed to match a cache line; those bits only select a byte within the line.
Wikipedia has a pretty good, if a bit intense, write-up on caches: http://en.wikipedia.org/wiki/CPU_cache – see "Implementation". There's a diagram of how data and tags are split. Me, I think everyone should learn this stuff, because you really can improve the performance of code when you know what the underlying machine is actually capable of.