How does Linux use the value of PCID?
I’m trying to understand how Linux uses PCID (aka ASID) on Intel architecture. When I investigated the source code and patches for the Linux kernel, I found a definition with a comment:
/*
* 6 because 6 should be plenty and struct tlb_state will fit in two cache
* lines.
*/
#define TLB_NR_DYN_ASIDS 6
I guess this means that Linux uses only 6 PCID values, but then what about this comment:
/*
* The x86 feature is called PCID (Process Context IDentifier). It is similar
* to what is traditionally called ASID on the RISC processors.
*
* We don't use the traditional ASID implementation, where each process/mm gets
* its own ASID and flush/restart when we run out of ASID space.
*
* Instead we have a small per-cpu array of ASIDs and cache the last few mm's
* that came by on this CPU, allowing cheaper switch_mm between processes on
* this CPU.
*
* We end up with different spaces for different things. To avoid confusion we
* use different names for each of them:
*
* ASID - [0, TLB_NR_DYN_ASIDS-1]
* the canonical identifier for an mm
*
* kPCID - [1, TLB_NR_DYN_ASIDS]
* the value we write into the PCID part of CR3; corresponds to the
* ASID+1, because PCID 0 is special.
*
* uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
* for KPTI each mm has two address spaces and thus needs two
* PCID values, but we can still do with a single ASID denomination
* for each mm. Corresponds to kPCID + 2048.
*
*/
As mentioned in the previous comment, I assume that Linux uses only 6 values for PCID, so I read the bracketed pairs as single values (not ranges). So here the ASID can only be 0 or 5, kPCID can only be 1 or 6, and uPCID can only be 2049 or 2048 + 6 = 2054. Right?
I have a couple of questions at this point:
- Why do PCIDs only have 6 values? (Why so few?)
- Why would the tlb_state structure fit into two cache lines if we choose 6 PCIDs?
- Why does Linux use these particular values for ASID, kPCID, and uPCID (I mean the ranges in the second comment)?
Solution
As it is said in the previous comment I suppose that Linux uses only 6 values for PCIDs so in brackets we see just single values (not arrays)
No, this is wrong: those are ranges. [0, TLB_NR_DYN_ASIDS-1] means from 0 to TLB_NR_DYN_ASIDS-1 inclusive. Read on for more details.
There are a few points to consider:
1. The difference between ASID (Address Space Identifier) and PCID (Process Context Identifier) is just nomenclature: Linux calls this feature ASID on all architectures, while Intel calls its implementation PCID. Linux ASIDs start at 0 and Intel's PCIDs start at 1, because 0 is special and means "no PCID".
2. On x86 processors that support this feature, the PCID is a 12-bit value, so technically there can be 4095 different PCIDs (1 to 4095, because 0 is special).
3. Since Kernel Page-Table Isolation (KPTI), Linux needs two different PCIDs per task. This is the reason for the distinction between kPCID and uPCID: each task actually has two different virtual address spaces whose address translations need to be cached separately, hence the use of two different PCIDs. So we are left with only 2047 usable pairs of PCIDs (plus one leftover value that goes unused).
4. Any decent system can easily exceed 2047 tasks on a single CPU, so no matter how many bits are used, there will never be enough PCIDs for all existing tasks, let alone on systems with a large number of CPUs.
5. Because of point 4, an implementation of PCID support cannot simply assign a unique value to each existing task (as is done, e.g., for PIDs). Sooner or later, multiple tasks will need to "share" the same PCID (not at the same time, but at different points in time). Therefore, the logic for managing PCIDs needs to be different.
The choice made by Linux developers is to use PCIDs as a way to optimize access to recently used mms (struct mm_struct). This is achieved using a per-CPU array (cpu_tlbstate.ctxs) that is scanned linearly on each mm switch. The value of TLB_NR_DYN_ASIDS needs to stay small precisely because of this linear scan: too large a value could easily hurt performance rather than improve it. Apparently 6 was deemed a good choice because it offers a decent performance improvement. This means that only the 6 most recently used mms will use non-zero PCIDs (well, technically the 6 most recently used user/kernel mm pairs).
You can find a more thorough explanation of this reasoning in the patch that implemented PCID support.
Why would the tlb_state structure fit in two cache lines if we choose 6 PCIDs?
It’s just simple math:
struct tlb_state {
    struct mm_struct *loaded_mm;                 /*  0  8 */
    union {
        struct mm_struct *last_user_mm;          /*  8  8 */
        long unsigned int last_user_mm_spec;     /*  8  8 */
    };                                           /*  8  8 */
    u16 loaded_mm_asid;                          /* 16  2 */
    u16 next_asid;                               /* 18  2 */
    bool invalidate_other;                       /* 20  1 */

    /* XXX 1 byte hole, try to pack */

    short unsigned int user_pcid_flush_mask;     /* 22  2 */
    long unsigned int cr4;                       /* 24  8 */
    struct tlb_context ctxs[6];                  /* 32 96 */

    /* size: 128, cachelines: 2, members: 8 */
    /* sum members: 127, holes: 1, sum holes: 1 */
};
(Information extracted from a kernel image with debug symbols using pahole.)
The struct tlb_context array, which holds TLB_NR_DYN_ASIDS (6) entries, is what keeps track of the ASIDs.