How does Linux use the value of PCID?
I’m trying to understand how Linux uses PCID (aka ASID) on Intel architecture. When I investigated the source code and patches for the Linux kernel, I found a definition with a comment:
/*
* 6 because 6 should be plenty and struct tlb_state will fit in two cache
* lines.
*/
#define TLB_NR_DYN_ASIDS 6
I guess this means that Linux uses only 6 PCID values, but then what about this comment:
/*
* The x86 feature is called PCID (Process Context IDentifier). It is similar
* to what is traditionally called ASID on the RISC processors.
*
* We don't use the traditional ASID implementation, where each process/mm gets
* its own ASID and flush/restart when we run out of ASID space.
*
* Instead we have a small per-cpu array of ASIDs and cache the last few mm's
* that came by on this CPU, allowing cheaper switch_mm between processes on
* this CPU.
*
* We end up with different spaces for different things. To avoid confusion we
* use different names for each of them:
*
* ASID - [0, TLB_NR_DYN_ASIDS-1]
* the canonical identifier for an mm
*
* kPCID - [1, TLB_NR_DYN_ASIDS]
* the value we write into the PCID part of CR3; corresponds to the
* ASID+1, because PCID 0 is special.
*
* uPCID - [2048 + 1, 2048 + TLB_NR_DYN_ASIDS]
* for KPTI each mm has two address spaces and thus needs two
* PCID values, but we can still do with a single ASID denomination
* for each mm. Corresponds to kPCID + 2048.
*
*/
As mentioned in the previous comment, I assume that Linux uses only 6 values for PCID, so I read the bracketed pairs as single values (not ranges). So here the ASID can only be 0 or 5, kPCID can only be 1 or 6, and uPCID can only be 2049 or 2048 + 6 = 2054. Right?
I have a couple of questions at this point:
- Why do PCIDs only have 6 values? (Why so few?)
- Why would the tlb_state structure fit into two cache lines if we choose 6 PCIDs?
- Why does Linux use these particular values for ASID, kPCID, and uPCID (I mean the ranges in the second comment)?
Solution
As it is said in the previous comment I suppose that Linux uses only 6 values for PCIDs so in brackets we see just single values (not arrays)
No, this is wrong: those are ranges. [0, TLB_NR_DYN_ASIDS-1] means from 0 to TLB_NR_DYN_ASIDS-1 inclusive. Read on for more details.
There are a few points to consider:
1. The difference between ASID (Address Space Identifier) and PCID (Process Context Identifier) is just nomenclature: Linux calls this feature ASID on all architectures, while Intel calls its implementation PCID. Linux ASIDs start at 0 and Intel's PCIDs start at 1, because 0 is special and means "no PCID".
2. On x86 processors that support this feature, the PCID is a 12-bit value, so technically there can be 4095 different PCIDs (1 to 4095, because 0 is special).
3. Since Kernel Page-Table Isolation (KPTI), Linux needs two different PCIDs per task. This is the reason for the distinction between kPCID and uPCID: each task actually has two different virtual address spaces whose address translations need to be cached separately, hence the use of two different PCIDs. So we are left with only 2047 usable pairs of PCIDs (plus one leftover value that goes unused).
4. Any decent system can easily exceed 2047 tasks on a single CPU, so no matter how many bits are used, there will never be enough PCIDs for all existing tasks, let alone on systems with a large number of CPUs.
5. Because of point 4, an implementation of PCID support cannot simply assign a unique value to each existing task (as is done, e.g., for PIDs). Sooner or later, multiple tasks will need to "share" the same PCID (not at the same time, but at different points in time). Therefore, the logic for managing PCIDs needs to be different.
The choice made by Linux developers is to use PCIDs as a way to optimize access to recently used mms (struct mm_struct). This is achieved using a per-CPU array (cpu_tlbstate.ctxs) that is scanned linearly on each mm switch. The value of TLB_NR_DYN_ASIDS needs to stay small precisely because of this linear scan: too large a value could easily hurt performance rather than improve it. Apparently 6 was deemed a good choice because it offers a decent performance improvement. This means that only the 6 most recently used mms will use non-zero PCIDs (well, technically the 6 most recently used user/kernel mm pairs).
You can find a more thorough explanation of this reasoning in the patch that implemented PCID support.
Why would the tlb_state structure fit in two cache lines if we choose 6 PCIDs?
It’s just simple math:
struct tlb_state {
    struct mm_struct *loaded_mm;                 /*  0  8 */
    union {
        struct mm_struct *last_user_mm;          /*  8  8 */
        long unsigned int last_user_mm_spec;     /*  8  8 */
    };                                           /*  8  8 */
    u16 loaded_mm_asid;                          /* 16  2 */
    u16 next_asid;                               /* 18  2 */
    bool invalidate_other;                       /* 20  1 */

    /* XXX 1 byte hole, try to pack */

    short unsigned int user_pcid_flush_mask;     /* 22  2 */
    long unsigned int cr4;                       /* 24  8 */
    struct tlb_context ctxs[6];                  /* 32 96 */

    /* size: 128, cachelines: 2, members: 8 */
    /* sum members: 127, holes: 1, sum holes: 1 */
};
(Information extracted from a kernel image with debug symbols using pahole.)
The struct tlb_context array, which holds TLB_NR_DYN_ASIDS (6) entries, is what keeps track of the ASIDs.