- L1TF - L1 Terminal Fault
- ========================
- L1 Terminal Fault is a hardware vulnerability which allows unprivileged
- speculative access to data which is available in the Level 1 Data Cache
- when the page table entry controlling the virtual address, which is used
- for the access, has the Present bit cleared or other reserved bits set.
- Affected processors
- -------------------
- This vulnerability affects a wide range of Intel processors. The
- vulnerability is not present on:
- - Processors from AMD, Centaur and other non-Intel vendors
- - Older processor models, where the CPU family is < 6
- - A range of Intel ATOM processors (Cedarview, Cloverview, Lincroft,
- Penwell, Pineview, Silvermont, Airmont, Merrifield)
- - The Intel XEON PHI family
- - Intel processors which have the ARCH_CAP_RDCL_NO bit set in the
- IA32_ARCH_CAPABILITIES MSR. If the bit is set the CPU is not affected
- by the Meltdown vulnerability either. These CPUs should become
- available by end of 2018.
- Whether a processor is affected or not can be read out from the L1TF
- vulnerability file in sysfs. See :ref:`l1tf_sys_info`.
- Related CVEs
- ------------
- The following CVE entries are related to the L1TF vulnerability:
- ============= ================= ==============================
- CVE-2018-3615 L1 Terminal Fault SGX related aspects
- CVE-2018-3620 L1 Terminal Fault OS, SMM related aspects
- CVE-2018-3646 L1 Terminal Fault Virtualization related aspects
- ============= ================= ==============================
- Problem
- -------
- If an instruction accesses a virtual address for which the relevant page
- table entry (PTE) has the Present bit cleared or other reserved bits set,
- then speculative execution ignores the invalid PTE and loads the referenced
- data if it is present in the Level 1 Data Cache, as if the page referenced
- by the address bits in the PTE was still present and accessible.
- While this is a purely speculative mechanism and the instruction will raise
- a page fault when it is retired eventually, the pure act of loading the
- data and making it available to other speculative instructions opens up the
- opportunity for side channel attacks to unprivileged malicious code,
- similar to the Meltdown attack.
- While Meltdown breaks the user space to kernel space protection, L1TF
- allows attacks on any physical memory address in the system, and the attack
- works across all protection domains. It allows attacks on SGX and also
- works from inside virtual machines because the speculation bypasses the
- extended page table (EPT) protection mechanism.
- Attack scenarios
- ----------------
- 1. Malicious user space
- ^^^^^^^^^^^^^^^^^^^^^^^
- Operating Systems store arbitrary information in the address bits of a
- PTE which is marked non present. This allows a malicious user space
- application to attack the physical memory to which these PTEs resolve.
- In some cases user-space can maliciously influence the information
- encoded in the address bits of the PTE, thus making attacks more
- deterministic and more practical.
- The Linux kernel contains a mitigation for this attack vector, PTE
- inversion, which is permanently enabled and has no performance
- impact. The kernel ensures that the address bits of PTEs, which are not
- marked present, never point to cacheable physical memory space.
- A system with an up to date kernel is protected against attacks from
- malicious user space applications.
- 2. Malicious guest in a virtual machine
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- The fact that L1TF breaks all domain protections allows malicious guest
- OSes, which can control the PTEs directly, and malicious guest user
- space applications, which run on an unprotected guest kernel lacking the
- PTE inversion mitigation for L1TF, to attack physical host memory.
- A special aspect of L1TF in the context of virtualization is symmetric
- multi threading (SMT). The Intel implementation of SMT is called
- HyperThreading. The fact that Hyperthreads on the affected processors
- share the L1 Data Cache (L1D) is important for this. As the flaw only
- allows attacks on data which is present in L1D, a malicious guest running
- on one Hyperthread can attack the data which is brought into the L1D by
- the context which runs on the sibling Hyperthread of the same physical
- core. This context can be host OS, host user space or a different guest.
- If the processor does not support Extended Page Tables, the attack is
- only possible when the hypervisor does not sanitize the content of the
- effective (shadow) page tables.
- While solutions exist to mitigate these attack vectors fully, these
- mitigations are not enabled by default in the Linux kernel because they
- can affect performance significantly. The kernel provides several
- mechanisms which can be utilized to address the problem depending on the
- deployment scenario. The mitigations, their protection scope and impact
- are described in the next sections.
- The default mitigations and the rationale for choosing them are explained
- at the end of this document. See :ref:`default_mitigations`.
- .. _l1tf_sys_info:
- L1TF system information
- -----------------------
- The Linux kernel provides a sysfs interface to enumerate the current L1TF
- status of the system: whether the system is vulnerable, and which
- mitigations are active. The relevant sysfs file is:
- /sys/devices/system/cpu/vulnerabilities/l1tf
- The possible values in this file are:
- =========================== ===============================
- 'Not affected'              The processor is not vulnerable
- 'Mitigation: PTE Inversion' The host protection is active
- =========================== ===============================
- If KVM/VMX is enabled and the processor is vulnerable then the following
- information is appended to the 'Mitigation: PTE Inversion' part:
- - SMT status:
- ===================== ================
- 'VMX: SMT vulnerable' SMT is enabled
- 'VMX: SMT disabled'   SMT is disabled
- ===================== ================
- - L1D Flush mode:
- ================================ ====================================
- 'L1D vulnerable'                 L1D flushing is disabled
- 'L1D conditional cache flushes'  L1D flush is conditionally enabled
- 'L1D cache flushes'              L1D flush is unconditionally enabled
- ================================ ====================================
- The resulting grade of protection is discussed in the following sections.
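The status strings listed above can be checked from a script. The following is a minimal sketch, not a definitive parser: it assumes only that the file contains the strings shown in the tables, and it matches them as substrings because the exact separators between the appended parts are not specified here. The function name and the keys of the returned dictionary are made up for illustration.

```python
from pathlib import Path

# Path taken from the text above.
L1TF_SYSFS = Path("/sys/devices/system/cpu/vulnerabilities/l1tf")


def parse_l1tf_status(text):
    """Classify the contents of the l1tf sysfs file by substring matching."""
    text = text.strip()
    status = {
        "affected": text != "Not affected",
        "pte_inversion": "PTE Inversion" in text,
        "smt": None,        # None: no SMT information reported
        "l1d_flush": None,  # None: no L1D flush information reported
    }
    if "SMT vulnerable" in text:
        status["smt"] = "enabled"
    elif "SMT disabled" in text:
        status["smt"] = "disabled"
    # Test the longer string first because it contains the shorter one.
    if "conditional cache flushes" in text:
        status["l1d_flush"] = "cond"
    elif "cache flushes" in text:
        status["l1d_flush"] = "always"
    elif "L1D vulnerable" in text:
        status["l1d_flush"] = "never"
    return status


if __name__ == "__main__":
    if L1TF_SYSFS.exists():
        print(parse_l1tf_status(L1TF_SYSFS.read_text()))
    else:
        print("no l1tf sysfs file on this system")
```

A result with ``smt`` set to ``"enabled"`` means that the SMT considerations discussed in the following sections apply.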
- Host mitigation mechanism
- -------------------------
- The kernel is unconditionally protected against L1TF attacks from malicious
- user space running on the host.
- Guest mitigation mechanisms
- ---------------------------
- .. _l1d_flush:
- 1. L1D flush on VMENTER
- ^^^^^^^^^^^^^^^^^^^^^^^
- To make sure that a guest cannot attack data which is present in the L1D
- the hypervisor flushes the L1D before entering the guest.
- Flushing the L1D evicts not only the data which should not be accessed
- by a potentially malicious guest, it also flushes the guest
- data. Flushing the L1D has a performance impact as the processor has to
- bring the flushed guest data back into the L1D. Depending on the
- frequency of VMEXIT/VMENTER and the type of computations in the guest
- performance degradation in the range of 1% to 50% has been observed. For
- scenarios where guest VMEXIT/VMENTER are rare the performance impact is
- minimal. Virtio and mechanisms like posted interrupts are designed to
- confine the VMEXITs to a bare minimum, but specific configurations and
- application scenarios might still suffer from a high VMEXIT rate.
- The kernel provides two L1D flush modes:
- - conditional ('cond')
- - unconditional ('always')
- The conditional mode avoids L1D flushing after VMEXITs which execute
- only audited code paths before the corresponding VMENTER. These code
- paths have been verified not to expose secrets or other interesting
- data to an attacker, but they can leak information about the
- address space layout of the hypervisor.
- Unconditional mode flushes L1D on all VMENTER invocations and provides
- maximum protection. It has a higher overhead than the conditional
- mode. The overhead cannot be quantified correctly as it depends on the
- workload scenario and the resulting number of VMEXITs.
- The general recommendation is to enable L1D flush on VMENTER. The kernel
- defaults to conditional mode on affected processors.
- **Note** that L1D flush does not prevent the SMT problem, because the
- sibling thread will also bring its data back into the L1D, which makes it
- attackable again.
- L1D flush can be controlled by the administrator via the kernel command
- line and sysfs control files. See :ref:`mitigation_control_command_line`
- and :ref:`mitigation_control_kvm`.
- .. _guest_confinement:
- 2. Guest VCPU confinement to dedicated physical cores
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- To address the SMT problem, it is possible to make a guest or a group of
- guests affine to one or more physical cores. The proper mechanism for
- that is to utilize exclusive cpusets to ensure that no other guest or
- host tasks can run on these cores.
- If only a single guest or related guests run on sibling SMT threads on
- the same physical core then they can only attack their own memory and
- restricted parts of the host memory.
- Host memory is attackable when one of the sibling SMT threads runs in
- host OS (hypervisor) context and the other in guest context. The amount
- of valuable information from the host OS context depends on the context
- in which the host OS executes, i.e. interrupts, soft interrupts and kernel
- threads. The amount of valuable data from these contexts cannot be
- declared as non-interesting for an attacker without deep inspection of
- the code.
- **Note** that assigning guests to a fixed set of physical cores affects
- the ability of the scheduler to do load balancing and might have
- negative effects on CPU utilization depending on the hosting
- scenario. Disabling SMT might be a viable alternative for particular
- scenarios.
- For further information about confining guests to a single or to a group
- of cores consult the cpusets documentation:
- https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
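Before a guest can be confined to whole physical cores, the administrator has to know which logical CPUs are SMT siblings. The sketch below groups logical CPUs by core. It assumes the per-CPU sysfs file ``topology/thread_siblings_list``, which is not mentioned in this document, and the helper names are illustrative only.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(cpu_root=Path("/sys/devices/system/cpu")):
    """Return one tuple of logical CPU numbers per physical core.

    Assumption: every online CPU exposes topology/thread_siblings_list.
    """
    groups = set()
    for path in cpu_root.glob("cpu[0-9]*/topology/thread_siblings_list"):
        groups.add(tuple(parse_cpu_list(path.read_text())))
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("core with logical CPUs:", group)
```

Each tuple should be handed to a cpuset as a whole, so that no foreign task ends up on a sibling thread of a core which runs an untrusted guest.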
- .. _interrupt_isolation:
- 3. Interrupt affinity
- ^^^^^^^^^^^^^^^^^^^^^
- Interrupts can be made affine to logical CPUs. This is not universally
- true because there are types of interrupts which are truly per CPU
- interrupts, e.g. the local timer interrupt. Aside from that, multi-queue
- devices affine their interrupts to single CPUs or groups of CPUs per
- queue without allowing the administrator to control the affinities.
- Moving the interrupts whose affinity can be controlled away from CPUs
- which run untrusted guests reduces the attack vector space.
- Whether the interrupts which are affine to CPUs that run untrusted
- guests provide interesting data for an attacker depends on the system
- configuration and the scenarios which run on the system. While for some
- of the interrupts it can be assumed that they won't expose interesting
- information beyond exposing hints about the host OS memory layout, there
- is no way to make general assumptions.
- Interrupt affinity can be controlled by the administrator via the
- /proc/irq/$NR/smp_affinity[_list] files. Limited documentation is
- available at:
- https://www.kernel.org/doc/Documentation/IRQ-affinity.txt
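As an illustration of the smp_affinity interface, the sketch below builds the hexadecimal CPU mask for a set of trusted CPUs and writes it for one interrupt. It assumes that the file takes a hexadecimal bitmask split into comma-separated 32-bit groups; the function names and the ``proc_root`` parameter are hypothetical. Writing requires root privileges, and the kernel rejects the write for interrupts whose affinity cannot be changed.

```python
from pathlib import Path


def cpus_to_affinity_mask(cpus):
    """Build a hex CPU bitmask, split into comma-separated 32-bit groups."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU number: {cpu}")
        mask |= 1 << cpu
    digits = format(mask, "x")
    groups = []
    # Cut groups of 8 hex digits (32 bits) starting from the low end.
    while digits:
        groups.append(digits[-8:])
        digits = digits[:-8]
    return ",".join(reversed(groups))


def set_irq_affinity(irq, cpus, proc_root=Path("/proc/irq")):
    """Write the mask for `cpus` to /proc/irq/<irq>/smp_affinity."""
    target = Path(proc_root) / str(irq) / "smp_affinity"
    target.write_text(cpus_to_affinity_mask(cpus))


if __name__ == "__main__":
    # CPUs 0, 1 and 3 give the binary mask 1011.
    print(cpus_to_affinity_mask([0, 1, 3]))
```

The smp_affinity_list variant of the file takes a plain CPU list such as ``0-1,3`` instead of a mask, which is often easier to read.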
- .. _smt_control:
- 4. SMT control
- ^^^^^^^^^^^^^^
- To prevent the SMT issues of L1TF it might be necessary to disable SMT
- completely. Disabling SMT can have a significant performance impact, but
- the impact depends on the hosting scenario and the type of workloads.
- The impact of disabling SMT also needs to be weighed against the impact
- of other mitigation solutions like confining guests to dedicated cores.
- The kernel provides a sysfs interface to retrieve the status of SMT and
- to control it. It also provides a kernel command line interface to
- control SMT.
- The kernel command line interface consists of the following options:
- =========== ==========================================================
- nosmt       Affects the bring up of the secondary CPUs during boot. The
-             kernel tries to bring all present CPUs online during the
-             boot process. "nosmt" makes sure that from each physical
-             core only one, the so-called primary (hyper) thread, is
-             activated. Due to a design flaw of Intel processors related
-             to Machine Check Exceptions the non-primary siblings have
-             to be brought up at least partially and are then shut down
-             again. "nosmt" can be undone via the sysfs interface.
- nosmt=force Has the same effect as "nosmt", but the SMT disable cannot
-             be undone via the sysfs interface.
- =========== ==========================================================
- The sysfs interface provides two files:
- - /sys/devices/system/cpu/smt/control
- - /sys/devices/system/cpu/smt/active
- /sys/devices/system/cpu/smt/control:
- This file reports the SMT control state and provides the
- ability to disable or (re)enable SMT. The possible states are:
- ============== ===================================================
- on             SMT is supported by the CPU and enabled. All
-                logical CPUs can be onlined and offlined without
-                restrictions.
- off            SMT is supported by the CPU and disabled. Only
-                the so-called primary SMT threads can be onlined
-                and offlined without restrictions. An attempt to
-                online a non-primary sibling is rejected.
- forceoff       Same as 'off' but the state cannot be controlled.
-                Attempts to write to the control file are rejected.
- notsupported   The processor does not support SMT. It's therefore
-                not affected by the SMT implications of L1TF.
-                Attempts to write to the control file are rejected.
- ============== ===================================================
- The possible states which can be written into this file to control SMT
- state are:
- - on
- - off
- - forceoff
- /sys/devices/system/cpu/smt/active:
- This file reports whether SMT is enabled and active, i.e. if on any
- physical core two or more sibling threads are online.
- SMT control is also possible at boot time via the l1tf kernel command
- line parameter in combination with L1D flush control. See
- :ref:`mitigation_control_command_line`.
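The rules for the SMT control file can be summarized in a few lines of code. This is a sketch under stated assumptions: the helper names are invented, and the 'active' file is assumed to contain "1" when SMT is active, which is not spelled out above. Writing to the control file requires root privileges.

```python
from pathlib import Path

# Directory taken from the text above.
SMT_DIR = Path("/sys/devices/system/cpu/smt")
WRITABLE_STATES = ("on", "off", "forceoff")
LOCKED_STATES = ("forceoff", "notsupported")


def validate_smt_write(current, requested):
    """Raise if writing `requested` is rejected according to the table above."""
    if requested not in WRITABLE_STATES:
        raise ValueError(f"cannot write {requested!r} to the control file")
    if current in LOCKED_STATES:
        raise PermissionError(f"SMT control is locked in state {current!r}")


def read_smt(smt_dir=SMT_DIR):
    """Return (control state, whether SMT is active)."""
    control = (smt_dir / "control").read_text().strip()
    # Assumption: the file contains "1" when sibling threads are online.
    active = (smt_dir / "active").read_text().strip() == "1"
    return control, active


def set_smt(requested, smt_dir=SMT_DIR):
    """Validate the transition, then write the new state."""
    current, _ = read_smt(smt_dir)
    validate_smt_write(current, requested)
    (smt_dir / "control").write_text(requested)
```

Checking the current state before writing gives a clear error message in the 'forceoff' and 'notsupported' cases instead of a bare write failure.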
- 5. Disabling EPT
- ^^^^^^^^^^^^^^^^
- Disabling EPT for virtual machines provides full mitigation for L1TF even
- with SMT enabled, because the effective page tables for guests are
- managed and sanitized by the hypervisor. However, disabling EPT has a
- significant performance impact, especially when the Meltdown mitigation
- KPTI is enabled.
- EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
- There is ongoing research and development for new mitigation mechanisms to
- address the performance impact of disabling SMT or EPT.
- .. _mitigation_control_command_line:
- Mitigation control on the kernel command line
- ---------------------------------------------
- The L1TF mitigations can be controlled at boot time on the kernel command
- line with the option "l1tf=". The valid arguments for this option are:
- ============ =============================================================
- full         Provides all available mitigations for the L1TF
-              vulnerability. Disables SMT and enables all mitigations in
-              the hypervisors, i.e. unconditional L1D flushing.
-              SMT control and L1D flush control via the sysfs interface
-              is still possible after boot. Hypervisors will issue a
-              warning when the first VM is started in a potentially
-              insecure configuration, i.e. SMT enabled or L1D flush
-              disabled.
- full,force   Same as 'full', but disables SMT and L1D flush runtime
-              control. Implies the 'nosmt=force' command line option.
-              (i.e. sysfs control of SMT is disabled.)
- flush        Leaves SMT enabled and enables the default hypervisor
-              mitigation, i.e. conditional L1D flushing.
-              SMT control and L1D flush control via the sysfs interface
-              is still possible after boot. Hypervisors will issue a
-              warning when the first VM is started in a potentially
-              insecure configuration, i.e. SMT enabled or L1D flush
-              disabled.
- flush,nosmt  Disables SMT and enables the default hypervisor mitigation,
-              i.e. conditional L1D flushing.
-              SMT control and L1D flush control via the sysfs interface
-              is still possible after boot. Hypervisors will issue a
-              warning when the first VM is started in a potentially
-              insecure configuration, i.e. SMT enabled or L1D flush
-              disabled.
- flush,nowarn Same as 'flush', but hypervisors will not warn when a VM is
-              started in a potentially insecure configuration.
- off          Disables hypervisor mitigations and doesn't emit any
-              warnings.
-              It also drops the swap size and available RAM limit restrictions
-              on both hypervisor and bare metal.
- ============ =============================================================
- The default is 'flush'. For details about L1D flushing see :ref:`l1d_flush`.
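The table above can be condensed into a small lookup, which is handy when auditing the boot configuration of a fleet of hosts. The sketch below is illustrative only: the dictionary keys are invented names, a value of ``None`` marks a property the table does not state, and the choice to let the last "l1tf=" occurrence win is an assumption. On a running system the command line can be read from /proc/cmdline.

```python
# Effect of each "l1tf=" argument as described in the table above.
# smt_off: SMT disabled at boot; flush: L1D flush mode on VMENTER;
# runtime_control: sysfs control still possible; warn: hypervisor warns.
L1TF_OPTIONS = {
    "full": {"smt_off": True, "flush": "always",
             "runtime_control": True, "warn": True},
    "full,force": {"smt_off": True, "flush": "always",
                   "runtime_control": False, "warn": True},
    "flush": {"smt_off": False, "flush": "cond",
              "runtime_control": True, "warn": True},
    "flush,nosmt": {"smt_off": True, "flush": "cond",
                    "runtime_control": True, "warn": True},
    "flush,nowarn": {"smt_off": False, "flush": "cond",
                     "runtime_control": True, "warn": False},
    # The table does not say whether runtime control remains with 'off'.
    "off": {"smt_off": False, "flush": "never",
            "runtime_control": None, "warn": False},
}


def l1tf_from_cmdline(cmdline, default="flush"):
    """Return (argument, effect) for the l1tf= option in a command line."""
    arg = default
    for token in cmdline.split():
        if token.startswith("l1tf="):
            arg = token[len("l1tf="):]  # assumption: last occurrence wins
    if arg not in L1TF_OPTIONS:
        raise ValueError(f"unknown l1tf argument: {arg!r}")
    return arg, L1TF_OPTIONS[arg]


if __name__ == "__main__":
    print(l1tf_from_cmdline("ro quiet l1tf=flush,nosmt"))
```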
- .. _mitigation_control_kvm:
- Mitigation control for KVM - module parameter
- -------------------------------------------------------------
- The KVM hypervisor mitigation mechanism, flushing the L1D cache when
- entering a guest, can be controlled with a module parameter.
- The option/parameter is "kvm-intel.vmentry_l1d_flush=". It takes the
- following arguments:
- ============ ==============================================================
- always       L1D cache flush on every VMENTER.
- cond         Flush L1D on VMENTER only when the code between VMEXIT and
-              VMENTER can leak host memory which is considered
-              interesting for an attacker. This can still leak host memory,
-              which e.g. allows determining the host's address space layout.
- never        Disables the mitigation.
- ============ ==============================================================
- The parameter can be provided on the kernel command line or as a module
- parameter when loading the modules, and can be modified at runtime via
- the sysfs file:
- /sys/module/kvm_intel/parameters/vmentry_l1d_flush
- The default is 'cond'. If 'l1tf=full,force' is given on the kernel command
- line, then 'always' is enforced and the kvm-intel.vmentry_l1d_flush
- module parameter is ignored and writes to the sysfs file are rejected.
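The two ways of setting the flush mode can be sketched as follows. The helper names are hypothetical; the parameter name, its three values and the sysfs path are taken from the text above. The runtime write needs root privileges and fails when 'l1tf=full,force' is in effect, so the sketch reports the error rather than ignoring it.

```python
from pathlib import Path

# Path and values taken from the text above.
FLUSH_PARAM = Path("/sys/module/kvm_intel/parameters/vmentry_l1d_flush")
FLUSH_MODES = ("always", "cond", "never")


def flush_cmdline_option(mode):
    """Return the kernel command line option selecting `mode`."""
    if mode not in FLUSH_MODES:
        raise ValueError(f"unknown L1D flush mode: {mode!r}")
    return f"kvm-intel.vmentry_l1d_flush={mode}"


def set_flush_mode(mode, path=FLUSH_PARAM):
    """Change the flush mode at runtime through the sysfs file."""
    if mode not in FLUSH_MODES:
        raise ValueError(f"unknown L1D flush mode: {mode!r}")
    try:
        path.write_text(mode)
    except OSError as err:
        # Rejected writes (e.g. with l1tf=full,force) end up here.
        raise RuntimeError(f"could not set L1D flush mode: {err}") from err


if __name__ == "__main__":
    print(flush_cmdline_option("cond"))
```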
- .. _mitigation_selection:
- Mitigation selection guide
- --------------------------
- 1. No virtualization in use
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^
- The system is protected by the kernel unconditionally and no further
- action is required.
- 2. Virtualization with trusted guests
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- If the guest comes from a trusted source and the guest OS kernel is
- guaranteed to have the L1TF mitigations in place the system is fully
- protected against L1TF and no further action is required.
- To avoid the overhead of the default L1D flushing on VMENTER the
- administrator can disable the flushing via the kernel command line and
- sysfs control files. See :ref:`mitigation_control_command_line` and
- :ref:`mitigation_control_kvm`.
- 3. Virtualization with untrusted guests
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
- 3.1. SMT not supported or disabled
- """"""""""""""""""""""""""""""""""
- If SMT is not supported by the processor or disabled in the BIOS or by
- the kernel, it's only required to enforce L1D flushing on VMENTER.
- Conditional L1D flushing is the default behaviour and can be tuned. See
- :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
- 3.2. EPT not supported or disabled
- """"""""""""""""""""""""""""""""""
- If EPT is not supported by the processor or disabled in the hypervisor,
- the system is fully protected. SMT can stay enabled and L1D flushing on
- VMENTER is not required.
- EPT can be disabled in the hypervisor via the 'kvm-intel.ept' parameter.
- 3.3. SMT and EPT supported and active
- """""""""""""""""""""""""""""""""""""
- If SMT and EPT are supported and active then various degrees of
- mitigations can be employed:
- - L1D flushing on VMENTER:
- L1D flushing on VMENTER is the minimal protection requirement, but it
- is only potent in combination with other mitigation methods.
- Conditional L1D flushing is the default behaviour and can be tuned. See
- :ref:`mitigation_control_command_line` and :ref:`mitigation_control_kvm`.
- - Guest confinement:
- Confinement of guests to a single or a group of physical cores which
- are not running any other processes can reduce the attack surface
- significantly, but interrupts, soft interrupts and kernel threads can
- still expose valuable data to a potential attacker. See
- :ref:`guest_confinement`.
- - Interrupt isolation:
- Isolating the guest CPUs from interrupts can reduce the attack surface
- further, but still allows a malicious guest to explore a limited amount
- of host physical memory. This can at least be used to gain knowledge
- about the host address space layout. The interrupts which have a fixed
- affinity to the CPUs which run the untrusted guests can, depending on
- the scenario, still trigger soft interrupts and schedule kernel threads
- which might expose valuable information. See
- :ref:`interrupt_isolation`.
- The above three mitigation methods combined can provide protection to a
- certain degree, but the risk of the remaining attack surface has to be
- carefully analyzed. For full protection the following methods are
- available:
- - Disabling SMT:
- Disabling SMT and enforcing the L1D flushing provides the maximum
- amount of protection. This mitigation does not depend on any of the
- above mitigation methods.
- SMT control and L1D flushing can be tuned by the command line
- parameters 'nosmt', 'l1tf', 'kvm-intel.vmentry_l1d_flush' and at run
- time with the matching sysfs control files. See :ref:`smt_control`,
- :ref:`mitigation_control_command_line` and
- :ref:`mitigation_control_kvm`.
- - Disabling EPT:
- Disabling EPT provides the maximum amount of protection as well. It
- does not depend on any of the above mitigation methods. SMT can stay
- enabled and L1D flushing is not required, but the performance impact is
- significant.
- EPT can be disabled in the hypervisor via the 'kvm-intel.ept'
- parameter.
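The cases 1 to 3.3 of this guide can be read as a decision procedure. The sketch below encodes them as a function; it is a simplified reading of the text, not a substitute for the risk analysis described above. The function and parameter names are made up, and 'trusted_guests' stands for guests from a trusted source whose kernel is guaranteed to carry the L1TF mitigations.

```python
def recommend_l1tf_mitigation(virtualization, trusted_guests,
                              smt_active, ept_enabled):
    """Return the measures suggested by the selection guide above."""
    if not virtualization:
        # Case 1: PTE inversion protects the host unconditionally.
        return ["no action required"]
    if trusted_guests:
        # Case 2: the default L1D flush may be disabled to avoid overhead.
        return ["no action required", "L1D flush may be disabled"]
    if not ept_enabled:
        # Case 3.2: SMT can stay enabled, L1D flush is not required.
        return ["no action required"]
    if not smt_active:
        # Case 3.1: only the L1D flush on VMENTER is needed.
        return ["enforce L1D flush on VMENTER"]
    # Case 3.3: two alternatives for full protection.
    return [
        "disable SMT and enforce L1D flush on VMENTER",
        "or disable EPT via kvm-intel.ept",
    ]


if __name__ == "__main__":
    print(recommend_l1tf_mitigation(True, False, True, True))
```

The partial measures of section 3.3 (L1D flush combined with guest confinement and interrupt isolation) are deliberately left out, because their remaining risk depends on the deployment and cannot be decided by a simple rule.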
- 3.4. Nested virtual machines
- """"""""""""""""""""""""""""
- When nested virtualization is in use, three operating systems are involved:
- the bare metal hypervisor, the nested hypervisor and the nested virtual
- machine. VMENTER operations from the nested hypervisor into the nested
- guest will always be processed by the bare metal hypervisor. If KVM is the
- bare metal hypervisor it will:
- - Flush the L1D cache on every switch from the nested hypervisor to the
- nested virtual machine, so that the nested hypervisor's secrets are not
- exposed to the nested virtual machine;
- - Flush the L1D cache on every switch from the nested virtual machine to
- the nested hypervisor; this is a complex operation, and flushing the L1D
- cache prevents the bare metal hypervisor's secrets from being exposed to
- the nested virtual machine;
- - Instruct the nested hypervisor to not perform any L1D cache flush. This
- is an optimization to avoid double L1D flushing.
- .. _default_mitigations:
- Default mitigations
- -------------------
- The kernel default mitigations for vulnerable processors are:
- - PTE inversion to protect against malicious user space. This is done
- unconditionally and cannot be controlled. The swap storage is limited
- to ~16TB.
- - L1D conditional flushing on VMENTER when EPT is enabled for
- a guest.
- The kernel does not by default enforce the disabling of SMT, which leaves
- SMT systems vulnerable when running untrusted guests with EPT enabled.
- The rationale for this choice is:
- - Force disabling SMT can break existing setups, especially with
- unattended updates.
- - If regular users run untrusted guests on their machine, then L1TF is
- just an add on to other malware which might be embedded in an untrusted
- guest, e.g. spam-bots or attacks on the local network.
- There is no technical way to prevent a user from running untrusted code
- on their machines blindly.
- - It's technically extremely unlikely and from today's knowledge even
- impossible that L1TF can be exploited via the most popular attack
- mechanisms like JavaScript because these mechanisms have no way to
- control PTEs. If this were possible and no other mitigation were
- available, then the default might be different.
- - The administrators of cloud and hosting setups have to carefully
- analyze the risk for their scenarios and make the appropriate
- mitigation choices, which might even vary across their deployed
- machines and also result in other changes of their overall setup.
- There is no way for the kernel to provide a sensible default for these
- kinds of scenarios.