123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230 |
- PAT (Page Attribute Table)
- x86 Page Attribute Table (PAT) allows for setting the memory attribute at the
- page level granularity. PAT is complementary to the MTRR settings which allows
- for setting of memory types over physical address ranges. However, PAT is
- more flexible than MTRR due to its capability to set attributes at page level
- and also due to the fact that there are no hardware limitations on number of
- such attribute settings allowed. Added flexibility comes with guidelines for
- not having memory type aliasing for the same physical memory with multiple
- virtual addresses.
- PAT allows for different types of memory attributes. The most commonly used
- ones that will be supported at this time are Write-back, Uncached,
- Write-combined, Write-through and Uncached Minus.
- PAT APIs
- --------
- There are many different APIs in the kernel that allows setting of memory
- attributes at the page level. In order to avoid aliasing, these interfaces
- should be used thoughtfully. Below is a table of interfaces available,
- their intended usage and their memory attribute relationships. Internally,
- these APIs use a reserve_memtype()/free_memtype() interface on the physical
- address range to avoid any aliasing.
- -------------------------------------------------------------------
- API | RAM | ACPI,... | Reserved/Holes |
- -----------------------|----------|------------|------------------|
- | | | |
- ioremap | -- | UC- | UC- |
- | | | |
- ioremap_cache | -- | WB | WB |
- | | | |
- ioremap_uc | -- | UC | UC |
- | | | |
- ioremap_nocache | -- | UC- | UC- |
- | | | |
- ioremap_wc | -- | -- | WC |
- | | | |
- ioremap_wt | -- | -- | WT |
- | | | |
- set_memory_uc | UC- | -- | -- |
- set_memory_wb | | | |
- | | | |
- set_memory_wc | WC | -- | -- |
- set_memory_wb | | | |
- | | | |
- set_memory_wt | WT | -- | -- |
- set_memory_wb | | | |
- | | | |
- pci sysfs resource | -- | -- | UC- |
- | | | |
- pci sysfs resource_wc | -- | -- | WC |
- is IORESOURCE_PREFETCH| | | |
- | | | |
- pci proc | -- | -- | UC- |
- !PCIIOC_WRITE_COMBINE | | | |
- | | | |
- pci proc | -- | -- | WC |
- PCIIOC_WRITE_COMBINE | | | |
- | | | |
- /dev/mem | -- | WB/WC/UC- | WB/WC/UC- |
- read-write | | | |
- | | | |
- /dev/mem | -- | UC- | UC- |
- mmap SYNC flag | | | |
- | | | |
- /dev/mem | -- | WB/WC/UC- | WB/WC/UC- |
- mmap !SYNC flag | |(from exist-| (from exist- |
- and | | ing alias)| ing alias) |
- any alias to this area| | | |
- | | | |
- /dev/mem | -- | WB | WB |
- mmap !SYNC flag | | | |
- no alias to this area | | | |
- and | | | |
- MTRR says WB | | | |
- | | | |
- /dev/mem | -- | -- | UC- |
- mmap !SYNC flag | | | |
- no alias to this area | | | |
- and | | | |
- MTRR says !WB | | | |
- | | | |
- -------------------------------------------------------------------
- Advanced APIs for drivers
- -------------------------
- A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
- vm_insert_pfn
- Drivers wanting to export some pages to userspace do it by using mmap
- interface and a combination of
- 1) pgprot_noncached()
- 2) io_remap_pfn_range() or remap_pfn_range() or vm_insert_pfn()
- With PAT support, a new API pgprot_writecombine is being added. So, drivers can
- continue to use the above sequence, with either pgprot_noncached() or
- pgprot_writecombine() in step 1, followed by step 2.
- In addition, step 2 internally tracks the region as UC or WC in memtype
- list in order to ensure no conflicting mapping.
- Note that this set of APIs only works with IO (non RAM) regions. If driver
- wants to export a RAM region, it has to do set_memory_uc() or set_memory_wc()
- as step 0 above and also track the usage of those pages and use set_memory_wb()
- before the page is freed to free pool.
- MTRR effects on PAT / non-PAT systems
- -------------------------------------
- The following table provides the effects of using write-combining MTRRs when
- using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
- mtrr_add() usage will be phased out in favor of arch_phys_wc_add() which will
- be a no-op on PAT enabled systems. The region over which a arch_phys_wc_add()
- is made, should already have been ioremapped with WC attributes or PAT entries,
- this can be done by using ioremap_wc() / set_memory_wc(). Devices which
- combine areas of IO memory desired to remain uncacheable with areas where
- write-combining is desirable should consider use of ioremap_uc() followed by
- set_memory_wc() to white-list effective write-combined areas. Such use is
- nevertheless discouraged as the effective memory type is considered
- implementation defined, yet this strategy can be used as last resort on devices
- with size-constrained regions where otherwise MTRR write-combining would
- otherwise not be effective.
- ----------------------------------------------------------------------
- MTRR Non-PAT PAT Linux ioremap value Effective memory type
- ----------------------------------------------------------------------
- Non-PAT | PAT
- PAT
- |PCD
- ||PWT
- |||
- WC 000 WB _PAGE_CACHE_MODE_WB WC | WC
- WC 001 WC _PAGE_CACHE_MODE_WC WC* | WC
- WC 010 UC- _PAGE_CACHE_MODE_UC_MINUS WC* | UC
- WC 011 UC _PAGE_CACHE_MODE_UC UC | UC
- ----------------------------------------------------------------------
- (*) denotes implementation defined and is discouraged
- Notes:
- -- in the above table mean "Not suggested usage for the API". Some of the --'s
- are strictly enforced by the kernel. Some others are not really enforced
- today, but may be enforced in future.
- For ioremap and pci access through /sys or /proc - The actual type returned
- can be more restrictive, in case of any existing aliasing for that address.
- For example: If there is an existing uncached mapping, a new ioremap_wc can
- return uncached mapping in place of write-combine requested.
- set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where driver
- will first make a region uc, wc or wt and switch it back to wb after use.
- Over time writes to /proc/mtrr will be deprecated in favor of using PAT based
- interfaces. Users writing to /proc/mtrr are suggested to use above interfaces.
- Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
- types.
- Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.
- PAT debugging
- -------------
- With CONFIG_DEBUG_FS enabled, PAT memtype list can be examined by
- # mount -t debugfs debugfs /sys/kernel/debug
- # cat /sys/kernel/debug/x86/pat_memtype_list
- PAT memtype list:
- uncached-minus @ 0x7fadf000-0x7fae0000
- uncached-minus @ 0x7fb19000-0x7fb1a000
- uncached-minus @ 0x7fb1a000-0x7fb1b000
- uncached-minus @ 0x7fb1b000-0x7fb1c000
- uncached-minus @ 0x7fb1c000-0x7fb1d000
- uncached-minus @ 0x7fb1d000-0x7fb1e000
- uncached-minus @ 0x7fb1e000-0x7fb25000
- uncached-minus @ 0x7fb25000-0x7fb26000
- uncached-minus @ 0x7fb26000-0x7fb27000
- uncached-minus @ 0x7fb27000-0x7fb28000
- uncached-minus @ 0x7fb28000-0x7fb2e000
- uncached-minus @ 0x7fb2e000-0x7fb2f000
- uncached-minus @ 0x7fb2f000-0x7fb30000
- uncached-minus @ 0x7fb31000-0x7fb32000
- uncached-minus @ 0x80000000-0x90000000
- This list shows physical address ranges and various PAT settings used to
- access those physical address ranges.
- Another, more verbose way of getting PAT related debug messages is with
- "debugpat" boot parameter. With this parameter, various debug messages are
- printed to dmesg log.
- PAT Initialization
- ------------------
- The following table describes how PAT is initialized under various
- configurations. The PAT MSR must be updated by Linux in order to support WC
- and WT attributes. Otherwise, the PAT MSR has the value programmed in it
- by the firmware. Note, Xen enables WC attribute in the PAT MSR for guests.
- MTRR PAT Call Sequence PAT State PAT MSR
- =========================================================
- E E MTRR -> PAT init Enabled OS
- E D MTRR -> PAT init Disabled -
- D E MTRR -> PAT disable Disabled BIOS
- D D MTRR -> PAT disable Disabled -
- - np/E PAT -> PAT disable Disabled BIOS
- - np/D PAT -> PAT disable Disabled -
- E !P/E MTRR -> PAT init Disabled BIOS
- D !P/E MTRR -> PAT disable Disabled BIOS
- !M !P/E MTRR stub -> PAT disable Disabled BIOS
- Legend
- ------------------------------------------------
- E Feature enabled in CPU
- D Feature disabled/unsupported in CPU
- np "nopat" boot option specified
- !P CONFIG_X86_PAT option unset
- !M CONFIG_MTRR option unset
- Enabled PAT state set to enabled
- Disabled PAT state set to disabled
- OS PAT initializes PAT MSR with OS setting
- BIOS PAT keeps PAT MSR with BIOS setting
|