  1. Introduction
  2. ============
  3. The IBM Power architecture provides support for CAPI (Coherent
  4. Accelerator Power Interface), which is available to certain PCIe slots
  5. on Power 8 systems. CAPI can be thought of as a special tunneling
  6. protocol through PCIe that allows PCIe adapters to look like special
  7. purpose co-processors which can read or write an application's
  8. memory and generate page faults. As a result, the host interface to
  9. an adapter running in CAPI mode does not require the data buffers to
  10. be mapped to the device's memory (IOMMU bypass) nor does it require
  11. memory to be pinned.
  12. On Linux, Coherent Accelerator (CXL) kernel services present CAPI
  13. devices as PCI devices by implementing a virtual PCI host bridge.
  14. This abstraction simplifies the infrastructure and programming
  15. model, allowing for drivers to look similar to other native PCI
  16. device drivers.
  17. CXL provides a mechanism by which user space applications can
  18. directly talk to a device (network or storage) bypassing the typical
  19. kernel/device driver stack. The CXL Flash Adapter Driver gives a
  20. user space application direct access to Flash storage.
  21. The CXL Flash Adapter Driver is a kernel module that sits in the
  22. SCSI stack as a low level device driver (below the SCSI disk and
  23. protocol drivers) for the IBM CXL Flash Adapter. This driver is
  24. responsible for the initialization of the adapter, setting up the
  25. special path for user space access, and performing error recovery. It
  26. communicates directly with the Flash Accelerator Functional Unit (AFU)
  27. as described in Documentation/powerpc/cxl.txt.
  28. The cxlflash driver supports two mutually exclusive modes of
  29. operation at the device (LUN) level:
  30. - Any flash device (LUN) can be configured to be accessed as a
  31. regular disk device (e.g., /dev/sdc). This is the default mode.
  32. - Any flash device (LUN) can be configured to be accessed from
  33. user space with a special block library. This mode further
  34. specifies the means of accessing the device and provides for
  35. either raw access to the entire LUN (referred to as direct
  36. or physical LUN access) or access to a kernel/AFU-mediated
  37. partition of the LUN (referred to as virtual LUN access). The
  38. segmentation of a disk device into virtual LUNs is assisted
  39. by special translation services provided by the Flash AFU.
  40. Overview
  41. ========
  42. The Coherent Accelerator Interface Architecture (CAIA) introduces a
  43. concept of a master context. A master typically has special privileges
  44. granted to it by the kernel or hypervisor allowing it to perform AFU
  45. wide management and control. The master may or may not be involved
  46. directly in each user I/O, but at the minimum is involved in the
  47. initial setup before the user application is allowed to send requests
  48. directly to the AFU.
  49. The CXL Flash Adapter Driver establishes a master context with the
  50. AFU. It uses memory mapped I/O (MMIO) for this control and setup. The
  51. Adapter Problem Space Memory Map looks like this:
  52. +-------------------------------+
  53. | 512 * 64 KB User MMIO |
  54. | (per context) |
  55. | User Accessible |
  56. +-------------------------------+
  57. | 512 * 128 B per context |
  58. | Provisioning and Control |
  59. | Trusted Process accessible |
  60. +-------------------------------+
  61. | 64 KB Global |
  62. | Trusted Process accessible |
  63. +-------------------------------+
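
The sizes in the map above are enough to compute where each region
starts. The following is a minimal sketch only: it assumes the regions
are laid out in the order drawn, with the top of the diagram at offset
0, and all macro and function names are hypothetical (they are not
taken from the driver).

```c
#include <stdint.h>

/* Sizes taken from the memory map diagram. */
#define MAX_CONTEXT     512u            /* contexts per adapter */
#define USER_MMIO_SIZE  (64u * 1024u)   /* 64 KB per context, user accessible */
#define CTRL_AREA_SIZE  128u            /* 128 B per context, trusted only */
#define GLOBAL_SIZE     (64u * 1024u)   /* 64 KB global, trusted only */

/*
 * ASSUMPTION: the regions follow one another in the order drawn,
 * starting at offset 0 with the user MMIO areas.
 */

/* Offset of the user MMIO area of a context, or -1 for a bad id. */
int64_t user_mmio_offset(uint32_t ctx_id)
{
    if (ctx_id >= MAX_CONTEXT)
        return -1;
    return (int64_t)ctx_id * USER_MMIO_SIZE;
}

/* Offset of the provisioning and control area of a context. */
int64_t ctrl_area_offset(uint32_t ctx_id)
{
    if (ctx_id >= MAX_CONTEXT)
        return -1;
    return (int64_t)MAX_CONTEXT * USER_MMIO_SIZE +
           (int64_t)ctx_id * CTRL_AREA_SIZE;
}

/* Offset of the global area, which follows all per-context areas. */
int64_t global_area_offset(void)
{
    return (int64_t)MAX_CONTEXT * USER_MMIO_SIZE +
           (int64_t)MAX_CONTEXT * CTRL_AREA_SIZE;
}

/* Total size of the problem space described by the diagram. */
int64_t problem_space_size(void)
{
    return global_area_offset() + GLOBAL_SIZE;
}
```

Under these assumptions the user MMIO areas occupy the first 32 MB,
followed by 64 KB of per-context control areas and the 64 KB global
area.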
  64. This driver configures itself into the SCSI software stack as an
  65. adapter driver. The driver is the only entity that is considered a
  66. Trusted Process to program the Provisioning and Control and Global
  67. areas in the MMIO Space shown above. The master context driver
  68. discovers all LUNs attached to the CXL Flash adapter and instantiates
  69. SCSI block devices (/dev/sdb, /dev/sdc, etc.) for each unique LUN
  70. seen from each path.
  71. Once these SCSI block devices are instantiated, an application
  72. written to a specification provided by the block library may get
  73. access to the Flash from user space (without requiring a system call).
  74. This master context driver also provides a series of ioctls for this
  75. block library to enable this user space access. The driver supports
  76. two modes for accessing the block device.
  77. The first mode is called a virtual mode. In this mode a single SCSI
  78. block device (/dev/sdb) may be carved up into any number of distinct
  79. virtual LUNs. The virtual LUNs may be resized as long as the sum of
  80. the sizes of all the virtual LUNs, along with their associated
  81. metadata, does not exceed the physical capacity.
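
The resize constraint for virtual LUNs can be expressed as a simple
capacity check. This is an illustrative sketch, not the driver's
allocator: sizes are assumed to be counted in blocks, the metadata is
assumed to be a single flat block count, and the function name
vlun_resize_fits() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 if resizing virtual LUN `idx` to `new_blocks` keeps the sum
 * of all virtual LUN sizes plus metadata within the physical capacity,
 * and 0 otherwise. All quantities are in blocks (an assumption).
 */
int vlun_resize_fits(const uint64_t *vlun_blocks, size_t count, size_t idx,
                     uint64_t new_blocks, uint64_t metadata_blocks,
                     uint64_t capacity_blocks)
{
    uint64_t used = metadata_blocks;
    size_t i;

    if (idx >= count || used > capacity_blocks)
        return 0;

    for (i = 0; i < count; i++) {
        /* Use the requested size for the LUN being resized. */
        uint64_t want = (i == idx) ? new_blocks : vlun_blocks[i];

        /* Compare against the remaining room to avoid overflow. */
        if (want > capacity_blocks - used)
            return 0;
        used += want;
    }
    return 1;
}
```

For example, with 100 blocks of metadata and existing virtual LUNs of
100 and 200 blocks on a 1000-block device, a third virtual LUN can grow
to 600 blocks but not to 601.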
  82. The second mode is called the physical mode. In this mode a single
  83. block device (/dev/sdb) may be opened directly by the block library
  84. and the entire space for the LUN is available to the application.
  85. Only the physical mode provides persistence of the data, i.e., the
  86. data written to the block device will survive application exit and
  87. restart as well as a reboot. The virtual LUNs do not persist (i.e.,
  88. they do not survive application termination or a system reboot).
  89. Block library API
  90. =================
  91. Applications intending to get access to the CXL Flash from user
  92. space should use the block library, as it abstracts the details of
  93. interfacing directly with the cxlflash driver that are necessary for
  94. performing administrative actions (i.e., setup, teardown, resize).
  95. The block library can be thought of as a 'user' of services,
  96. implemented as IOCTLs, that are provided by the cxlflash driver
  97. specifically for devices (LUNs) operating in user space access
  98. mode. While it is not a requirement that applications understand
  99. the interface between the block library and the cxlflash driver,
  100. a high-level overview of each supported service (IOCTL) is provided
  101. below.
  102. The block library can be found on GitHub:
  103. http://github.com/open-power/capiflash
  104. CXL Flash Driver IOCTLs
  105. =======================
  106. Users, such as the block library, that wish to interface with a flash
  107. device (LUN) via user space access need to use the services provided
  108. by the cxlflash driver. As these services are implemented as ioctls,
  109. a file descriptor handle must first be obtained in order to establish
  110. the communication channel between a user and the kernel. This file
  111. descriptor is obtained by opening the device special file associated
  112. with the SCSI disk device (/dev/sdb) that was created during LUN
  113. discovery. As per the location of the cxlflash driver within the
  114. SCSI protocol stack, this open is actually not seen by the cxlflash
  115. driver. Upon successful open, the user receives a file descriptor
  116. (herein referred to as fd1) that should be used for issuing the
  117. subsequent ioctls listed below.
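
Obtaining fd1 is an ordinary open() of the disk special file. The
sketch below shows only that step. The helper name open_lun() and the
O_RDWR access mode are assumptions made for illustration. The ioctl
calls themselves are left out because their request structures come
from uapi/scsi/cxlflash_ioctl.h and are not reproduced here.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open the SCSI disk special file (for example /dev/sdb) and return
 * the descriptor referred to as fd1 in this document, or -1 on
 * failure. The access mode is an assumption of this sketch.
 */
int open_lun(const char *path)
{
    int fd1 = open(path, O_RDWR);

    if (fd1 < 0)
        perror(path);   /* report why the open failed */
    return fd1;
}
```

A caller would keep fd1 open for as long as it issues ioctls against
the LUN and close() it afterwards.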
  118. The structure definitions for these IOCTLs are available in:
  119. uapi/scsi/cxlflash_ioctl.h
  120. DK_CXLFLASH_ATTACH
  121. ------------------
  122. This ioctl obtains, initializes, and starts a context using the CXL
  123. kernel services. These services specify a context id (u16) by which
  124. to uniquely identify the context and its allocated resources. The
  125. services additionally provide a second file descriptor (herein
  126. referred to as fd2) that is used by the block library to initiate
  127. memory mapped I/O (via mmap()) to the CXL flash device and poll for
  128. completion events. This file descriptor is intentionally installed by
  129. this driver and not the CXL kernel services to allow for intermediary
  130. notification and access in the event of a non-user-initiated close(),
  131. such as a killed process. This design point is described in further
  132. detail in the description for the DK_CXLFLASH_DETACH ioctl.
  133. There are a few important aspects regarding the "tokens" (context id
  134. and fd2) that are provided back to the user:
  135. - These tokens are only valid for the process under which they
  136. were created. The child of a forked process cannot continue
  137. to use the context id or file descriptor created by its parent
  138. (see DK_CXLFLASH_VLUN_CLONE for further details).
  139. - These tokens are only valid for the lifetime of the context and
  140. the process under which they were created. Once either is
  141. destroyed, the tokens are to be considered stale and subsequent
  142. usage will result in errors.
  143. - A valid adapter file descriptor (fd2 >= 0) is only returned on
  144. the initial attach for a context. Subsequent attaches to an
  145. existing context (DK_CXLFLASH_ATTACH_REUSE_CONTEXT flag present)
  146. do not provide the adapter file descriptor as it was previously
  147. made known to the application.
  148. - When a context is no longer needed, the user shall detach from
  149. the context via the DK_CXLFLASH_DETACH ioctl. When the attach ioctl
  150. returned a valid adapter file descriptor and the return flag
  151. DK_CXLFLASH_APP_CLOSE_ADAP_FD was present, the application _must_
  152. close the adapter file descriptor following a successful detach.
  153. - When this ioctl returns with a valid fd2 and the return flag
  154. DK_CXLFLASH_APP_CLOSE_ADAP_FD is present, the application _must_
  155. close fd2 in the following circumstances:
  156. + Following a successful detach of the last user of the context
  157. + Following a successful recovery on the context's original fd2
  158. + In the child process of a fork(), following a clone ioctl,
  159. on the fd2 associated with the source context
  160. - At any time, a close on fd2 will invalidate the tokens. Applications
  161. should exercise caution to only close fd2 when appropriate (outlined
  162. in the previous bullet) to avoid premature loss of I/O.
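
The rules for closing fd2 can be summarized as a small decision
helper. This is a sketch of the bullets above and not code from the
block library: the enum, its values, and the function name
app_must_close_fd2() are hypothetical, and the flag is passed in as a
plain boolean rather than tested against the real return flags.

```c
#include <stdbool.h>

/* Situations named in the list above (hypothetical names). */
enum fd2_event {
    FD2_EV_LAST_DETACH,   /* successful detach of the last user of the context */
    FD2_EV_RECOVERED,     /* successful recovery; applies to the original fd2 */
    FD2_EV_CHILD_CLONED,  /* forked child after the clone ioctl; applies to
                             the fd2 of the source context */
    FD2_EV_OTHER          /* any other point in the context's lifetime */
};

/*
 * Returns true when the application is obliged to close fd2.
 * `close_flag_returned` stands for DK_CXLFLASH_APP_CLOSE_ADAP_FD
 * having been present in the attach return flags.
 */
bool app_must_close_fd2(int fd2, bool close_flag_returned, enum fd2_event ev)
{
    if (fd2 < 0 || !close_flag_returned)
        return false;

    return ev == FD2_EV_LAST_DETACH ||
           ev == FD2_EV_RECOVERED ||
           ev == FD2_EV_CHILD_CLONED;
}
```

Closing fd2 at any other time invalidates the tokens, which is why the
helper answers false for FD2_EV_OTHER.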
  163. DK_CXLFLASH_USER_DIRECT
  164. -----------------------
  165. This ioctl is responsible for transitioning the LUN to direct
  166. (physical) mode access and configuring the AFU for direct access from
  167. user space on a per-context basis. Additionally, the block size and
  168. last logical block address (LBA) are returned to the user.
  169. As mentioned previously, when operating in user space access mode,
  170. LUNs may be accessed in whole or in part. Only one mode is allowed
  171. at a time and if one mode is active (outstanding references exist),
  172. requests to use the LUN in a different mode are denied.
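
The mode exclusivity rule behaves like a reference-counted mode
setting. The sketch below models it under the assumption that a LUN
returns to an unset mode once its last reference is dropped. The type
and function names are hypothetical and do not come from the driver.

```c
/* Hypothetical model of a LUN's user space access mode. */
enum lun_mode { LUN_MODE_NONE, LUN_MODE_DIRECT, LUN_MODE_VIRTUAL };

struct lun_state {
    enum lun_mode mode;   /* mode currently in effect */
    unsigned int refs;    /* outstanding references in that mode */
};

/* Take a reference in `mode`. Returns 0 on success, -1 if denied. */
int lun_mode_get(struct lun_state *lun, enum lun_mode mode)
{
    if (mode == LUN_MODE_NONE)
        return -1;
    if (lun->refs > 0 && lun->mode != mode)
        return -1;        /* a different mode is active: deny */

    lun->mode = mode;
    lun->refs++;
    return 0;
}

/* Drop a reference; the mode is cleared when none remain. */
void lun_mode_put(struct lun_state *lun)
{
    if (lun->refs > 0 && --lun->refs == 0)
        lun->mode = LUN_MODE_NONE;
}
```

A request for direct access is therefore denied while virtual
references are outstanding, and succeeds again once they have all been
released.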
  173. The AFU is configured for direct access from user space by adding an
  174. entry to the AFU's resource handle table. The index of the entry is
  175. treated as a resource handle that is returned to the user. The user
  176. is then able to use the handle to reference the LUN during I/O.
  177. DK_CXLFLASH_USER_VIRTUAL
  178. ------------------------
  179. This ioctl is responsible for transitioning the LUN to virtual mode
  180. of access and configuring the AFU for virtual access from user space
  181. on a per-context basis. Additionally, the block size and last logical
  182. block address (LBA) are returned to the user.
  183. As mentioned previously, when operating in user space access mode,
  184. LUNs may be accessed in whole or in part. Only one mode is allowed
  185. at a time and if one mode is active (outstanding references exist),
  186. requests to use the LUN in a different mode are denied.
  187. The AFU is configured for virtual access from user space by adding
  188. an entry to the AFU's resource handle table. The index of the entry
  189. is treated as a resource handle that is returned to the user. The
  190. user is then able to use the handle to reference the LUN during I/O.
  191. By default, the virtual LUN is created with a size of 0. The user
  192. would need to use the DK_CXLFLASH_VLUN_RESIZE ioctl to grow
  193. the virtual LUN to a desired size. To avoid having to perform this
  194. resize for the initial creation of the virtual LUN, the user has the
  195. option of specifying a size as part of the DK_CXLFLASH_USER_VIRTUAL
  196. ioctl, such that when success is returned to the user, the
  197. resource handle that is provided is already referencing provisioned
  198. storage. This is reflected by the last LBA being a non-zero value.
  199. DK_CXLFLASH_VLUN_RESIZE
  200. -----------------------
  201. This ioctl is responsible for resizing a previously created virtual
  202. LUN and will fail if invoked upon a LUN that is not in virtual
  203. mode. Upon success, an updated last LBA is returned to the user
  204. indicating the new size of the virtual LUN associated with the
  205. resource handle.
  206. The partitioning of virtual LUNs is jointly mediated by the cxlflash
  207. driver and the AFU. An allocation table is kept for each LUN that is
  208. operating in the virtual mode and used to program a LUN translation
  209. table that the AFU references when provided with a resource handle.
  210. DK_CXLFLASH_RELEASE
  211. -------------------
  212. This ioctl is responsible for releasing a previously obtained
  213. reference to either a physical or virtual LUN. This can be
  214. thought of as the inverse of the DK_CXLFLASH_USER_DIRECT or
  215. DK_CXLFLASH_USER_VIRTUAL ioctls. Upon success, the resource handle
  216. is no longer valid and the entry in the resource handle table is
  217. made available to be used again.
  218. As part of the release process for virtual LUNs, the virtual LUN
  219. is first resized to 0 to clear out and free the translation tables
  220. associated with the virtual LUN reference.
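
The life cycle of a resource handle, from the table entry added by
DK_CXLFLASH_USER_DIRECT or DK_CXLFLASH_USER_VIRTUAL to its release,
can be modeled with a small table whose index serves as the handle.
This is a simplified sketch: the table size MAX_RHT and the names
rht_checkout() and rht_checkin() are hypothetical, and the real
entries carry translation state that is omitted here.

```c
#include <stdbool.h>

#define MAX_RHT 16   /* hypothetical table size for this sketch */

struct rht {
    bool in_use[MAX_RHT];   /* one flag per resource handle table entry */
};

/* Claim the first free entry; its index is the resource handle.
 * Returns -1 when the table is full. */
int rht_checkout(struct rht *t)
{
    int i;

    for (i = 0; i < MAX_RHT; i++) {
        if (!t->in_use[i]) {
            t->in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release a handle so its entry can be used again.
 * Returns 0 on success, -1 if the handle is not valid. */
int rht_checkin(struct rht *t, int handle)
{
    if (handle < 0 || handle >= MAX_RHT || !t->in_use[handle])
        return -1;

    t->in_use[handle] = false;
    return 0;
}
```

After a release the same index may be handed out again, so a stale
handle kept by the application could refer to a different LUN.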
  221. DK_CXLFLASH_DETACH
  222. ------------------
  223. This ioctl is responsible for unregistering a context with the
  224. cxlflash driver and releasing outstanding resources that were
  225. not explicitly released via the DK_CXLFLASH_RELEASE ioctl. Upon
  226. success, all "tokens" which had been provided to the user from the
  227. DK_CXLFLASH_ATTACH onward are no longer valid.
  228. When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
  229. attach, the application _must_ close the fd2 associated with the context
  230. following the detach of the final user of the context.
  231. DK_CXLFLASH_VLUN_CLONE
  232. ----------------------
  233. This ioctl is responsible for cloning a previously created
  234. context to a more recently created context. It exists solely to
  235. support maintaining user space access to storage after a process
  236. forks. Upon success, the child process (which invoked the ioctl)
  237. will have access to the same LUNs via the same resource handle(s)
  238. as the parent, but under a different context.
  239. Context sharing across processes is not supported with CXL and
  240. therefore each fork must be met with establishing a new context
  241. for the child process. This ioctl simplifies the state management
  242. and playback required by a user in such a scenario. When a process
  243. forks, the child process can clone the parent's context by first creating
  244. a context (via DK_CXLFLASH_ATTACH) and then using this ioctl to
  245. perform the clone from the parent to the child.
  246. The clone itself is fairly simple. The resource handle and LUN
  247. translation tables are copied from the parent context to the child's
  248. and then synced with the AFU.
  249. When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
  250. attach, the application _must_ close the fd2 associated with the source
  251. context (still resident/accessible in the parent process) following the
  252. clone. This is to avoid a stale entry in the file descriptor table of the
  253. child process.
  254. DK_CXLFLASH_VERIFY
  255. ------------------
  256. This ioctl is used to detect various changes such as the capacity of
  257. the disk changing, the number of LUNs visible changing, etc. In cases
  258. where the changes affect the application (such as a LUN resize), the
  259. cxlflash driver will report the changed state to the application.
  260. The user calls in when they want to validate that a LUN hasn't been
  261. changed in response to a check condition. As the user is operating out
  262. of band from the kernel, they will see these types of events without
  263. the kernel's knowledge. When encountered, the user's architected
  264. behavior is to call in to this ioctl, indicating what they want to
  265. verify and passing along any appropriate information. For now, only
  266. verifying a LUN change (i.e., a different size) with sense data is
  267. supported.
  268. DK_CXLFLASH_RECOVER_AFU
  269. -----------------------
  270. This ioctl is used to drive recovery (if such an action is warranted)
  271. of a specified user context. Any state associated with the user context
  272. is re-established upon successful recovery.
  273. User contexts are put into an error condition when the device needs to
  274. be reset or is terminating. Users are notified of this error condition
  275. by seeing all 0xF's on an MMIO read. Upon encountering this, the
  276. architected behavior for a user is to call into this ioctl to recover
  277. their context. A user may also call into this ioctl at any time to
  278. check if the device is operating normally. If a failure is returned
  279. from this ioctl, the user is expected to gracefully clean up their
  280. context via release/detach ioctls. Until they do, the context they
  281. hold is not relinquished. The user may also optionally exit the process
  282. at which time the context/resources they held will be freed as part of
  283. the release fop.
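
Detecting the error condition amounts to checking an MMIO read for an
all-ones value. The sketch below assumes 64-bit registers. The helper
names are hypothetical, and a real application would also have to
consider registers that can legitimately read as all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: MMIO registers are read as 64-bit quantities. */

/* True when a value read over MMIO consists entirely of 0xF digits. */
bool mmio_read_indicates_error(uint64_t value)
{
    return value == UINT64_MAX;
}

/*
 * Read one register through a volatile pointer and report whether the
 * context appears to need recovery via DK_CXLFLASH_RECOVER_AFU.
 */
bool context_needs_recovery(const volatile uint64_t *reg)
{
    return mmio_read_indicates_error(*reg);
}
```

When the check is true, the architected response is to issue the
recover ioctl and, if that fails, to clean up with the release and
detach ioctls.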
  284. When the DK_CXLFLASH_APP_CLOSE_ADAP_FD flag was returned on a successful
  285. attach, the application _must_ unmap and close the fd2 associated with the
  286. original context following this ioctl returning success and indicating that
  287. the context was recovered (DK_CXLFLASH_RECOVER_AFU_CONTEXT_RESET).
  288. DK_CXLFLASH_MANAGE_LUN
  289. ----------------------
  290. This ioctl is used to switch a LUN from a mode where it is available
  291. for file-system access (legacy), to a mode where it is set aside for
  292. exclusive user space access (superpipe). In case a LUN is visible
  293. across multiple ports and adapters, this ioctl is used to uniquely
  294. identify each LUN by its World Wide Node Name (WWNN).