.. | .. |
---|
20 | 20 | If you have a block device which supports DAX, you can make a filesystem |
---|
21 | 21 | on it as usual. The DAX code currently only supports files with a block |
---|
22 | 22 | size equal to your kernel's PAGE_SIZE, so you may need to specify a block |
---|
23 | | -size when creating the filesystem. When mounting it, use the "-o dax" |
---|
24 | | -option on the command line or add 'dax' to the options in /etc/fstab. |
---|
| 23 | +size when creating the filesystem. |
---|
| 24 | + |
---|
| 25 | +Currently 3 filesystems support DAX: ext2, ext4 and xfs. DAX is enabled |
---|
| 26 | +differently on ext2 than on ext4 and xfs. |
---|
| 27 | + |
---|
| 28 | +Enabling DAX on ext2 |
---|
| 29 | +----------------------------- |
---|
| 30 | + |
---|
| 31 | +When mounting the filesystem, use the "-o dax" option on the command line or |
---|
| 32 | +add 'dax' to the options in /etc/fstab. This works to enable DAX on all files |
---|
| 33 | +within the filesystem. It is equivalent to the '-o dax=always' behavior below. |
---|
| 34 | + |
---|
| 35 | + |
---|
| 36 | +Enabling DAX on xfs and ext4 |
---|
| 37 | +---------------------------- |
---|
| 38 | + |
---|
| 39 | +Summary |
---|
| 40 | +------- |
---|
| 41 | + |
---|
| 42 | + 1. There exists an in-kernel file access mode flag S_DAX that corresponds to |
---|
| 43 | + the statx flag STATX_ATTR_DAX. See the manpage for statx(2) for details |
---|
| 44 | + about this access mode. |
---|
| 45 | + |
---|
| 46 | + 2. There exists a persistent flag FS_XFLAG_DAX that can be applied to regular |
---|
| 47 | + files and directories. This advisory flag can be set or cleared at any |
---|
| 48 | + time, but doing so does not immediately affect the S_DAX state. |
---|
| 49 | + |
---|
| 50 | + 3. If the persistent FS_XFLAG_DAX flag is set on a directory, this flag will |
---|
| 51 | + be inherited by all regular files and subdirectories that are subsequently |
---|
| 52 | + created in this directory. Files and subdirectories that exist at the time |
---|
| 53 | + this flag is set or cleared on the parent directory are not affected by |
---|
| 54 | + the change to the parent directory. |
---|
| 55 | + |
---|
| 56 | + 4. There exist dax mount options which can override FS_XFLAG_DAX in the |
---|
| 57 | + setting of the S_DAX flag. Given underlying storage which supports DAX the |
---|
| 58 | + following hold: |
---|
| 59 | + |
---|
| 60 | + "-o dax=inode" means "follow FS_XFLAG_DAX" and is the default. |
---|
| 61 | + |
---|
| 62 | + "-o dax=never" means "never set S_DAX, ignore FS_XFLAG_DAX." |
---|
| 63 | + |
---|
| 64 | + "-o dax=always" means "always set S_DAX, ignore FS_XFLAG_DAX." |
---|
| 65 | + |
---|
| 66 | + "-o dax" is a legacy option which is an alias for "dax=always". |
---|
| 67 | + This may be removed in the future so "-o dax=always" is |
---|
| 68 | + the preferred method for specifying this behavior. |
---|
| 69 | + |
---|
| 70 | + NOTE: Modifications to FS_XFLAG_DAX and its inheritance behavior remain |
---|
| 71 | + the same even when the filesystem is mounted with a dax option. However, |
---|
| 72 | + in-core inode state (S_DAX) will be overridden until the filesystem is |
---|
| 73 | + remounted with dax=inode and the inode is evicted from kernel memory. |
---|
| 74 | + |
---|
| 75 | + 5. The S_DAX policy can be changed via: |
---|
| 76 | + |
---|
| 77 | + a) Setting the parent directory FS_XFLAG_DAX as needed before files are |
---|
| 78 | + created |
---|
| 79 | + |
---|
| 80 | + b) Setting the appropriate dax="foo" mount option |
---|
| 81 | + |
---|
| 82 | + c) Changing the FS_XFLAG_DAX flag on existing regular files and |
---|
| 83 | + directories. This has runtime constraints and limitations that are |
---|
| 84 | + described in 6) below. |
---|
| 85 | + |
---|
| 86 | + 6. When changing the S_DAX policy via toggling the persistent FS_XFLAG_DAX flag, |
---|
| 87 | + the change in behaviour for existing regular files may not occur |
---|
| 88 | + immediately. If the change must take effect immediately, the administrator |
---|
| 89 | + needs to: |
---|
| 90 | + |
---|
| 91 | + a) stop the application so there are no active references to the data set |
---|
| 92 | + the policy change will affect |
---|
| 93 | + |
---|
| 94 | + b) evict the data set from kernel caches so it will be re-instantiated when |
---|
| 95 | + the application is restarted. This can be achieved by: |
---|
| 96 | + |
---|
| 97 | + i. drop-caches (see /proc/sys/vm/drop_caches) |
---|
| 98 | + ii. a filesystem unmount and mount cycle |
---|
| 99 | + iii. a system reboot |
---|
| 100 | + |
---|
| 101 | + |
---|
| 102 | +Details |
---|
| 103 | +------- |
---|
| 104 | + |
---|
| 105 | +There are 2 per-file dax flags. One is a persistent inode setting (FS_XFLAG_DAX) |
---|
| 106 | +and the other is a volatile flag indicating the active state of the feature |
---|
| 107 | +(S_DAX). |
---|
| 108 | + |
---|
| 109 | +FS_XFLAG_DAX is preserved within the filesystem. This persistent config |
---|
| 110 | +setting can be set, cleared and/or queried using the FS_IOC_FS[GS]ETXATTR ioctl |
---|
| 111 | +(see ioctl_xfs_fsgetxattr(2)) or a utility such as 'xfs_io'. |
---|
| 112 | + |
---|
| 113 | +New files and directories automatically inherit FS_XFLAG_DAX from |
---|
| 114 | +their parent directory _when_ _created_. Therefore, setting FS_XFLAG_DAX at |
---|
| 115 | +directory creation time can be used to set a default behavior for an entire |
---|
| 116 | +sub-tree. |
---|
| 117 | + |
---|
| 118 | +To clarify inheritance, here are 3 examples: |
---|
| 119 | + |
---|
| 120 | +Example A: |
---|
| 121 | + |
---|
| 122 | +mkdir -p a/b/c |
---|
| 123 | +xfs_io -c 'chattr +x' a |
---|
| 124 | +mkdir a/b/c/d |
---|
| 125 | +mkdir a/e |
---|
| 126 | + |
---|
| 127 | + dax: a,e |
---|
| 128 | + no dax: b,c,d |
---|
| 129 | + |
---|
| 130 | +Example B: |
---|
| 131 | + |
---|
| 132 | +mkdir a |
---|
| 133 | +xfs_io -c 'chattr +x' a |
---|
| 134 | +mkdir -p a/b/c/d |
---|
| 135 | + |
---|
| 136 | + dax: a,b,c,d |
---|
| 137 | + no dax: |
---|
| 138 | + |
---|
| 139 | +Example C: |
---|
| 140 | + |
---|
| 141 | +mkdir -p a/b/c |
---|
| 142 | +xfs_io -c 'chattr +x' a/b/c |
---|
| 143 | +mkdir a/b/c/d |
---|
| 144 | + |
---|
| 145 | + dax: c,d |
---|
| 146 | + no dax: a,b |
---|
| 147 | + |
---|
| 148 | + |
---|
| 149 | +The current enabled state (S_DAX) is set when a file inode is instantiated in |
---|
| 150 | +memory by the kernel. It is set based on the underlying media support, the |
---|
| 151 | +value of FS_XFLAG_DAX and the filesystem's dax mount option. |
---|
| 152 | + |
---|
| 153 | +statx can be used to query S_DAX. NOTE that only regular files will ever have |
---|
| 154 | +S_DAX set and therefore statx will never indicate that S_DAX is set on |
---|
| 155 | +directories. |
---|
| 156 | + |
---|
| 157 | +Setting the FS_XFLAG_DAX flag (specifically or through inheritance) occurs even |
---|
| 158 | +if the underlying media does not support dax and/or the filesystem is |
---|
| 159 | +overridden with a mount option. |
---|
| 160 | + |
---|
25 | 161 | |
---|
26 | 162 | |
---|
27 | 163 | Implementation Tips for Block Driver Writers |
---|
.. | .. |
---|
74 | 210 | exposure of uninitialized data through mmap. |
---|
75 | 211 | |
---|
76 | 212 | These filesystems may be used for inspiration: |
---|
77 | | -- ext2: see Documentation/filesystems/ext2.txt |
---|
78 | | -- ext4: see Documentation/filesystems/ext4.txt |
---|
79 | | -- xfs: see Documentation/filesystems/xfs.txt |
---|
| 213 | +- ext2: see Documentation/filesystems/ext2.rst |
---|
| 214 | +- ext4: see Documentation/filesystems/ext4/ |
---|
| 215 | +- xfs: see Documentation/admin-guide/xfs.rst |
---|
80 | 216 | |
---|
81 | 217 | |
---|
82 | 218 | Handling Media Errors |
---|
.. | .. |
---|
94 | 230 | redundancy in the following ways: |
---|
95 | 231 | |
---|
96 | 232 | 1. Delete the affected file, and restore from a backup (sysadmin route): |
---|
97 | | - This will free the file system blocks that were being used by the file, |
---|
| 233 | + This will free the filesystem blocks that were being used by the file, |
---|
98 | 234 | and the next time they're allocated, they will be zeroed first, which |
---|
99 | 235 | happens through the driver, and will clear bad sectors. |
---|
100 | 236 | |
---|