===============================================
Memory Tagging Extension (MTE) in AArch64 Linux
===============================================

Authors: Vincenzo Frascino <vincenzo.frascino@arm.com>
         Catalin Marinas <catalin.marinas@arm.com>

Date: 2020-02-25

This document describes the provision of the Memory Tagging Extension
functionality in AArch64 Linux.
|
Introduction
============

ARMv8.5 based processors introduce the Memory Tagging Extension (MTE)
feature. MTE is built on top of the ARMv8.0 virtual address tagging TBI
(Top Byte Ignore) feature and allows software to access a 4-bit
allocation tag for each 16-byte granule in the physical address space.
Such a memory range must be mapped with the Normal-Tagged memory
attribute. A logical tag is derived from bits 59-56 of the virtual
address used for the memory access. A CPU with MTE enabled will compare
the logical tag against the allocation tag and potentially raise an
exception on mismatch, subject to system register configuration.
|
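The address layout described in the introduction can be illustrated with
plain bit arithmetic. The following is a minimal sketch, not kernel or
libc code: the helper and macro names (``mte_logical_tag()``,
``mte_granule_base()``, ``mte_with_tag()``, ``MTE_*``) are hypothetical
and only encode the 16-byte granule size and the bits 59-56 tag position
stated in the text.

.. code-block:: c

    #include <stdint.h>

    /* Hypothetical names; the values follow the text above. */
    #define MTE_GRANULE_SIZE        16u     /* bytes covered by one allocation tag */
    #define MTE_TAG_SHIFT           56      /* logical tag occupies bits 59-56 */
    #define MTE_TAG_MASK            0xfULL  /* 4-bit tag */

    /* Extract the logical tag carried in a virtual address. */
    static inline unsigned int mte_logical_tag(uint64_t addr)
    {
            return (unsigned int)((addr >> MTE_TAG_SHIFT) & MTE_TAG_MASK);
    }

    /* Round an address down to the start of its 16-byte granule. */
    static inline uint64_t mte_granule_base(uint64_t addr)
    {
            return addr & ~(uint64_t)(MTE_GRANULE_SIZE - 1);
    }

    /* Return addr with its logical tag replaced by the given 4-bit tag. */
    static inline uint64_t mte_with_tag(uint64_t addr, unsigned int tag)
    {
            addr &= ~(MTE_TAG_MASK << MTE_TAG_SHIFT);
            return addr | (((uint64_t)tag & MTE_TAG_MASK) << MTE_TAG_SHIFT);
    }

These helpers only manipulate the pointer value. They do not touch the
allocation tag stored for the granule, which is changed with
instructions such as ``STG``.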
|
Userspace Support
=================

When ``CONFIG_ARM64_MTE`` is selected and Memory Tagging Extension is
supported by the hardware, the kernel advertises the feature to
userspace via ``HWCAP2_MTE``.

PROT_MTE
--------

To access the allocation tags, a user process must enable the Tagged
memory attribute on an address range using a new ``prot`` flag for
``mmap()`` and ``mprotect()``:

``PROT_MTE`` - Pages allow access to the MTE allocation tags.

The allocation tag is set to 0 when such pages are first mapped in the
user address space and preserved on copy-on-write. ``MAP_SHARED`` is
supported and the allocation tags can be shared between processes.

**Note**: ``PROT_MTE`` is only supported on ``MAP_ANONYMOUS`` and
RAM-based file mappings (``tmpfs``, ``memfd``). Passing it to other
types of mapping will result in ``-EINVAL`` being returned by these
system calls.

**Note**: The ``PROT_MTE`` flag (and corresponding memory type) cannot
be cleared by ``mprotect()``.

**Note**: ``madvise()`` memory ranges with ``MADV_DONTNEED`` and
``MADV_FREE`` may have the allocation tags cleared (set to 0) at any
point after the system call.
|
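As an illustration of the ``HWCAP2_MTE`` check and the ``PROT_MTE``
flag, the sketch below maps an anonymous region with tag access enabled
in a single ``mmap()`` call. The function name ``map_tagged_region()``
is hypothetical, and the fallback constant values are assumptions taken
from the arm64 uapi headers for toolchains that do not define them.

.. code-block:: c

    #define _GNU_SOURCE
    #include <stddef.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>

    /* Fallbacks for older toolchain headers (assumed arm64 uapi values). */
    #ifndef HWCAP2_MTE
    #define HWCAP2_MTE      (1UL << 18)
    #endif
    #ifndef PROT_MTE
    #define PROT_MTE        0x20
    #endif

    /*
     * Map len bytes of anonymous memory with PROT_MTE set.
     * Returns NULL if MTE is not advertised or the mapping fails.
     */
    static void *map_tagged_region(size_t len)
    {
            void *p;

            if (!(getauxval(AT_HWCAP2) & HWCAP2_MTE))
                    return NULL;

            p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_MTE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

            return p == MAP_FAILED ? NULL : p;
    }

A caller would typically treat a ``NULL`` return as "run without MTE"
rather than as a fatal error, since the feature is optional.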
|
Tag Check Faults
----------------

When ``PROT_MTE`` is enabled on an address range and a mismatch between
the logical and allocation tags occurs on access, there are three
configurable behaviours:

- *Ignore* - This is the default mode. The CPU (and kernel) ignores the
  tag check fault.

- *Synchronous* - The kernel raises a ``SIGSEGV`` synchronously, with
  ``.si_code = SEGV_MTESERR`` and ``.si_addr = <fault-address>``. The
  memory access is not performed. If ``SIGSEGV`` is ignored or blocked
  by the offending thread, the containing process is terminated with a
  ``coredump``.

- *Asynchronous* - The kernel raises a ``SIGSEGV``, in the offending
  thread, asynchronously following one or multiple tag check faults,
  with ``.si_code = SEGV_MTEAERR`` and ``.si_addr = 0`` (the faulting
  address is unknown).
|
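A process that wants to tell the two fault types apart can inspect
``si_code`` in a ``SA_SIGINFO`` handler. The following is a minimal
sketch under stated assumptions: the function names are hypothetical,
and the fallback values for ``SEGV_MTEAERR`` and ``SEGV_MTESERR`` are
assumed from the generic uapi ``siginfo.h`` for C libraries that do not
provide them.

.. code-block:: c

    #include <signal.h>
    #include <string.h>
    #include <unistd.h>

    /* Fallbacks for older C library headers (assumed uapi values). */
    #ifndef SEGV_MTEAERR
    #define SEGV_MTEAERR    8
    #endif
    #ifndef SEGV_MTESERR
    #define SEGV_MTESERR    9
    #endif

    /* Map a SIGSEGV si_code to a short description. */
    static const char *mte_segv_kind(int si_code)
    {
            switch (si_code) {
            case SEGV_MTESERR:
                    return "synchronous MTE tag check fault\n";
            case SEGV_MTEAERR:
                    return "asynchronous MTE tag check fault\n";
            default:
                    return "other SIGSEGV\n";
            }
    }

    /* Report the fault kind and exit; uses write() rather than printf(). */
    static void segv_handler(int sig, siginfo_t *info, void *ucontext)
    {
            const char *msg = mte_segv_kind(info->si_code);

            (void)sig;
            (void)ucontext;
            if (write(STDERR_FILENO, msg, strlen(msg)) < 0)
                    _exit(2);
            _exit(1);
    }

    /* Install the handler; returns 0 on success, -1 on error. */
    static int install_segv_handler(void)
    {
            struct sigaction sa;

            memset(&sa, 0, sizeof(sa));
            sa.sa_sigaction = segv_handler;
            sa.sa_flags = SA_SIGINFO;
            sigemptyset(&sa.sa_mask);

            return sigaction(SIGSEGV, &sa, NULL);
    }

In the synchronous case ``info->si_addr`` could additionally be logged.
In the asynchronous case it is reported as 0, so only the fault kind is
available.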
|
The user can select the above modes, per thread, using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where ``flags``
contains any number of the following values in the ``PR_MTE_TCF_MASK``
bit-field:

- ``PR_MTE_TCF_NONE``  - *Ignore* tag check faults
  (ignored if combined with other options)
- ``PR_MTE_TCF_SYNC``  - *Synchronous* tag check fault mode
- ``PR_MTE_TCF_ASYNC`` - *Asynchronous* tag check fault mode

If no modes are specified, tag check faults are ignored. If a single
mode is specified, the program will run in that mode. If multiple
modes are specified, the mode is selected as described in the "Per-CPU
preferred tag checking mode" section below.

The current tag check fault mode can be read using the
``prctl(PR_GET_TAGGED_ADDR_CTRL, 0, 0, 0, 0)`` system call.
|
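The value obtained from ``PR_GET_TAGGED_ADDR_CTRL`` can be decoded with
the same bit-field definitions used for the set call. The sketch below
assumes the flags are delivered as the ``prctl()`` return value. The
helper names are hypothetical, and the fallback constants mirror the
``include/uapi/linux/prctl.h`` values quoted in this document.

.. code-block:: c

    #include <sys/prctl.h>

    /* Fallbacks for older headers (values as quoted in this document). */
    #ifndef PR_GET_TAGGED_ADDR_CTRL
    #define PR_GET_TAGGED_ADDR_CTRL 56
    #endif
    #ifndef PR_MTE_TCF_NONE
    #define PR_MTE_TCF_NONE         (0UL << 1)
    #endif
    #ifndef PR_MTE_TCF_SYNC
    #define PR_MTE_TCF_SYNC         (1UL << 1)
    #endif
    #ifndef PR_MTE_TCF_ASYNC
    #define PR_MTE_TCF_ASYNC        (2UL << 1)
    #endif
    #ifndef PR_MTE_TCF_MASK
    #define PR_MTE_TCF_MASK         (3UL << 1)
    #endif
    #ifndef PR_MTE_TAG_SHIFT
    #define PR_MTE_TAG_SHIFT        3
    #endif
    #ifndef PR_MTE_TAG_MASK
    #define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)
    #endif

    /* Read the current control word; -1 means the call failed. */
    static long get_tagged_addr_ctrl(void)
    {
            return prctl(PR_GET_TAGGED_ADDR_CTRL, 0UL, 0UL, 0UL, 0UL);
    }

    /* Name the tag check fault mode(s) encoded in flags. */
    static const char *mte_tcf_name(unsigned long flags)
    {
            switch (flags & PR_MTE_TCF_MASK) {
            case PR_MTE_TCF_NONE:
                    return "none";
            case PR_MTE_TCF_SYNC:
                    return "sync";
            case PR_MTE_TCF_ASYNC:
                    return "async";
            default:
                    return "sync+async";
            }
    }

    /* Extract the 16-bit include mask of tags from flags. */
    static unsigned int mte_included_tags(unsigned long flags)
    {
            return (unsigned int)((flags & PR_MTE_TAG_MASK) >> PR_MTE_TAG_SHIFT);
    }

A caller would check that ``get_tagged_addr_ctrl()`` did not return -1
before passing the result to the two decoding helpers.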
|
Tag checking can also be disabled for a user thread by setting the
``PSTATE.TCO`` bit with ``MSR TCO, #1``.

**Note**: Signal handlers are always invoked with ``PSTATE.TCO = 0``,
irrespective of the interrupted context. ``PSTATE.TCO`` is restored on
``sigreturn()``.

**Note**: There are no *match-all* logical tags available for user
applications.

**Note**: Kernel accesses to the user address space (e.g. ``read()``
system call) are not checked if the user thread tag checking mode is
``PR_MTE_TCF_NONE`` or ``PR_MTE_TCF_ASYNC``. If the tag checking mode is
``PR_MTE_TCF_SYNC``, the kernel makes a best effort to check its user
address accesses, but it cannot always guarantee this. Kernel accesses
to user addresses are always performed with an effective ``PSTATE.TCO``
value of zero, regardless of the user configuration.
|
Excluding Tags in the ``IRG``, ``ADDG`` and ``SUBG`` instructions
-----------------------------------------------------------------

The architecture allows certain tags to be excluded from random
generation via the ``GCR_EL1.Exclude`` register bit-field. By default,
Linux excludes all tags other than 0. A user thread can enable specific
tags in the randomly generated set using the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call where
``flags`` contains the tags bitmap in the ``PR_MTE_TAG_MASK`` bit-field.

**Note**: The hardware uses an exclude mask but the ``prctl()``
interface provides an include mask. An include mask of ``0`` (exclusion
mask ``0xffff``) results in the CPU always generating tag ``0``.
|
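The relationship between the ``prctl()`` include mask and the hardware
exclude mask is a 16-bit complement. The sketch below shows that
conversion and how an include mask is placed in the ``PR_MTE_TAG_MASK``
bit-field. Both helper names are hypothetical, and the shift and mask
fallbacks repeat the values quoted from ``include/uapi/linux/prctl.h``.

.. code-block:: c

    #include <stdint.h>

    /* Fallbacks for older headers (values as quoted in this document). */
    #ifndef PR_MTE_TAG_SHIFT
    #define PR_MTE_TAG_SHIFT        3
    #endif
    #ifndef PR_MTE_TAG_MASK
    #define PR_MTE_TAG_MASK         (0xffffUL << PR_MTE_TAG_SHIFT)
    #endif

    /* Exclude mask implied by a given include mask (bit N <=> tag N). */
    static uint16_t mte_exclude_from_include(uint16_t include)
    {
            return (uint16_t)~include;
    }

    /* Position an include mask in the prctl() flags bit-field. */
    static unsigned long mte_tag_flags(uint16_t include)
    {
            return ((unsigned long)include << PR_MTE_TAG_SHIFT) & PR_MTE_TAG_MASK;
    }

For example, an include mask of ``0xfffe`` allows every tag except 0 to
be generated, which corresponds to an exclude mask of ``0x0001``.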
|
Per-CPU preferred tag checking mode
-----------------------------------

On some CPUs the performance of MTE in stricter tag checking modes
is similar to that of less strict tag checking modes. This makes it
worthwhile to enable stricter checks on those CPUs when a less strict
checking mode is requested, in order to gain the error detection
benefits of the stricter checks without the performance downsides. To
support this scenario, a privileged user may configure a stricter
tag checking mode as the CPU's preferred tag checking mode.

The preferred tag checking mode for each CPU is controlled by
``/sys/devices/system/cpu/cpu<N>/mte_tcf_preferred``, to which a
privileged user may write the value ``async`` or ``sync``. The default
preferred mode for each CPU is ``async``.

To allow a program to potentially run in the CPU's preferred tag
checking mode, the user program may set multiple tag check fault mode
bits in the ``flags`` argument to the
``prctl(PR_SET_TAGGED_ADDR_CTRL, flags, 0, 0, 0)`` system call. If the
CPU's preferred tag checking mode is in the task's set of provided tag
checking modes (this will always be the case at present because the
kernel only supports two tag checking modes, but future kernels may
support more modes), that mode will be selected. Otherwise, one of the
modes in the task's mode set will be selected in a currently
unspecified manner.
|
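The preferred mode of a given CPU can be inspected from userspace by
reading the sysfs file named above. The sketch below is illustrative
only: ``read_mte_tcf_preferred()`` is a hypothetical helper, and it
assumes the file holds a single short line such as ``async`` or
``sync``.

.. code-block:: c

    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /*
     * Read the preferred tag checking mode of the given CPU into buf.
     * Returns 0 on success, -1 if the file is missing or unreadable
     * (for example on a kernel or CPU without MTE support).
     */
    static int read_mte_tcf_preferred(unsigned int cpu, char *buf, size_t len)
    {
            char path[64];
            FILE *f;

            if (len == 0)
                    return -1;

            snprintf(path, sizeof(path),
                     "/sys/devices/system/cpu/cpu%u/mte_tcf_preferred", cpu);

            f = fopen(path, "r");
            if (!f)
                    return -1;

            if (!fgets(buf, (int)len, f)) {
                    fclose(f);
                    return -1;
            }
            fclose(f);

            /* Strip the trailing newline, if any. */
            buf[strcspn(buf, "\n")] = '\0';
            return 0;
    }

Writing a new value to the file requires privileges, so an unprivileged
program would normally only read it.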
|
Initial process state
---------------------

On ``execve()``, the new process has the following configuration:

- ``PR_TAGGED_ADDR_ENABLE`` set to 0 (disabled)
- No tag checking modes are selected (tag check faults ignored)
- ``PR_MTE_TAG_MASK`` set to 0 (all tags excluded)
- ``PSTATE.TCO`` set to 0
- ``PROT_MTE`` not set on any of the initial memory maps

On ``fork()``, the new process inherits the parent's configuration and
memory map attributes with the exception of the ``madvise()`` ranges
with ``MADV_WIPEONFORK`` which will have the data and tags cleared (set
to 0).
|
The ``ptrace()`` interface
--------------------------

``PTRACE_PEEKMTETAGS`` and ``PTRACE_POKEMTETAGS`` allow a tracer to read
the tags from or set the tags to a tracee's address space. The
``ptrace()`` system call is invoked as ``ptrace(request, pid, addr,
data)`` where:

- ``request`` - one of ``PTRACE_PEEKMTETAGS`` or ``PTRACE_POKEMTETAGS``.
- ``pid`` - the tracee's PID.
- ``addr`` - address in the tracee's address space.
- ``data`` - pointer to a ``struct iovec`` where ``iov_base`` points to
  a buffer of ``iov_len`` length in the tracer's address space.

The tags in the tracer's ``iov_base`` buffer are represented as one
4-bit tag per byte, and each byte corresponds to one 16-byte MTE tag
granule in the tracee's address space.

**Note**: If ``addr`` is not aligned to a 16-byte granule, the kernel
will use the corresponding aligned address.

``ptrace()`` return value:

- 0 - tags were copied, the tracer's ``iov_len`` was updated to the
  number of tags transferred. This may be smaller than the requested
  ``iov_len`` if the requested address range in the tracee's or the
  tracer's space cannot be accessed or does not have valid tags.
- ``-EPERM`` - the specified process cannot be traced.
- ``-EIO`` - the tracee's address range cannot be accessed (e.g. invalid
  address) and no tags were copied. ``iov_len`` is not updated.
- ``-EFAULT`` - fault on accessing the tracer's memory (``struct iovec``
  or ``iov_base`` buffer) and no tags were copied. ``iov_len`` is not
  updated.
- ``-EOPNOTSUPP`` - the tracee's address does not have valid tags (never
  mapped with the ``PROT_MTE`` flag). ``iov_len`` is not updated.

**Note**: There are no transient errors for the requests above, so user
programs should not retry in case of a non-zero system call return.
|
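A tracer-side read of allocation tags could look like the sketch below.
It is written under stated assumptions: the helper names are
hypothetical, the fallback value 33 for ``PTRACE_PEEKMTETAGS`` is
assumed from the arm64 uapi ``ptrace.h``, and the tracee is assumed to
be already attached and stopped. The C library wrapper is expected to
report errors as -1 with ``errno`` set, rather than as the negative
values listed above.

.. code-block:: c

    #include <stddef.h>
    #include <sys/ptrace.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Fallback for older headers (assumed arm64 uapi value). */
    #ifndef PTRACE_PEEKMTETAGS
    #define PTRACE_PEEKMTETAGS      33
    #endif

    /* Number of 16-byte granules (one tag each) covering [addr, addr + len). */
    static size_t mte_granules_spanned(unsigned long addr, size_t len)
    {
            unsigned long start = addr & ~15UL;
            unsigned long end = (addr + len + 15UL) & ~15UL;

            return (end - start) / 16;
    }

    /*
     * Read up to ntags tags starting at addr in the tracee, one tag per
     * byte of the tags buffer. Returns the number of tags transferred,
     * or -1 on error.
     */
    static long peek_mte_tags(pid_t pid, unsigned long addr,
                              unsigned char *tags, size_t ntags)
    {
            struct iovec iov = { .iov_base = tags, .iov_len = ntags };

            if (ptrace(PTRACE_PEEKMTETAGS, pid, (void *)addr, &iov) != 0)
                    return -1;

            return (long)iov.iov_len;
    }

A caller would size the buffer with ``mte_granules_spanned()`` and
compare the returned count with the requested one, since a short
transfer is reported as success.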
|
``PTRACE_GETREGSET`` and ``PTRACE_SETREGSET`` with ``addr ==
NT_ARM_TAGGED_ADDR_CTRL`` allow ``ptrace()`` access to the tagged
address ABI control and MTE configuration of a process as per the
``prctl()`` options described in
Documentation/arm64/tagged-address-abi.rst and above. The corresponding
``regset`` is 1 element of 8 bytes (``sizeof(long)``).
|
Example of correct usage
========================

*MTE Example code*

.. code-block:: c

    /*
     * To be compiled with -march=armv8.5-a+memtag
     */
    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/auxv.h>
    #include <sys/mman.h>
    #include <sys/prctl.h>

    /*
     * From arch/arm64/include/uapi/asm/hwcap.h
     */
    #define HWCAP2_MTE              (1 << 18)

    /*
     * From arch/arm64/include/uapi/asm/mman.h
     */
    #define PROT_MTE                 0x20

    /*
     * From include/uapi/linux/prctl.h
     */
    #define PR_SET_TAGGED_ADDR_CTRL 55
    #define PR_GET_TAGGED_ADDR_CTRL 56
    # define PR_TAGGED_ADDR_ENABLE  (1UL << 0)
    # define PR_MTE_TCF_SHIFT       1
    # define PR_MTE_TCF_NONE        (0UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_SYNC        (1UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_ASYNC       (2UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TCF_MASK        (3UL << PR_MTE_TCF_SHIFT)
    # define PR_MTE_TAG_SHIFT       3
    # define PR_MTE_TAG_MASK        (0xffffUL << PR_MTE_TAG_SHIFT)

    /*
     * Insert a random logical tag into the given pointer.
     */
    #define insert_random_tag(ptr) ({                       \
            uint64_t __val;                                 \
            asm("irg %0, %1" : "=r" (__val) : "r" (ptr));   \
            __val;                                          \
    })

    /*
     * Set the allocation tag on the destination address.
     */
    #define set_tag(tagged_addr) do {                                      \
            asm volatile("stg %0, [%0]" : : "r" (tagged_addr) : "memory"); \
    } while (0)

    int main()
    {
            unsigned char *a;
            unsigned long page_sz = sysconf(_SC_PAGESIZE);
            unsigned long hwcap2 = getauxval(AT_HWCAP2);

            /* check if MTE is present */
            if (!(hwcap2 & HWCAP2_MTE))
                    return EXIT_FAILURE;

            /*
             * Enable the tagged address ABI, synchronous or asynchronous MTE
             * tag check faults (based on per-CPU preference) and allow all
             * non-zero tags in the randomly generated set.
             */
            if (prctl(PR_SET_TAGGED_ADDR_CTRL,
                      PR_TAGGED_ADDR_ENABLE | PR_MTE_TCF_SYNC | PR_MTE_TCF_ASYNC |
                      (0xfffe << PR_MTE_TAG_SHIFT),
                      0, 0, 0)) {
                    perror("prctl() failed");
                    return EXIT_FAILURE;
            }

            a = mmap(0, page_sz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (a == MAP_FAILED) {
                    perror("mmap() failed");
                    return EXIT_FAILURE;
            }

            /*
             * Enable MTE on the above anonymous mmap. The flag could be passed
             * directly to mmap() and skip this step.
             */
            if (mprotect(a, page_sz, PROT_READ | PROT_WRITE | PROT_MTE)) {
                    perror("mprotect() failed");
                    return EXIT_FAILURE;
            }

            /* access with the default tag (0) */
            a[0] = 1;
            a[1] = 2;

            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /* set the logical and allocation tags */
            a = (unsigned char *)insert_random_tag(a);
            set_tag(a);

            printf("%p\n", a);

            /* non-zero tag access */
            a[0] = 3;
            printf("a[0] = %hhu a[1] = %hhu\n", a[0], a[1]);

            /*
             * If MTE is enabled correctly the next instruction will generate an
             * exception.
             */
            printf("Expecting SIGSEGV...\n");
            a[16] = 0xdd;

            /* this should not be printed in the PR_MTE_TCF_SYNC mode */
            printf("...haven't got one\n");

            return EXIT_FAILURE;
    }