Tagged RISC-V

Version 0.1-draft-20230529

Table of Contents
Introduction
Processor State
Tags
Synergy With Memory Encryption
Clique Usage
Virtual memory
CHERI
Open Issues

Introduction

Tagged RISC‑V is a parallel universe in the RISC‑V multiverse. Its purpose is to support enhanced security. It is based on a 72‑bit memory doubleword that supports 64 bits of instruction and data, just like the original RISC‑V 64‑bit universe, but with an 8‑bit tag for memory doublewords that is used for two purposes. The primary purpose of the tags is to support probabilistic bounds checking with conventional 64‑bit pointers in production code. The goal is not to catch every possible error, but to make the chances of undetected errors much smaller than in an untagged architecture. This is accomplished with cliques. The clique field of a pointer used to access memory must agree with the clique stored in memory for that pointer. A clique mismatch results in an exception that indicates that either the pointer was used to access beyond the memory allocated for the pointer, or was used to access memory after an explicit deallocation. There are performance costs to Tagged RISC‑V but these are expected to be acceptable in return for the safety achieved. The largest issue is one of main memory width (but see Synergy With Memory Encryption below).

A secondary goal of tagging is to support the University of Cambridge Capability Hardware Enhanced RISC Instructions (CHERI) programming model, in particular CHERI‑128 capabilities, which require a 1‑bit tag [PDF]. Instead of the CHERI 1‑bit tag, Tagged RISC‑V reserves two tag values for CHERI, one for the least significant 64 bits of a CHERI‑128 capability, and another for the most significant 64 bits.

Adding 12.5% overhead to main memory is a heavy lift. However, the continued inability of the software industry to address security issues that arise from undisciplined languages such as C++ would seem to necessitate this drastic step.

Processor State

Tagged RISC‑V general-purpose x registers are widened to 128 bits to accommodate CHERI‑128 capabilities. Otherwise, the unprivileged state is identical to RV64I. New LC (Load CHERI) and SC (Store CHERI) instructions would be introduced to load and store the full 128‑bit values with tag checking.

As introduced above, an 8‑bit tag is added to each 64‑bit aligned main-memory location, as shown below:

Cliqued Data Stored in Memory
71	64	63	0
mc		data
8		64

Field	Width	Bits	Description
data	64	63:0	The RISC‑V memory contents
mc	8	71:64	Clique of memory storing data, which must match bits 63:56 of the pointer used to access it.

Misaligned loads and stores are supported. For misaligned accesses that cross an 8‑byte boundary, the cliques of both memory locations accessed are checked.
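
The check described above can be modeled in a few lines of Python. This is only a sketch under stated assumptions: the `tags` dictionary and the function names are hypothetical, a missing dictionary entry stands in for the untagged-memory PMA, and the 48‑bit address mask follows the pointer layout given in the Virtual memory section.

```python
ADDRESS_MASK = (1 << 48) - 1   # pointer bits 47:0 hold the virtual address
UNTAGGED_CLIQUE = 252          # untagged memory behaves as tag 252


def pointer_clique(ptr):
    """Return the clique carried in pointer bits 63:56."""
    return (ptr >> 56) & 0xFF


def access_allowed(ptr, size, tags):
    """Return True if every doubleword touched by the access matches.

    `tags` is a hypothetical dict mapping a doubleword index
    (byte address // 8) to its 8-bit memory tag; a missing entry
    stands in for untagged memory.
    """
    addr = ptr & ADDRESS_MASK
    first_dw = addr // 8
    last_dw = (addr + size - 1) // 8   # differs from first_dw when crossing
    clique = pointer_clique(ptr)
    return all(tags.get(dw, UNTAGGED_CLIQUE) == clique
               for dw in range(first_dw, last_dw + 1))


# Two adjacent doublewords that belong to different allocations.
tags = {0x1000 // 8: 7, 0x1008 // 8: 9}
ptr = (7 << 56) | 0x1000
aligned_ok = access_allowed(ptr, 8, tags)        # stays inside clique 7
crossing_ok = access_allowed(ptr + 4, 8, tags)   # spills into clique 9
```

In this model the aligned access is allowed, while the misaligned access that spills into the neighboring doubleword is rejected, which in hardware would correspond to taking the clique-mismatch exception.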

A new Physical Memory Attribute (PMA) indicates whether memory is tagged or not. Untagged (64‑bit) memory is equivalent to having a tag of 252 for clique checking purposes.

Tag Value	Usage
0..251	Clique ids for tagged memory
252	Clique id for untagged memory
253	Reserved
254	CHERI doubleword 0 tag
255	CHERI doubleword 1 tag

Synergy With Memory Encryption

Tags, whether 8‑bit (as proposed here) or 1‑bit (as in pure CHERI), are a heavy lift given standard non-ECC DRAM modules, but there is also a movement to add data-at-rest encryption to main memory, and this has the potential to make Tagged RISC‑V work with standard DRAM modules. In a 64‑byte cache line, there would also be eight 8‑bit tags, for a total of 576 bits. Consider the 576 bits to be 9 words of 64 bits, use a standard 128‑bit block cipher (e.g. standard AES) nine times, add the 128‑bit authentication code, resulting in 704 bits, or eight words of 88 bits. Adding 8 bits of ECC results in 96 bits per memory word, which might use three 32‑bit or 64‑bit DRAM modules. Reads of encrypted memory would compute the 576 GCM xor bits during the read latency, resulting in a single xor when the data arrives at the system interconnect port boundary (either 96, 192, 384, or 576 bits per cycle). This xor would take much less time than the ECC check. Regenerating ECC for the decrypted data for writing into the L2 cache can be done by also precomputing the 64 bits to xor with the 8 ECC codes. Only if an ECC error is detected and corrected is it necessary to recompute the ECC before writing into the L2 cache. Writes would incur the GCM computation latency (primarily nine AES computations). Because the memory width and interconnect fabric would be sized for encryption, the only point in not encrypting a region would be to reduce write latency or to support non-block writes (it being impossible to update the authentication code without doing a read-modify-write).
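
The bit budget in the paragraph above can be checked with a short calculation. The constant names are illustrative only; the numbers are those given in the text.

```python
LINE_BYTES = 64                            # cache line size
DATA_BITS = LINE_BYTES * 8                 # 512 data bits per line
TAG_BITS = (LINE_BYTES // 8) * 8           # eight 8-bit tags = 64 bits
PLAINTEXT_BITS = DATA_BITS + TAG_BITS      # 576 bits to encrypt
AUTH_BITS = 128                            # authentication code
STORED_BITS = PLAINTEXT_BITS + AUTH_BITS   # 704 bits per line
WORDS_PER_LINE = 8
WORD_BITS = STORED_BITS // WORDS_PER_LINE  # 88 bits per memory word
ECC_BITS = 8
DRAM_WORD_BITS = WORD_BITS + ECC_BITS      # 96 bits per memory word

TAG_OVERHEAD = TAG_BITS / DATA_BITS        # 0.125, the 12.5% tag overhead

print(PLAINTEXT_BITS, STORED_BITS, WORD_BITS, DRAM_WORD_BITS, TAG_OVERHEAD)
```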

Clique Usage

Memory and pointer cliques allow accesses to be bounds checked even in undisciplined languages such as C++, where compiler-generated bounds checking is not always possible. When memory is allocated, the tags of the memory locations are filled in with a clique identifier, chosen to be different from those of nearby allocations, and the pointer to that memory is given the same clique. When the pointer is used in a load or store instruction, the memory tag is compared to the clique in the pointer, and if they are the same, the access succeeds. If the pointer and memory cliques differ, the access takes an exception. For example, if the pointer is used to access beyond the memory allocated when the pointer was created, typically a new clique id would be seen, indicating an out-of-bounds situation. When memory is explicitly deallocated, the tags in memory words would be changed so that pointers to the deallocated memory no longer match and take an exception when used.

There are multiple ways this mechanism might be used, but one way a slab allocator might work is by setting the tags of each N words of the slab to incrementing values mod 252, and then incrementing the tags in the words of a freed block by 16 or 32 mod 252. A non-slab allocator might choose clique values randomly, but with an additional test that the value chosen is not in either adjacent block.
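
As a rough illustration of the slab scheme just described, the following Python sketch keeps one tag per 64‑bit word. The `Slab` class, its method names, and the choice of 16 as the default free step are assumptions made for the example, not part of the proposal.

```python
NUM_CLIQUES = 252   # clique ids 0..251 are available to allocators


class Slab:
    """Toy slab: num_blocks blocks of words_per_block doublewords each."""

    def __init__(self, num_blocks, words_per_block, free_step=16):
        self.words_per_block = words_per_block
        self.free_step = free_step
        # Block i starts with clique i mod 252 on each of its N words.
        self.tags = [i % NUM_CLIQUES
                     for i in range(num_blocks)
                     for _ in range(words_per_block)]

    def block_clique(self, block):
        return self.tags[block * self.words_per_block]

    def pointer(self, block, base=0):
        """Return a cliqued pointer to the start of a block."""
        addr = base + block * self.words_per_block * 8
        return (self.block_clique(block) << 56) | addr

    def free(self, block):
        """Bump the tags of a freed block so stale pointers mismatch."""
        start = block * self.words_per_block
        for w in range(start, start + self.words_per_block):
            self.tags[w] = (self.tags[w] + self.free_step) % NUM_CLIQUES


slab = Slab(num_blocks=4, words_per_block=2)
stale = slab.pointer(1)     # pointer handed out before the free
slab.free(1)
stale_matches = (stale >> 56) == slab.block_clique(1)
```

After the free, the stale pointer still carries the old clique, so a later access through it would no longer match the memory tags.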

It is also desirable that stack frame allocation write clique values into the new frame and that the stack pointer reflect this new clique value in bits 63:56. A new ALLOC sp, sp, framesize instruction might be created for this purpose, where the clique number comes from incrementing an unprivileged CSR, except that it would increment twice if it matched the source sp value. See the discussion of a possible STM instruction above, as a vector-like unaligned tag write would be needed to keep code size acceptable. In addition, arrays within the stack frame would need to be given different clique values from the rest of the frame. This might be accomplished by using the ALLOC instruction multiple times when arrays are present. Finally, a separate clique value could be used to protect the return address to prevent Return-Oriented Programming. This area needs further exploration. It may obviate the need for a shadow stack mechanism.
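
One possible reading of the ALLOC clique selection is sketched below in Python. Several details are assumptions rather than part of the proposal: that the counter wraps modulo 252, that "matched" means the incremented value equals the clique in the source sp, that the stack grows downward, and that bits 55:48 pass through unchanged. The sketch computes only the new stack pointer and CSR value; writing the tags into the frame is not modeled.

```python
NUM_CLIQUES = 252              # clique ids 0..251 (wraparound is assumed)
ADDRESS_MASK = (1 << 48) - 1   # pointer bits 47:0


def alloc(sp, framesize, clique_csr):
    """Model ALLOC sp, sp, framesize; return (new_sp, new_csr)."""
    new_clique = (clique_csr + 1) % NUM_CLIQUES
    if new_clique == (sp >> 56) & 0xFF:
        # Would repeat the caller's clique, so increment a second time.
        new_clique = (new_clique + 1) % NUM_CLIQUES
    new_addr = ((sp & ADDRESS_MASK) - framesize) & ADDRESS_MASK
    reserved = sp & (0xFF << 48)   # bits 55:48 pass through unchanged
    new_sp = (new_clique << 56) | reserved | new_addr
    return new_sp, new_clique


caller_sp = (5 << 56) | 0x8000
callee_sp, csr = alloc(caller_sp, 64, clique_csr=4)
```

Here the counter would step from 4 to 5, which matches the caller's clique, so it steps once more and the callee frame receives clique 6.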

Virtual memory

Tagged RISC‑V employs a 48‑bit virtual address space, with bits 47:0 being the virtual address. Pointer bits 63:56 are used as the clique for checking memory accesses. Bits 55:48 are reserved, and may be used for address space expansion (e.g. as in Ssv64) or for Garbage Collection (GC) support.

Cliqued Pointer Stored in Register
63	56	55	48	47	0
ac		reserved		address
8		8		48

Cliqued Pointer Stored in Memory
71	64	63	56	55	48	47	0
mc		ac		reserved		address
8		8		8		48
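
The register layout above maps directly onto shift-and-mask operations. The helper names in this Python sketch are hypothetical and serve only to make the field boundaries concrete.

```python
def make_pointer(ac, address, reserved=0):
    """Pack ac (63:56), reserved (55:48) and address (47:0) into 64 bits."""
    if not (0 <= ac < 256 and 0 <= reserved < 256
            and 0 <= address < (1 << 48)):
        raise ValueError("field out of range")
    return (ac << 56) | (reserved << 48) | address


def split_pointer(ptr):
    """Return (ac, reserved, address) from a 64-bit cliqued pointer."""
    return (ptr >> 56) & 0xFF, (ptr >> 48) & 0xFF, ptr & ((1 << 48) - 1)


p = make_pointer(0x2A, 0x123456789ABC)
fields = split_pointer(p)
```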

Tagged RISC‑V supports satp.mode of Sv39 and Sv48. In addition, we propose a new satp.mode value that employs Ssv64 page table formats and separates the user-mode and supervisor-mode page tables, using bit 55 to pick either satp or a new ssatp as the page table address. This extends the user-mode address space by one bit, and may avoid issues that come up with sign-extension in the RISC‑V universe.

CHERI

The RV64I load instructions (e.g. LB, LH, LW, and LD) set bits 127:64 of the destination x register to 0, which marks the register as not containing a CHERI‑128 capability (perhaps a single bit of the P field?). When used as a base register for a load or store instruction, such values indicate that clique checking is employed. When the x register does contain a CHERI‑128 capability, CHERI access rules apply. Whether CHERI accesses check memory tags against the ac field is TBD.

The CHERI‑128 format is changed to reduce the address bits from 64 to 56 and expand the top and bottom values as shown below:

Doubleword 0 of CHERI capability in memory
71	64	63	56	55	48	47	0
254		ac		0		byteaddress
8		8		8		48

Field	Width	Bits	Description
byteaddress	48	47:0	Address of addressed memory
0	8	55:48	Reserved
ac	8	63:56	Clique of addressed memory (TBD whether used by CHERI or not)
254	8	71:64	CHERI doubleword 0 tag

Doubleword 1 of CHERI capability in memory
71	64	63	49	48	42	41	40	39	24	23	21	20	3	2	0
255		P		0		Z	L	T		TE		B		BE
8		15		7		1	1	16		3		18		3

Field	Width	Bits	Description
BE	3	2:0	Bottom or Exponent bits
B	18	20:3	Bottom bits
TE	3	23:21	Top or Exponent bits
T	16	39:24	Top bits
L	1	40	Length bit 19 or Sealed
Z	1	41	Internal Exponent flag
0	7	48:42	Reserved
P	15	63:49	Permissions
255	8	71:64	CHERI doubleword 1 tag

See CHERI Concentrate [PDF] for details.
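
The doubleword 1 layout can be written down as a field table and used to pack and unpack the 64 data bits. This Python sketch covers only the bit positions shown above; it does not interpret the CHERI Concentrate encoding, and the helper names are hypothetical.

```python
# name: (high bit, low bit), taken from the doubleword 1 table.
DW1_FIELDS = {
    "BE": (2, 0),
    "B": (20, 3),
    "TE": (23, 21),
    "T": (39, 24),
    "L": (40, 40),
    "Z": (41, 41),
    "P": (63, 49),
}


def encode_dw1(fields):
    """Pack named fields into the 64 data bits; bits 48:42 stay zero."""
    dw = 0
    for name, value in fields.items():
        hi, lo = DW1_FIELDS[name]
        width = hi - lo + 1
        if value < 0 or value >> width:
            raise ValueError(f"{name} does not fit in {width} bits")
        dw |= value << lo
    return dw


def decode_dw1(dw):
    """Extract every named field from the 64 data bits."""
    return {name: (dw >> lo) & ((1 << (hi - lo + 1)) - 1)
            for name, (hi, lo) in DW1_FIELDS.items()}


example = {"BE": 5, "B": 0x2ABCD, "TE": 3, "T": 0xBEEF,
           "L": 1, "Z": 0, "P": 0x7FFF}
packed = encode_dw1(example)
unpacked = decode_dw1(packed)
```

The named fields account for 57 bits; together with the 7 reserved bits at 48:42 they fill the 64‑bit doubleword.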

The LC (Load CHERI) instruction takes an exception if the memory tag in doubleword 0 is not 254 or if the memory tag in doubleword 1 is not 255. The SC (Store CHERI) instruction sets the memory tag of doubleword 0 to 254 and of doubleword 1 to 255. This makes these locations somewhat inaccessible to other load and store instructions since these tags would never be returned by an allocator. A pointer whose bits 63:56 hold one of these tag values could be created, which would allow the CHERI‑128 fields to be read. Outside of machine mode, stores to memory locations with CHERI tags would take an exception to prevent tampering (an alternative would be to change the tag to a non-CHERI tag).
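
The tag rules for LC, SC, and ordinary stores can be captured in a small behavioral model. The Python sketch below is an illustration under assumptions: the `Memory` class and `TagException` are hypothetical names, memory is indexed by doubleword, clique checking for ordinary accesses is omitted, and ordinary stores are assumed to leave tags unchanged.

```python
CHERI_DW0_TAG = 254
CHERI_DW1_TAG = 255
MASK64 = (1 << 64) - 1


class TagException(Exception):
    """Stands in for the architectural exception."""


class Memory:
    def __init__(self):
        self.data = {}   # doubleword index -> 64-bit value
        self.tags = {}   # doubleword index -> 8-bit tag

    def store_cheri(self, dw, cap):
        """SC: write a 128-bit value and mark both doublewords."""
        self.data[dw] = cap & MASK64              # least significant half
        self.data[dw + 1] = (cap >> 64) & MASK64  # most significant half
        self.tags[dw] = CHERI_DW0_TAG
        self.tags[dw + 1] = CHERI_DW1_TAG

    def load_cheri(self, dw):
        """LC: require tag 254 on doubleword 0 and 255 on doubleword 1."""
        if (self.tags.get(dw) != CHERI_DW0_TAG
                or self.tags.get(dw + 1) != CHERI_DW1_TAG):
            raise TagException("not a CHERI capability")
        return self.data[dw] | (self.data[dw + 1] << 64)

    def store_dw(self, dw, value, machine_mode=False):
        """Ordinary store: refuse to overwrite CHERI-tagged words."""
        cheri = self.tags.get(dw) in (CHERI_DW0_TAG, CHERI_DW1_TAG)
        if cheri and not machine_mode:
            raise TagException("store to CHERI-tagged memory")
        self.data[dw] = value & MASK64


mem = Memory()
cap = (0x1234 << 64) | 0xABCD
mem.store_cheri(10, cap)
loaded = mem.load_cheri(10)
```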

Because the x registers support 128‑bit load/store for LC and SC, it might be desirable to add instructions that load/store this width for non-capability data. This becomes problematic without a register tag, and so is not proposed here.

Open Issues

Is there a need for instructions that do block copies that preserve memory tag values (e.g. for copy-on-write pages)? What are the CHERI implications here?
How to context switch x registers without knowing whether they contain CHERI capabilities or not? This could be solved by tagging the x registers, but this proposal has so far avoided that.

		<webmaster at securerisc.org>
2023-05-29