Tracking discussion for CGetAddr vs CToPtr vs as integer alias vs CGetLow
Just to write it down somewhere after https://github.com/CTSRD-CHERI/cheri-specification/pull/7 became "just C{Set,Get}High" (which is surely the correct course of action given the obvious utility of those opcodes and this much more subtle discussion). See also earlier discussion at https://github.com/CTSRD-CHERI/cheri-architecture/pull/78#discussion_r719528101 and discussions in Slack and even in person.
This is pointedly a wiki page and not an issue, so please do feel free to edit it with comments and additional thoughts.
We have many ways of getting something address-like out of a capability:
- merged register files allow for "as-integer accesses" to a capability's lower word, which happens to be the address field in our existing encodings,
- `CGetAddr` does what it says on the tin even in the case of a split register file, though the result is an absolute address and will not work with integer-authority memory instructions in the event that DDC.address is not zero, and
- `CToPtr` is intended to give an address-like thing that is compatible with the current DDC, explicitly so that it works with integer-authority memory instructions.
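
To make the difference between `CGetAddr` and `CToPtr` concrete, here is a toy C model. The `cap_t` layout, the function names, and the rule that an integer-authority access adds `ddc.address` to the integer are all assumptions for illustration; the ISA specification defines the real semantics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy capability: only the fields this sketch needs (hypothetical layout). */
typedef struct {
    bool     tag;
    uint64_t base;
    uint64_t address;
} cap_t;

/* CGetAddr (toy): the absolute address, regardless of DDC. */
static uint64_t cgetaddr(cap_t cs) { return cs.address; }

/* CToPtr (toy): an integer meant to be usable under `ddc`.
 * Assumption: untagged inputs yield 0, and the result is DDC-relative. */
static uint64_t ctoptr(cap_t cs, cap_t ddc) {
    assert(ddc.tag); /* the sketch only models a valid DDC */
    return cs.tag ? cs.address - ddc.address : 0;
}

/* Effective address of an integer-authority access (toy assumption). */
static uint64_t int_access_ea(uint64_t ptr, cap_t ddc) {
    return ddc.address + ptr;
}
```

In this model, feeding the `ctoptr` result to `int_access_ea` lands back on the capability's address, while feeding it the `cgetaddr` result is off by `ddc.address` whenever that is non-zero.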
Excepting the as-integer alias access behavior, these instructions have setter duals:
- `CSetAddr` preserves the tag (and non-address fields) so long as the result is representable; otherwise, it will clear the tag of the result.
- `CFromPtr` is like `CSetAddr` with the exceptions that
  - `CFromPtr cd, cs, rs` with `rs` holding `0` is always NULL, while `CSetAddr cd, cs, rs` may generate a tagged result if `0` is in the representable region of `cs`.
  - `CFromPtr cd, cnull, rs` generates a DDC-derived capability with offset `rs`, while `CSetAddr cd, cnull, rs` generates a NULL-derived capability with offset `rs`.
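
The two exceptions above can be sketched in a toy C model. Everything here is hypothetical: the `cap_t` layout, the deliberately crude representability check (real encodings have a wider representable region), the use of a C null pointer to stand for the `cnull` register operand, and the reading of `rs` as an offset from the source's base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { bool tag; uint64_t base, length, address; } cap_t;

/* Toy NULL capability: untagged, zero address. */
static const cap_t CNULL = { false, 0, UINT64_MAX, 0 };

/* Toy representability: address must lie in [base, base + length]. */
static bool representable(cap_t c, uint64_t addr) {
    return addr - c.base <= c.length; /* unsigned wrap rejects addr < base */
}

/* CSetAddr (toy): keep all fields, clear the tag if unrepresentable. */
static cap_t csetaddr(cap_t cs, uint64_t rs) {
    cap_t cd = cs;
    cd.address = rs;
    cd.tag = cs.tag && representable(cs, rs);
    return cd;
}

/* CFromPtr (toy): cs == NULL stands for the `cnull` register operand. */
static cap_t cfromptr(const cap_t *cs, uint64_t rs, const cap_t *ddc) {
    assert(ddc != NULL);               /* the sketch always has a DDC to consult */
    if (rs == 0) return CNULL;         /* exception 1: zero always maps to NULL */
    cap_t src = cs ? *cs : *ddc;       /* exception 2: cnull source means DDC */
    return csetaddr(src, src.base + rs);
}
```

With a source capability whose base is `0`, `csetaddr(c, 0)` stays tagged in this model while `cfromptr(&c, 0, &ddc)` is untagged, which is the first difference listed above.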
Right now, CHERI-RISC-V...
- has a merged register file
- presumes a 64-bit AS and has no TBI analogue, and so the entire bottom word of a capability is the 64-bit address field.
As such, the existing `CGetAddr cd, cs` is equivalent to `CSetAddr cd, cnull, rs`, where `rs` is the integer-subregister alias of `cs`.
Also right now, C casts between pointers and integers are lowered by...
- in hybrid, using `CToPtr` and `CFromPtr`
- in purecap, using `CGetAddr` and `CIncOffset` with `cnull` source.

In both cases, `CGetAddr` and `CSetAddr` are used when performing arithmetic on a capability address yielding a capability result. This lowering is compatible with split register files, as it does not exploit the as-integer aliases. (Playing in compiler-explorer: https://cheri-compiler-explorer.cl.cam.ac.uk/z/WsoY17 )
And now let's pry things apart:
- If we split the register file, then as-integer operations are no longer possible, so an explicit opcode is needed, but at the moment `CGetAddr` can serve that role fine.
- If we have a TBI-like scheme that is common between integer pointers and capability pointers, such as MTE/SSM/ADI flavored techniques, then `CToPtr` and `CGetAddr` should pass these bits through for use by the integer-authority memory instructions. `CGetAddr` is then a bit of a misnomer, but so it goes.

  In particular, conventional architectures expose pointers as two-to-some-power-of-two-sized integers because that's the only thing that you can store with natural alignment. In a CHERI (and TBI) system, the address is no longer the whole pointer and so we have no requirement that the address is even a whole number of bytes wide. Most systems with virtual memory have less than a 64-bit address space, with 48 and 57 bits being common sizes. We could have more efficient capability encodings if we were to restrict the size of the address field to the virtual address-space size, which would also make non-canonical addresses impossible to create by construction.
- If we instead explicitly use a smaller address space and reclaim bits in the capability encoding for things beyond the ken of integer pointers (say, additional permission bits or an architecture without non-CHERI MTE), then, I think, `CToPtr` and `CGetAddr` should ignore these bits and sign-extend the narrower address space projection to a full machine word.

  In principle, in this case, as-integer aliasing becomes the only way to see these additional bits in registers and there is no setter dual to manipulate them as data within the larger capability structure in a register without imposing the requirement that it be NULL-derived. That is, updates would have to go via memory. In a split register file, both accesses and updates would have to be via memory (reminiscent of capability upper words before `C{Get,Set}High`).
It is for this kind of an eventuality that I would like to have `C{Get,Set}Low` reserved. On merged register machines, `CGetLow rd, cs` is the same as `mov rd, rs` (with `rs` aliasing `cs`; `CSetLow cd, cnull, rs` produces the same result as well). On split register machines, `CGetLow` has independent meaning.
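
As a rough illustration of how `CGetLow` and `CGetAddr` could diverge once bits of the lower word are reclaimed, here is a small C sketch. The 48-bit address width, the placement of the address in the low bits of the word, and the function names are assumptions made up for this example, not a proposed encoding.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 48 /* hypothetical address width */

/* Sign-extend the low `bits` bits of v to a full 64-bit word. */
static uint64_t sign_extend(uint64_t v, unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    uint64_t mask = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    uint64_t sign = UINT64_C(1) << (bits - 1);
    return ((v & mask) ^ sign) - sign; /* two's-complement extension */
}

/* CGetLow (toy): the raw lower word, reclaimed bits included. */
static uint64_t cgetlow(uint64_t low_word) { return low_word; }

/* CGetAddr (toy): ignore the reclaimed bits, sign-extend the address. */
static uint64_t cgetaddr(uint64_t low_word) {
    return sign_extend(low_word, ADDR_BITS);
}
```

Under these assumptions the two results agree only when the reclaimed upper bits happen to equal the sign extension of the address.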
There is a challenge around C lowering in purecap mode:
- the current lowering, of `CGetAddr rd, cs` and `CIncOffset cd, cnull, rs`, could no longer hold full-width constants (that are not sign extensions of smaller constants).
- even moving to a `CGetLow`/`CSetLow` lowering is likely to be confusing if the address field is not contiguous and inclusive of the least significant bit of the lower word, as then `(int)((intcap_t)p & 0xFF)` may not be in `[0, 0xFF]`: the `&` uses `CGetAddr`/`CSetAddr` while the cast would use `CGetLow`. Perhaps at this point we throw in the towel on the full opacity of the capability encoding and require that `CGetAddr` and `CGetLow` can differ only by the former sign-extending some truncation of the latter.
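
The `(int)((intcap_t)p & 0xFF)` surprise can be reproduced in a toy C model. The layout is invented for the example: the low 4 bits of the lower word hold non-address metadata and the address sits above them, so the address field does not include the least significant bit. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EXTRA_BITS 4 /* hypothetical non-address bits at the bottom of the word */

/* CGetAddr (toy): the address lives above the extra bits. */
static uint64_t cgetaddr(uint64_t low) { return low >> EXTRA_BITS; }

/* CSetAddr (toy): replace the address, keep the extra bits. */
static uint64_t csetaddr(uint64_t low, uint64_t addr) {
    assert((addr >> (64 - EXTRA_BITS)) == 0); /* address must fit the field */
    uint64_t extra = low & ((UINT64_C(1) << EXTRA_BITS) - 1);
    return (addr << EXTRA_BITS) | extra;
}

/* CGetLow (toy): the raw lower word. */
static uint64_t cgetlow(uint64_t low) { return low; }

/* `(intcap_t)p & mask`: address arithmetic via CGetAddr/CSetAddr. */
static uint64_t intcap_and(uint64_t low, uint64_t mask) {
    return csetaddr(low, cgetaddr(low) & mask);
}

/* `(int)x` lowered with CGetLow rather than CGetAddr. */
static uint64_t cast_via_low(uint64_t low) { return cgetlow(low); }
```

With address `0x12FF` and extra bits `0x5`, the masked capability has address `0xFF`, but `cast_via_low` returns `0xFF5` in this model, which is outside `[0, 0xFF]`.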