
Tracking discussion for CGetAddr vs CToPtr vs as-integer alias vs CGetLow

David Chisnall edited this page Aug 23, 2022 · 2 revisions

Just to write it down somewhere, now that https://github.com/CTSRD-CHERI/cheri-specification/pull/7 has become "just C{Set,Get}High" (surely the correct course of action, given the obvious utility of those opcodes and how much more subtle this discussion is). See also the earlier discussion at https://github.com/CTSRD-CHERI/cheri-architecture/pull/78#discussion_r719528101, as well as discussions on Slack and in person.

This is pointedly a wiki page and not an issue, so please do feel free to edit it with comments and additional thoughts.

We have many ways of getting something address-like out of a capability:

  • merged register files allow for "as-integer accesses" to a capability's lower word, which happens to be the address field in our existing encodings,
  • CGetAddr does what it says on the tin even in the case of a split register file, though the result is an absolute address and will not work with integer-authority memory instructions in the event that DDC.address is not zero, and
  • CToPtr is intended to give an address-like thing that is compatible with the current DDC, explicitly so that it works with integer-authority memory instructions.
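A minimal C model may help keep the three getters apart. It uses a hypothetical decoded view of a capability (`cap_t`; real encodings compress the bounds) and, taking this page's description at face value, assumes that integer-authority accesses interpret an integer `x` as `DDC.address + x` (the hypothetical `int_authority_ea`, with bounds and permission checks omitted). Having `ctoptr` return 0 for an untagged source is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a capability; real encodings compress bounds. */
typedef struct {
    bool tag;
    uint64_t base;
    uint64_t top;
    uint64_t address; /* the lower word in existing encodings */
} cap_t;

/* As-integer alias (merged register files only): the raw lower word. */
uint64_t as_integer(const cap_t *c) { return c->address; }

/* CGetAddr: the absolute address, independent of DDC. */
uint64_t cgetaddr(const cap_t *c) { return c->address; }

/* CToPtr: an integer meant for integer-authority accesses under ddc.
 * Assumption: relative to ddc->address, and 0 for an untagged source. */
uint64_t ctoptr(const cap_t *c, const cap_t *ddc) {
    return c->tag ? c->address - ddc->address : 0;
}

/* Hypothetical effective address of an integer-authority access of x
 * (assumed to be DDC.address + x; bounds and permission checks omitted). */
uint64_t int_authority_ea(const cap_t *ddc, uint64_t x) {
    return ddc->address + x;
}
```

With a DDC at address `0x10000` and a tagged `p` at address `0x10040`, `as_integer` and `cgetaddr` both give `0x10040` while `ctoptr` gives `0x40`; only the latter lands back on `0x10040` when used by an integer-authority access.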

Apart from as-integer alias accesses, these instructions have setter duals:

  • CSetAddr preserves the tag (and non-address fields) so long as the result is representable; otherwise, it will clear the tag of the result.
  • CFromPtr is like CSetAddr with the exceptions that
    • CFromPtr cd, cs, rs with rs holding 0 is always NULL, while CSetAddr cd, cs, rs may generate a tagged result if 0 is in the representable region of cs.
    • CFromPtr cd, cnull, rs generates a DDC-derived capability with offset rs, while CSetAddr cd, cnull, rs generates a NULL-derived capability with offset rs.
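The differences between CSetAddr and CFromPtr can be sketched in the same style. The representable region here is a hypothetical stand-in (the bounds widened by a fixed `REPR_SLACK`); real compressed encodings derive it from the bounds themselves. Following this page's "offset rs" wording, `cfromptr` places the result at the source's base plus `rs`, and passing a null `cs` pointer stands in for register c0, i.e. DDC. All names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool tag;
    uint64_t base, top, address;
} cap_t;

static const cap_t CNULL = { false, 0, 0, 0 };

/* Hypothetical representable region: bounds widened by a fixed slack. */
#define REPR_SLACK 0x1000u

bool representable(const cap_t *c, uint64_t addr) {
    uint64_t lo = c->base > REPR_SLACK ? c->base - REPR_SLACK : 0;
    uint64_t hi = c->top < UINT64_MAX - REPR_SLACK ? c->top + REPR_SLACK
                                                   : UINT64_MAX;
    return addr >= lo && addr < hi;
}

/* CSetAddr: keep all non-address fields; clear the tag if unrepresentable. */
cap_t csetaddr(const cap_t *cs, uint64_t addr) {
    cap_t r = *cs;
    r.address = addr;
    if (!representable(cs, addr))
        r.tag = false;
    return r;
}

/* CFromPtr: rs == 0 always yields NULL; cs == NULL models c0, i.e. DDC. */
cap_t cfromptr(const cap_t *cs, const cap_t *ddc, uint64_t rs) {
    if (rs == 0)
        return CNULL;
    const cap_t *src = cs ? cs : ddc;
    return csetaddr(src, src->base + rs); /* offset rs from the source base */
}
```

For a tagged `cs` with bounds `[0x100, 0x200)`, `cfromptr(&cs, &ddc, 0)` is untagged NULL while `csetaddr(&cs, 0)` stays tagged, because 0 falls inside the (hypothetical) representable region; `cfromptr(NULL, &ddc, 0x40)` is DDC-derived and tagged, whereas `csetaddr(&CNULL, 0x40)` is NULL-derived and untagged.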

Right now, CHERI-RISC-V...

  • is a merged register file
  • presumes a 64-bit address space and has no TBI analogue, so the entire lower word of a capability is the 64-bit address field. As such, the existing CGetAddr rd, cs is equivalent to CSetAddr cd, cnull, rs, where rd is the integer-subregister alias of cd and rs is the integer-subregister alias of cs.
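On a merged register file this equivalence can be checked mechanically. In the hypothetical model below, each register is a full capability whose lower word is the address, and an integer write through the alias produces a NULL-derived value; as a simplifying assumption, NULL's metadata word is modelled as 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical merged register: one capability per register. */
typedef struct {
    bool tag;
    uint64_t high; /* metadata word: bounds, permissions, ... */
    uint64_t low;  /* address field: the whole lower word here */
} creg_t;

/* Assumption: NULL's metadata word is modelled as 0. */
#define NULL_HIGH 0u

/* Integer write through the alias: the register becomes NULL-derived. */
void write_int(creg_t *r, uint64_t v) {
    r->tag = false;
    r->high = NULL_HIGH;
    r->low = v;
}

/* CGetAddr rd, cs: rd is the integer alias of capability register rd. */
void cgetaddr(creg_t regs[], int rd, int cs) {
    write_int(&regs[rd], regs[cs].low);
}

/* CSetAddr cd, cnull, rs: NULL with its address set to rs's integer view. */
void csetaddr_null(creg_t regs[], int cd, int rs) {
    creg_t r = { false, NULL_HIGH, regs[rs].low };
    regs[cd] = r;
}
```

Running both on copies of the same register file leaves the destination register with identical tag, metadata and address.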

Also right now, C casts between pointers and integers are lowered by...

  • in hybrid, using CToPtr and CFromPtr
  • in purecap, using CGetAddr and CIncOffset with a cnull source.

In both cases, CGetAddr and CSetAddr are used when performing arithmetic on a capability address that yields a capability result. This lowering is compatible with split register files, as it does not exploit the as-integer aliases. (Playing in compiler-explorer: https://cheri-compiler-explorer.cl.cam.ac.uk/z/WsoY17 )
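The purecap lowering can be modelled as a sketch; the function names are hypothetical and representability checks are omitted. Integer-to-capability casts start from NULL and so produce untagged values, while address arithmetic goes through CGetAddr and CSetAddr on the original capability and so keeps its tag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool tag;
    uint64_t base, top, address;
} cap_t;

static const cap_t CNULL = { false, 0, 0, 0 };

/* Instruction models (representability checks omitted). */
cap_t cincoffset(cap_t cs, uint64_t inc) { cs.address += inc; return cs; }
uint64_t cgetaddr(cap_t cs) { return cs.address; }
cap_t csetaddr(cap_t cs, uint64_t a) { cs.address = a; return cs; }

/* Integer to capability-sized integer: CIncOffset cd, cnull, rs. */
cap_t int_to_intcap(uint64_t x) { return cincoffset(CNULL, x); }

/* Capability to integer: CGetAddr rd, cs. */
uint64_t intcap_to_int(cap_t p) { return cgetaddr(p); }

/* Address arithmetic with a capability result: CGetAddr, integer add,
 * then CSetAddr back onto p, which preserves p's tag and bounds. */
cap_t intcap_add(cap_t p, uint64_t n) { return csetaddr(p, cgetaddr(p) + n); }
```

Note that none of these helpers reads or writes a register through its integer alias, which is why the same lowering carries over to a split register file.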

And now let's pry things apart:

  1. If we split the register file, then as-integer operations are no longer possible, so an explicit opcode is needed, but at the moment CGetAddr can serve that role fine.

  2. If we have a TBI-like scheme that is common between integer pointers and capability pointers, such as MTE/SSM/ADI-flavored techniques, then CToPtr and CGetAddr should pass these bits through for use by the integer-authority memory instructions. CGetAddr is then a bit of a misnomer, but so it goes.

    More broadly, conventional architectures expose pointers as integers whose size is a power-of-two number of bytes, because those are the only sizes that can be stored with natural alignment. In a CHERI (and TBI) system, the address is no longer the whole pointer, so there is no requirement that the address even be a whole number of bytes wide. Most systems with virtual memory have fewer than 64 bits of virtual address space, with 48 and 57 bits being common sizes. We could have more efficient capability encodings if we were to restrict the address field to the size of the virtual address space, which would also make non-canonical addresses impossible to create by construction.

  3. If we instead explicitly use a smaller address space and reclaim bits in the capability encoding for things beyond the ken of integer pointers (say, additional permission bits or an architecture without non-CHERI MTE), then, I think, CToPtr and CGetAddr should ignore these bits and sign-extend the narrower address space projection to a full machine word.

    In principle, in this case, as-integer aliasing becomes the only way to see these additional bits in registers, and there is no setter dual: writing them through the integer alias yields a NULL-derived result, so they cannot be manipulated as data within a capability held in a register. That is, updates would have to go via memory. In a split register file, both accesses and updates would have to go via memory (reminiscent of capability upper words before C{Get,Set}High).
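A small sketch contrasts what CGetAddr would return under items 2 and 3, operating on the raw lower word. `ADDR_BITS = 48` is a hypothetical choice, and the upper 16 bits stand for whatever the encoding reclaims. Note that `cgetaddr_narrow` can only ever produce canonical 48-bit addresses, the by-construction property mentioned under item 2.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical width of the address field in the narrow scheme. */
#define ADDR_BITS 48

/* Item 2 (shared TBI-like scheme): the top bits pass straight through. */
uint64_t cgetaddr_tbi(uint64_t low_word) { return low_word; }

/* Item 3 (narrow address, reclaimed upper bits): keep only the
 * ADDR_BITS-bit address and sign-extend it to a full machine word. */
uint64_t cgetaddr_narrow(uint64_t low_word) {
    uint64_t addr = low_word & ((UINT64_C(1) << ADDR_BITS) - 1);
    uint64_t sign = UINT64_C(1) << (ADDR_BITS - 1);
    return (addr ^ sign) - sign; /* branch-free sign extension */
}

/* The reclaimed bits: visible in a register only via the as-integer alias. */
uint64_t reclaimed_bits(uint64_t low_word) { return low_word >> ADDR_BITS; }
```

For a lower word of `0xAB00800000001234`, the TBI-style getter returns it unchanged, the narrow getter returns `0xFFFF800000001234`, and the reclaimed bits are `0xAB00`.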

It is for this kind of eventuality that I would like to have C{Get,Set}Low reserved. On merged-register-file machines, CGetLow rd, cs is the same as mov rd, rs (with rs aliasing cs), and CSetLow cd, cnull, rs likewise produces the same result as mov rd, rs (with rd aliasing cd). On split-register-file machines, CGetLow has an independent meaning.

There is a challenge around C lowering in purecap mode:

  • the current lowering, using CGetAddr rd, cs and CIncOffset cd, cnull, rs, could no longer represent full-width constants (those that are not sign extensions of smaller constants).
  • even moving to a CGetLow/CSetLow lowering is likely to be confusing if the address field is not contiguous and inclusive of the least significant bit of the lower word, as then (int)((intcap_t)p & 0xFF) may not be in [0, 0xFF]: the & uses CGetAddr/CSetAddr while the cast would use CGetLow. Perhaps at this point we throw in the towel on the full opacity of the capability encoding and require that CGetAddr and CGetLow can differ only by the former sign-extending some truncation of the latter.
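Both problems can be reproduced with toy encodings, each of them hypothetical: (a) a 48-bit address in the low bits that CGetAddr sign-extends, and (b) an address stored in bits 63..8 of the lower word with 8 metadata bits below it, so that the address field does not include the least significant bit. Values here are raw lower words and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* (a) Narrow address: the low ADDR_BITS bits, sign-extended by CGetAddr. */
#define ADDR_BITS 48
uint64_t narrow_getaddr(uint64_t low) {
    uint64_t mask = (UINT64_C(1) << ADDR_BITS) - 1;
    uint64_t sign = UINT64_C(1) << (ADDR_BITS - 1);
    return ((low & mask) ^ sign) - sign;
}
/* CIncOffset cd, cnull, rs: only the address field receives rs. */
uint64_t narrow_from_int(uint64_t rs) {
    return rs & ((UINT64_C(1) << ADDR_BITS) - 1);
}
/* (long)(intcap_t)x under the current lowering. */
uint64_t narrow_roundtrip(uint64_t x) { return narrow_getaddr(narrow_from_int(x)); }

/* (b) Shifted address: bits 63..8 hold the address, bits 7..0 metadata. */
#define META_BITS 8
uint64_t shifted_getaddr(uint64_t low) { return low >> META_BITS; }
uint64_t shifted_setaddr(uint64_t low, uint64_t addr) {
    uint64_t meta = low & ((UINT64_C(1) << META_BITS) - 1);
    return (addr << META_BITS) | meta;
}
/* CGetLow: the raw lower word. */
uint64_t cgetlow(uint64_t low) { return low; }

/* (int)((intcap_t)p & 0xFF): the & uses CGetAddr/CSetAddr, the cast CGetLow. */
uint64_t mask_then_cast(uint64_t p_low) {
    uint64_t masked = shifted_setaddr(p_low, shifted_getaddr(p_low) & 0xFF);
    return cgetlow(masked);
}
```

Under (a), `narrow_roundtrip(UINT64_C(1) << 48)` yields 0 rather than the constant, while -1 survives because it is a sign extension of a 48-bit value. Under (b), masking a pointer whose lower word is `0x1234507` and then casting gives `0x4507`, outside [0, 0xFF].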