
Armv8.2 SM3 and SM4

Sun Yimin edited this page Feb 18, 2022 · 31 revisions

SM3 arm64 plain asm on arm64-graviton2

    go test -v -short -bench . -run=^$ ./...
    goos: linux
    goarch: arm64
    pkg: github.com/emmansun/gmsm/sm3
    BenchmarkHash8Bytes
    BenchmarkHash8Bytes-2     	 2738724	       438.4 ns/op	  18.25 MB/s
    BenchmarkHash1K
    BenchmarkHash1K-2         	  192519	      6232 ns/op	 164.32 MB/s
    BenchmarkHash8K
    BenchmarkHash8K-2         	   24950	     48112 ns/op	 170.27 MB/s
    BenchmarkHash8K_SH256
    BenchmarkHash8K_SH256-2   	  223354	      5369 ns/op	1525.81 MB/s
    PASS
    ok  	github.com/emmansun/gmsm/sm3	5.857s

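The gap to CPU-instruction-level performance is roughly 10x! On 8K input, the plain-asm SM3 reaches 170.27 MB/s versus 1525.81 MB/s for BenchmarkHash8K_SH256 (SHA-256, presumably Go's crypto/sha256, which uses the arm64 SHA-256 instructions), a factor of about 9.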

SM4 with AES

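The AESE instruction is equivalent to:

  1. AddRoundKey(State, RoundKey)
  2. ShiftRows(State)
  3. SubBytes(State)

So if RoundKey = 0, AESE effectively performs only:

  1. ShiftRows(State)
  2. SubBytes(State)

Because the SM4 S-box is affine-equivalent to the AES S-box (both are built on inversion in GF(2^8)), the SM4 S-box can be evaluated with AESE and a zero round key, provided ShiftRows is undone and affine transforms are applied before and after. Benchmark: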
    go test -v -short -bench . -run=^$ ./...
    goos: linux
    goarch: arm64
    pkg: github.com/emmansun/gmsm/sm4
    BenchmarkEncrypt
    BenchmarkEncrypt-2   	 2145859	       559.1 ns/op	  28.62 MB/s
    BenchmarkDecrypt
    BenchmarkDecrypt-2   	 2145296	       559.4 ns/op	  28.60 MB/s
    BenchmarkExpand
    BenchmarkExpand-2    	 2064466	       581.2 ns/op
    PASS
    ok  	github.com/emmansun/gmsm/sm4	5.334s
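To make the zero-round-key trick concrete, here is a plain-Go model of AESE as described above (AddRoundKey, then ShiftRows, then SubBytes). It is a sketch under assumptions, not the gmsm code: the state uses the FIPS-197 column-major byte layout (byte i is row i%4, column i/4), the AES S-box is computed from GF(2^8) inversion instead of a table, and the SM4-specific affine transforms are left out. It checks that with a zero key, undoing ShiftRows turns AESE into 16 independent S-box lookups; in arm64 assembly that inverse permutation could be done with a byte shuffle such as TBL.

```go
package main

import "fmt"

// state is a 16-byte AES state; byte i sits at row i%4, column i/4 (assumed FIPS-197 layout).
type state [16]byte

// mulGF multiplies in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1.
func mulGF(a, b byte) byte {
	var p byte
	for b != 0 {
		if b&1 != 0 {
			p ^= a
		}
		carry := a & 0x80
		a <<= 1
		if carry != 0 {
			a ^= 0x1b
		}
		b >>= 1
	}
	return p
}

// invGF returns a^254, i.e. the multiplicative inverse of a (and 0 for a == 0).
func invGF(a byte) byte {
	r := byte(1)
	for i := 0; i < 254; i++ {
		r = mulGF(r, a)
	}
	return r
}

func rotl8(x byte, n uint) byte { return x<<n | x>>(8-n) }

// aesSbox computes the AES S-box: GF(2^8) inversion, then the affine map with constant 0x63.
func aesSbox(x byte) byte {
	b := invGF(x)
	return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
}

// shiftRows rotates row r left by r positions.
func shiftRows(s state) (out state) {
	for c := 0; c < 4; c++ {
		for r := 0; r < 4; r++ {
			out[r+4*c] = s[r+4*((c+r)%4)]
		}
	}
	return out
}

// invShiftRows undoes shiftRows.
func invShiftRows(s state) (out state) {
	for c := 0; c < 4; c++ {
		for r := 0; r < 4; r++ {
			out[r+4*((c+r)%4)] = s[r+4*c]
		}
	}
	return out
}

// aese models AESE: AddRoundKey (XOR), then ShiftRows, then SubBytes.
func aese(s, roundKey state) state {
	for i := range s {
		s[i] ^= roundKey[i]
	}
	s = shiftRows(s)
	for i := range s {
		s[i] = aesSbox(s[i])
	}
	return s
}

func main() {
	var in, zeroKey state
	for i := range in {
		in[i] = byte(i * 17)
	}
	// Zero round key + inverse ShiftRows = 16 independent S-box lookups.
	out := invShiftRows(aese(in, zeroKey))
	for i := range in {
		if out[i] != aesSbox(in[i]) {
			panic("AESE with zero key is not a bytewise S-box")
		}
	}
	fmt.Printf("%x\n", out[:4])
}
```

Because SubBytes works byte by byte, it commutes with ShiftRows, which is why a single inverse permutation after AESE is enough to recover a plain S-box lookup.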

SM4 with SM4E & SM4EKEY

Go does not support the SM4E/SM4EKEY instructions yet, but we can work around this by emitting the unsupported opcodes directly:

  1. Clone the code from https://github.com/golang/arch
  2. Modify arm64asm/tables.go: add the SM4E/SM4EKEY constants, add their opstr entries, and add the instructions to instFormats:
	// SM4E <Vd>.4S, <Vn>.4S
	{0xfffffc00, 0xcec08400, SM4E, instArgs{arg_Vd_arrangement_4S, arg_Vn_arrangement_4S}, nil},
	// SM4EKEY <Vd>.4S, <Vn>.4S, <Vm>.4S
	{0xffe0fc00, 0xce60c800, SM4EKEY, instArgs{arg_Vd_arrangement_4S, arg_Vn_arrangement_4S, arg_Vm_arrangement_4S}, nil},	
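  3. Modify arm64asm/plan9x.go: add SM4E and SM4EKEY to noSuffixOpSet. This is optional; with it, the plan9 output will not carry the V prefix.
  4. Write a test. testDecodeLine() is extracted from the testDecode() method in decode_test.go. Reading the Decode() method shows how to construct those 32-bit codes.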

func TestDecodeSM4Codes(t *testing.T) {
	//gnu syntax: load the 16-byte plaintext into v8 (byte order reversed first) and the 32 round keys into v0-v7; the final result must be byte-reversed again
	testDecodeLine(t, "gnu", "0884c0ce|	sm4e v8.4s, v0.4s")
	testDecodeLine(t, "gnu", "2884c0ce|	sm4e v8.4s, v1.4s")
	testDecodeLine(t, "gnu", "4884c0ce|	sm4e v8.4s, v2.4s")
	testDecodeLine(t, "gnu", "6884c0ce|	sm4e v8.4s, v3.4s")
	testDecodeLine(t, "gnu", "8884c0ce|	sm4e v8.4s, v4.4s")
	testDecodeLine(t, "gnu", "a884c0ce|	sm4e v8.4s, v5.4s")
	testDecodeLine(t, "gnu", "c884c0ce|	sm4e v8.4s, v6.4s")
	testDecodeLine(t, "gnu", "e884c0ce|	sm4e v8.4s, v7.4s")
	//plan9 syntax: load the 16-byte plaintext into v8 (byte order reversed first) and the 32 round keys into v0-v7; the final result must be byte-reversed again
	testDecodeLine(t, "plan9", "0884c0ce|	SM4E V0.S4, V8.S4")
	testDecodeLine(t, "plan9", "2884c0ce|	SM4E V1.S4, V8.S4")
	testDecodeLine(t, "plan9", "4884c0ce|	SM4E V2.S4, V8.S4")
	testDecodeLine(t, "plan9", "6884c0ce|	SM4E V3.S4, V8.S4")
	testDecodeLine(t, "plan9", "8884c0ce|	SM4E V4.S4, V8.S4")
	testDecodeLine(t, "plan9", "a884c0ce|	SM4E V5.S4, V8.S4")
	testDecodeLine(t, "plan9", "c884c0ce|	SM4E V6.S4, V8.S4")
	testDecodeLine(t, "plan9", "e884c0ce|	SM4E V7.S4, V8.S4")
	//gnu syntax: load the 32 CK constants into v0-v7 and the root key (byte order reversed first) XOR FK into v8; the round keys land in v9, so v9 must be moved to v8 before every sm4ekey invocation after the first
	testDecodeLine(t, "gnu", "09c960ce|	sm4ekey v9.4s, v8.4s, v0.4s")
	testDecodeLine(t, "gnu", "09c961ce|	sm4ekey v9.4s, v8.4s, v1.4s")
	testDecodeLine(t, "gnu", "09c962ce|	sm4ekey v9.4s, v8.4s, v2.4s")
	testDecodeLine(t, "gnu", "09c963ce|	sm4ekey v9.4s, v8.4s, v3.4s")
	testDecodeLine(t, "gnu", "09c964ce|	sm4ekey v9.4s, v8.4s, v4.4s")
	testDecodeLine(t, "gnu", "09c965ce|	sm4ekey v9.4s, v8.4s, v5.4s")
	testDecodeLine(t, "gnu", "09c966ce|	sm4ekey v9.4s, v8.4s, v6.4s")
	testDecodeLine(t, "gnu", "09c967ce|	sm4ekey v9.4s, v8.4s, v7.4s")
	//gnu syntax: load the 32 CK constants into v0-v7 and the root key (byte order reversed first) XOR FK into v8; the round keys land alternately in v9 (invocations 1,3,5,7) and v8 (invocations 2,4,6,8), which avoids register copies
	testDecodeLine(t, "gnu", "09c960ce|	sm4ekey v9.4s, v8.4s, v0.4s")
	testDecodeLine(t, "gnu", "28c961ce|	sm4ekey v8.4s, v9.4s, v1.4s")
	testDecodeLine(t, "gnu", "09c962ce|	sm4ekey v9.4s, v8.4s, v2.4s")
	testDecodeLine(t, "gnu", "28c963ce|	sm4ekey v8.4s, v9.4s, v3.4s")
	testDecodeLine(t, "gnu", "09c964ce|	sm4ekey v9.4s, v8.4s, v4.4s")
	testDecodeLine(t, "gnu", "28c965ce|	sm4ekey v8.4s, v9.4s, v5.4s")
	testDecodeLine(t, "gnu", "09c966ce|	sm4ekey v9.4s, v8.4s, v6.4s")
	testDecodeLine(t, "gnu", "28c967ce|	sm4ekey v8.4s, v9.4s, v7.4s")
}

Each sm4e/sm4ekey invocation performs only 4 rounds, so 8 invocations are needed for the 32 rounds.
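The 4-rounds-per-invocation structure can be modeled in plain Go. This is a sketch, not the gmsm implementation: it follows the SM4 round X(i+4) = X(i) ^ L(τ(X(i+1) ^ X(i+2) ^ X(i+3) ^ rk(i))), treats the register as the window [X(i), X(i+1), X(i+2), X(i+3)] (an assumption about lane order), and takes the S-box as a parameter. `sm4eModel`, `crypt` and `toySbox` are hypothetical names, and `toySbox` is a placeholder, not the real SM4 S-box. Only the encryption rounds (SM4E) are modeled, not the SM4EKEY key schedule, and the byte-order reversal mentioned in the test comments is left out.

```go
package main

import (
	"fmt"
	"math/bits"
)

// tau applies a byte-wise S-box to each byte of a 32-bit word.
func tau(x uint32, sbox func(byte) byte) uint32 {
	return uint32(sbox(byte(x>>24)))<<24 | uint32(sbox(byte(x>>16)))<<16 |
		uint32(sbox(byte(x>>8)))<<8 | uint32(sbox(byte(x)))
}

// linearL is SM4's linear transform L for the encryption rounds.
func linearL(b uint32) uint32 {
	return b ^ bits.RotateLeft32(b, 2) ^ bits.RotateLeft32(b, 10) ^
		bits.RotateLeft32(b, 18) ^ bits.RotateLeft32(b, 24)
}

// sm4eModel runs 4 SM4 rounds on the window x = [X(i), X(i+1), X(i+2), X(i+3)]
// with round keys rk[0..3]: the amount of work done by one SM4E invocation.
func sm4eModel(x, rk [4]uint32, sbox func(byte) byte) [4]uint32 {
	for _, k := range rk {
		next := x[0] ^ linearL(tau(x[1]^x[2]^x[3]^k, sbox))
		x = [4]uint32{x[1], x[2], x[3], next}
	}
	return x
}

// crypt runs 8 x 4 = 32 rounds, then applies SM4's final reverse transform R
// (done outside the instruction in this model).
func crypt(block [4]uint32, rk [32]uint32, sbox func(byte) byte) [4]uint32 {
	for call := 0; call < 8; call++ {
		var k [4]uint32
		copy(k[:], rk[4*call:4*call+4])
		block = sm4eModel(block, k, sbox)
	}
	return [4]uint32{block[3], block[2], block[1], block[0]}
}

func main() {
	// Hypothetical placeholder: NOT the SM4 S-box, only used to exercise the structure.
	toySbox := func(b byte) byte { return b*7 + 3 }
	var rk, reversed [32]uint32
	for i := range rk {
		rk[i] = uint32(i) * 0x9e3779b9
	}
	for i := range rk {
		reversed[i] = rk[31-i]
	}
	pt := [4]uint32{0x01234567, 0x89abcdef, 0xfedcba98, 0x76543210}
	ct := crypt(pt, rk, toySbox)
	// SM4 decryption is the same 32 rounds with the round keys in reverse order.
	fmt.Println(crypt(ct, reversed, toySbox) == pt)
}
```

The program prints `true`: the round structure is invertible regardless of which S-box is plugged in, so decrypting with reversed round keys recovers the plaintext. Swapping `toySbox` for the real SM4 S-box table would turn this into a full reference model.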

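Then you can use those 32-bit codes in Go arm64 assembly. Note that the hex strings in the test above are the instruction bytes in memory (little-endian) order, whereas WORD takes the 32-bit instruction value, so the bytes must be reversed: "0884c0ce" becomes $0xcec08408.

WORD	$0xcec08408       // SM4E V0.S4, V8.S4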
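Instead of reading the words off Decode() by hand, they can be computed from the mask/value pairs in the instFormats entries: SM4E is 0xcec08400 with Rd in bits 0-4 and Rn in bits 5-9, and SM4EKEY is 0xce60c800 with Rm additionally in bits 16-20. The Go sketch below does that; `encodeSM4E`, `encodeSM4EKEY` and `memHex` are hypothetical helper names, the field positions are inferred from the masks 0xfffffc00/0xffe0fc00 together with the test encodings, and the output has only been cross-checked against those encodings, not run on SM4-capable hardware.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeSM4E builds "SM4E Vd.4S, Vn.4S": fixed bits 0xcec08400, Rn in bits 5-9, Rd in bits 0-4.
func encodeSM4E(d, n uint32) uint32 { return 0xcec08400 | n<<5 | d }

// encodeSM4EKEY builds "SM4EKEY Vd.4S, Vn.4S, Vm.4S": fixed bits 0xce60c800, Rm in bits 16-20.
func encodeSM4EKEY(d, n, m uint32) uint32 { return 0xce60c800 | m<<16 | n<<5 | d }

// memHex renders an instruction word as its little-endian memory bytes,
// the form used by the "xxxxxxxx|" prefix passed to testDecodeLine.
func memHex(word uint32) string {
	var b [4]byte
	binary.LittleEndian.PutUint32(b[:], word)
	return fmt.Sprintf("%x", b[:])
}

func main() {
	// Eight SM4E calls: data in V8, round keys in V0..V7 (4 rounds per call).
	for n := uint32(0); n < 8; n++ {
		w := encodeSM4E(8, n)
		fmt.Printf("WORD $0x%08x // sm4e v8.4s, v%d.4s (bytes %s)\n", w, n, memHex(w))
	}
	// Eight SM4EKEY calls ping-ponging between V9 and V8, CK constants in V0..V7.
	for m := uint32(0); m < 8; m++ {
		d, n := uint32(9), uint32(8)
		if m%2 == 1 {
			d, n = 8, 9
		}
		w := encodeSM4EKEY(d, n, m)
		fmt.Printf("WORD $0x%08x // sm4ekey v%d.4s, v%d.4s, v%d.4s (bytes %s)\n", w, d, n, m, memHex(w))
	}
}
```

The printed byte strings match the testDecodeLine inputs (for example `0884c0ce` and `28c967ce`), while the `WORD` operands are the same words in instruction-value order, ready to paste into a `.s` file.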

Unfortunately, I have no environment with these instructions to test on!

SM3 with SM3PARTW1 / SM3PARTW2 / SM3SS1 / SM3TT1A / SM3TT1B / SM3TT2A / SM3TT2B

Simulation code

Reference

SM3 and SM4 CPU-instruction implementations: no CPU environment with these instructions is available yet, so marking this for later.

  1. Summary of A64 cryptographic instructions
  2. Arm A64 Instruction Set Architecture
  3. Linux arm64 crypto (https://github.com/torvalds/linux/tree/master/arch/arm64/crypto)
  4. A Quick Guide to Go's Assembler
  5. Golang arm instructions mapping
  6. A C/C++ header file that converts Intel SSE intrinsics to Arm/Aarch64 NEON intrinsics.
  7. asm2go