AArch64 Options (Using the GNU Compiler Collection (GCC))

3.20.1 AArch64 Options

These options are defined for AArch64 implementations:

-mabi=name

Generate code for the specified data model. Permissible values are ilp32 for SysV-like data model where int, long int and pointers are 32 bits, and lp64 for SysV-like data model where int is 32 bits, but long int and pointers are 64 bits.

The default depends on the specific target configuration. Note that the LP64 and ILP32 ABIs are not link-compatible; you must compile your entire program with the same ABI, and link with a compatible set of libraries.

The ilp32 model is deprecated.
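
As an illustration only, the two data models can be tabulated in a short Python sketch. The table and the helper name `link_compatible` are hypothetical and not part of GCC; the helper checks nothing beyond the rule that all objects must share one ABI.

```python
# Widths in bits of the C types named above under each -mabi value.
DATA_MODELS = {
    "ilp32": {"int": 32, "long int": 32, "pointer": 32},
    "lp64": {"int": 32, "long int": 64, "pointer": 64},
}


def link_compatible(abis):
    """Return True when every listed object was compiled with the same -mabi value."""
    unknown = set(abis) - set(DATA_MODELS)
    if unknown:
        raise ValueError(f"unknown ABI(s): {sorted(unknown)}")
    return len(set(abis)) <= 1
```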

-mbig-endian

Generate big-endian code. This is the default when GCC is configured for an aarch64_be-*-* target.

-mgeneral-regs-only

Generate code which uses only the general-purpose registers. This will prevent the compiler from using floating-point and Advanced SIMD registers but will not impose any restrictions on the assembler.

-mlittle-endian

Generate little-endian code. This is the default when GCC is configured for an aarch64-*-* but not an aarch64_be-*-* target.

-mcmodel=tiny

Generate code for the tiny code model. The program and its statically defined symbols must be within 1MB of each other. Programs can be statically or dynamically linked.

-mcmodel=small

Generate code for the small code model. The program and its statically defined symbols must be within 4GB of each other. Programs can be statically or dynamically linked. This is the default code model.

-mcmodel=large

Generate code for the large code model. This makes no assumptions about addresses and sizes of sections. Programs can be statically linked only. The -mcmodel=large option is incompatible with -mabi=ilp32, -fpic and -fPIC.
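
The range limits of the three code models can be summarized in a small Python sketch. The function name and its `span_bytes` input (the largest distance between the program and its statically defined symbols) are hypothetical, and the sketch conservatively assumes the limits are exclusive.

```python
MIB = 1 << 20
GIB = 1 << 30


def smallest_code_model(span_bytes, abi="lp64", pic=False):
    """Pick the smallest -mcmodel value whose range limit covers span_bytes."""
    if span_bytes < 1 * MIB:
        return "tiny"
    if span_bytes < 4 * GIB:
        return "small"
    # The large model makes no range assumptions but has usage restrictions.
    if abi == "ilp32" or pic:
        raise ValueError(
            "-mcmodel=large is incompatible with -mabi=ilp32, -fpic and -fPIC"
        )
    return "large"
```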

-mtp=name

Specify the system register to use as a thread pointer. The valid values are tpidr_el0, tpidrro_el0, tpidr_el1, tpidr_el2, tpidr_el3. For backwards compatibility the aliases el0, el1, el2, el3 are also accepted. The default setting is tpidr_el0. It is recommended to compile all code intended to interoperate with the same value of this option to avoid accessing a different thread pointer from the wrong exception level.

-mstrict-align

-mno-strict-align

Avoid or allow generating memory accesses that may not be aligned on a natural object boundary as described in the architecture specification.

-momit-leaf-frame-pointer

-mno-omit-leaf-frame-pointer

Omit or keep the frame pointer in leaf functions. The former behavior is the default.

-mstack-protector-guard=guard

-mstack-protector-guard-reg=reg

-mstack-protector-guard-offset=offset

Generate stack protection code using a canary at guard. Supported locations are global for a global canary or sysreg for a canary in an appropriate system register.

With the latter choice the options -mstack-protector-guard-reg=reg and -mstack-protector-guard-offset=offset furthermore specify which system register to use as base register for reading the canary, and from what offset from that base register. There is no default register or offset as this is entirely for use within the Linux kernel.
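
The way the three options combine can be sketched as a small Python helper that assembles the flags. The helper name is hypothetical, and treating a missing register or offset as an error is an assumption drawn from the statement that there is no default for either.

```python
def stack_protector_flags(guard, reg=None, offset=None):
    """Build the -mstack-protector-guard* options for one compilation."""
    if guard == "global":
        # The register and offset options only matter for guard=sysreg.
        return ["-mstack-protector-guard=global"]
    if guard == "sysreg":
        # Assumption: with no defaults, both values must be supplied.
        if reg is None or offset is None:
            raise ValueError("guard=sysreg needs both a register and an offset")
        return [
            "-mstack-protector-guard=sysreg",
            f"-mstack-protector-guard-reg={reg}",
            f"-mstack-protector-guard-offset={offset}",
        ]
    raise ValueError(f"unsupported guard location: {guard!r}")
```

For example, `stack_protector_flags("sysreg", "sp_el0", 0)` returns the three options for a canary read at offset 0 from `sp_el0`; the register and offset here are illustrative values only.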

-mtls-dialect=desc

Use TLS descriptors as the thread-local storage mechanism for dynamic accesses of TLS variables. This is the default.

-mtls-dialect=traditional

Use traditional TLS as the thread-local storage mechanism for dynamic accesses of TLS variables.

-mtls-size=size

Specify bit size of immediate TLS offsets. Valid values are 12, 24, 32, 48. This option requires binutils 2.26 or newer.
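
As a rough guide to what each size permits, the following Python sketch computes how many byte offsets an immediate of a given width can encode. It assumes unsigned offsets, which this section does not state, and the helper names are hypothetical.

```python
VALID_TLS_SIZES = (12, 24, 32, 48)


def tls_offset_limit(bits):
    """Number of distinct byte offsets an immediate of the given width encodes."""
    if bits not in VALID_TLS_SIZES:
        raise ValueError(f"-mtls-size must be one of {VALID_TLS_SIZES}")
    return 1 << bits


def smallest_tls_size(tls_bytes):
    """Smallest valid -mtls-size whose offsets cover a TLS area of tls_bytes."""
    for bits in VALID_TLS_SIZES:
        if tls_bytes <= 1 << bits:
            return bits
    raise ValueError("TLS area too large for any supported offset size")
```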

-mfix-cortex-a53-835769

-mno-fix-cortex-a53-835769

Enable or disable the workaround for the ARM Cortex-A53 erratum number 835769. This involves inserting a NOP instruction between memory instructions and 64-bit integer multiply-accumulate instructions. This flag will be ignored if an architecture or cpu is specified on the command line which does not need the workaround.

-mfix-cortex-a53-843419

-mno-fix-cortex-a53-843419

Enable or disable the workaround for the ARM Cortex-A53 erratum number 843419. This erratum workaround is made at link time and this will only pass the corresponding flag to the linker. This flag will be ignored if an architecture or cpu is specified on the command line which does not need the workaround.

-mlow-precision-recip-sqrt

-mno-low-precision-recip-sqrt

Enable or disable the reciprocal square root approximation. This option only has an effect if -ffast-math or -funsafe-math-optimizations is used as well. Enabling this reduces precision of reciprocal square root results to about 16 bits for single precision and to 32 bits for double precision.

-mlow-precision-sqrt

-mno-low-precision-sqrt

Enable or disable the square root approximation. This option only has an effect if -ffast-math or -funsafe-math-optimizations is used as well. Enabling this reduces precision of square root results to about 16 bits for single precision and to 32 bits for double precision. If enabled, it implies -mlow-precision-recip-sqrt.

-mlow-precision-div

-mno-low-precision-div

Enable or disable the division approximation. This option only has an effect if -ffast-math or -funsafe-math-optimizations is used as well. Enabling this reduces precision of division results to about 16 bits for single precision and to 32 bits for double precision.
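
The interaction between the three low-precision options and the math flags they depend on can be modeled in a few lines of Python. This is a simplified sketch with a hypothetical function name; it ignores the -mno- forms and the order of options on the command line.

```python
def effective_approximations(flags):
    """Return the set of approximations in effect for a list of options."""
    flags = set(flags)
    # None of the options has any effect without one of these two flags.
    if not flags & {"-ffast-math", "-funsafe-math-optimizations"}:
        return set()
    active = set()
    if "-mlow-precision-recip-sqrt" in flags:
        active.add("recip-sqrt")
    if "-mlow-precision-sqrt" in flags:
        # -mlow-precision-sqrt implies -mlow-precision-recip-sqrt.
        active |= {"sqrt", "recip-sqrt"}
    if "-mlow-precision-div" in flags:
        active.add("div")
    return active
```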

-mtrack-speculation

-mno-track-speculation

Enable or disable generation of additional code to track speculative execution through conditional branches. The tracking state can then be used by the compiler when expanding calls to __builtin_speculation_safe_copy to permit a more efficient code sequence to be generated.

-moutline-atomics

-mno-outline-atomics

Enable or disable calls to out-of-line helpers to implement atomic operations. These helpers will, at runtime, determine if the LSE instructions from ARMv8.1-A can be used; if not, they will use the load/store-exclusive instructions that are present in the base ARMv8.0 ISA.

This option is only applicable when compiling for the base ARMv8.0 instruction set. If using a later revision, e.g. -march=armv8.1-a or -march=armv8-a+lse, the ARMv8.1-Atomics instructions will be used directly. The same applies when using -mcpu= when the selected cpu supports the lse feature. This option is on by default.
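
The decision described above can be sketched in Python as follows. The function name and the returned labels are hypothetical; the last case (inline load/store-exclusive sequences when the helpers are disabled and LSE is not enabled) is an assumption that this section implies but does not state.

```python
def atomics_strategy(lse_enabled, outline_atomics=True):
    """Describe how an atomic operation is implemented.

    lse_enabled: True when -march or -mcpu enables the lse feature.
    outline_atomics: True for -moutline-atomics (the default).
    """
    if lse_enabled:
        # The option does not apply; LSE instructions are used directly.
        return "inline LSE instructions"
    if outline_atomics:
        return "out-of-line helper, chooses LSE or load/store-exclusive at run time"
    # Assumption: without helpers, only the base Armv8.0 sequences remain.
    return "inline load/store-exclusive instructions"
```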

-march=name

Specify the name of the target architecture and, optionally, one or more feature modifiers. This option has the form -march=arch{+[no]feature}*.

The table below summarizes the permissible values for arch and the features that they enable by default:

`arch` value	Architecture	Includes by default
`armv8-a`	Armv8-A	`+fp`, `+simd`
`armv8.1-a`	Armv8.1-A	`armv8-a`, `+crc`, `+lse`, `+rdma`
`armv8.2-a`	Armv8.2-A	`armv8.1-a`
`armv8.3-a`	Armv8.3-A	`armv8.2-a`, `+pauth`, `+fcma`, `+jscvt`
`armv8.4-a`	Armv8.4-A	`armv8.3-a`, `+flagm`, `+fp16fml`, `+dotprod`, `+rcpc2`
`armv8.5-a`	Armv8.5-A	`armv8.4-a`, `+sb`, `+ssbs`, `+predres`, `+frintts`, `+flagm2`
`armv8.6-a`	Armv8.6-A	`armv8.5-a`, `+bf16`, `+i8mm`
`armv8.7-a`	Armv8.7-A	`armv8.6-a`, `+wfxt`, `+xs`
`armv8.8-a`	Armv8.8-A	`armv8.7-a`, `+mops`
`armv8.9-a`	Armv8.9-A	`armv8.8-a`
`armv9-a`	Armv9-A	`armv8.5-a`, `+sve`, `+sve2`
`armv9.1-a`	Armv9.1-A	`armv9-a`, `+bf16`, `+i8mm`
`armv9.2-a`	Armv9.2-A	`armv9.1-a`, `+wfxt`, `+xs`
`armv9.3-a`	Armv9.3-A	`armv9.2-a`, `+mops`
`armv9.4-a`	Armv9.4-A	`armv9.3-a`, `+sve2p1`
`armv9.5-a`	Armv9.5-A	`armv9.4-a`, `+cpa`, `+faminmax`, `+lut`
`armv8-r`	Armv8-R	`armv8-r`
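
Because each row can name an earlier architecture, the full default feature set is found by following the chain. The Python sketch below does this for a subset of the rows above; the dictionary and function names are hypothetical, and features implied by other features are not expanded.

```python
# A subset of the table: each entry lists an included architecture
# and/or "+feature" items.
ARCH_INCLUDES = {
    "armv8-a": ["+fp", "+simd"],
    "armv8.1-a": ["armv8-a", "+crc", "+lse", "+rdma"],
    "armv8.2-a": ["armv8.1-a"],
    "armv8.3-a": ["armv8.2-a", "+pauth", "+fcma", "+jscvt"],
    "armv8.4-a": ["armv8.3-a", "+flagm", "+fp16fml", "+dotprod", "+rcpc2"],
    "armv8.5-a": ["armv8.4-a", "+sb", "+ssbs", "+predres", "+frintts", "+flagm2"],
    "armv9-a": ["armv8.5-a", "+sve", "+sve2"],
}


def default_features(arch):
    """Collect the features listed for arch and for every architecture it includes."""
    features = set()
    for item in ARCH_INCLUDES[arch]:
        if item.startswith("+"):
            features.add(item[1:])
        else:
            features |= default_features(item)
    return features
```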

The value native is available on native AArch64 GNU/Linux and causes the compiler to pick the architecture of the host system. This option has no effect if the compiler is unable to recognize the architecture of the host system. When -march=native is given and no other -mcpu or -mtune is given then GCC will pick the host CPU as the CPU to tune for as well as select the architecture features from. That is, -march=native is treated as -mcpu=native.

The permissible values for feature are listed in the sub-section on -march and -mcpu Feature Modifiers. Where conflicting feature modifiers are specified, the right-most feature is used.

GCC uses name to determine what kind of instructions it can emit when generating assembly code. If -march is specified without either of -mtune or -mcpu also being specified, the code is tuned to perform well across a range of target processors implementing the target architecture.
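
The arch{+[no]feature}* form and the right-most-wins rule can be illustrated with a short Python sketch. It is not GCC's parser; the function name is hypothetical, and the sketch neither validates names nor applies the implications between features.

```python
def parse_march(value):
    """Split 'arch+feat+nofeat...' into (arch, enabled, disabled)."""
    arch, *modifiers = value.split("+")
    state = {}
    for mod in modifiers:
        # A later modifier for the same feature overwrites an earlier one.
        if mod.startswith("no"):
            state[mod[2:]] = False
        else:
            state[mod] = True
    enabled = {name for name, on in state.items() if on}
    disabled = {name for name, on in state.items() if not on}
    return arch, enabled, disabled
```

For instance, `parse_march("armv8-a+nocrc+crc")` reports crc as enabled, because the right-most modifier takes effect.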

-mtune=name

Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: generic, cortex-a35, cortex-a53, cortex-a55, cortex-a57, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a65, cortex-a65ae, cortex-a34, cortex-a78, cortex-a78ae, cortex-a78c, ares, exynos-m1, emag, falkor, oryon-1, neoverse-512tvb, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-v1, neoverse-v2, grace, neoverse-v3, neoverse-v3ae, neoverse-n3, olympus, cortex-a725, cortex-x925, qdf24xx, saphira, phecda, xgene1, vulcan, octeontx, octeontx81, octeontx83, octeontx2, octeontx2t98, octeontx2t96, octeontx2t93, octeontx2f95, octeontx2f95n, octeontx2f95mm, a64fx, fujitsu-monaka, thunderx, thunderxt88, thunderxt88p1, thunderxt81, tsv110, thunderxt83, thunderx2t99, thunderx3t110, zeus, cortex-a57.cortex-a53, cortex-a72.cortex-a53, cortex-a73.cortex-a35, cortex-a73.cortex-a53, cortex-a75.cortex-a55, cortex-a76.cortex-a55, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, cortex-a510, cortex-a520, cortex-a520ae, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, ampere1, ampere1a, ampere1b, cobalt-100, apple-m1, apple-m2, apple-m3 and native.

The values cortex-a57.cortex-a53, cortex-a72.cortex-a53, cortex-a73.cortex-a35, cortex-a73.cortex-a53, cortex-a75.cortex-a55, cortex-a76.cortex-a55, apple-m1, apple-m2, apple-m3, gb10 specify that GCC should tune for a big.LITTLE system.

The value neoverse-512tvb specifies that GCC should tune for Neoverse cores that (a) implement SVE and (b) have a total vector bandwidth of 512 bits per cycle. In other words, the option tells GCC to tune for Neoverse cores that can execute 4 128-bit Advanced SIMD arithmetic instructions a cycle and that can execute an equivalent number of SVE arithmetic instructions per cycle (2 for 256-bit SVE, 4 for 128-bit SVE). This is more general than tuning for a specific core like Neoverse V1 but is more specific than the default tuning described below.
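
The arithmetic behind the 512-bit figure is a simple division of the total bandwidth by the vector width. The Python sketch below shows it; the constant and function names are hypothetical.

```python
TOTAL_VECTOR_BANDWIDTH = 512  # bits per cycle assumed by neoverse-512tvb


def instructions_per_cycle(vector_bits):
    """Vector arithmetic instructions per cycle that add up to 512 bits."""
    if vector_bits <= 0 or TOTAL_VECTOR_BANDWIDTH % vector_bits:
        raise ValueError("vector width must evenly divide 512 bits")
    return TOTAL_VECTOR_BANDWIDTH // vector_bits
```

This gives 4 instructions per cycle for 128-bit vectors and 2 for 256-bit vectors, matching the figures above.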

Additionally on native AArch64 GNU/Linux systems the value native tunes performance to the host system. This option has no effect if the compiler is unable to recognize the processor of the host system.

Where none of -mtune=, -mcpu= or -march= are specified, the code is tuned to perform well across a range of target processors.

This option cannot be suffixed by feature modifiers.

-mcpu=name

Specify the name of the target processor, optionally suffixed by one or more feature modifiers. This option has the form -mcpu=cpu{+[no]feature}*, where the permissible values for cpu are the same as those available for -mtune. The permissible values for feature are documented in the sub-section on -march and -mcpu Feature Modifiers. Where conflicting feature modifiers are specified, the right-most feature is used.

GCC uses name to determine what kind of instructions it can emit when generating assembly code (as if by -march) and to determine the target processor for which to tune for performance (as if by -mtune). Where this option is used in conjunction with -march or -mtune, those options take precedence over the appropriate part of this option.
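
The precedence between -mcpu, -march and -mtune can be summarized in a short Python sketch that reports which option supplies each setting. The function name and the returned labels are hypothetical, and the special handling of -march=native is not modeled.

```python
def resolve_target(mcpu=None, march=None, mtune=None):
    """Return (source of the architecture, source of the tuning)."""
    # -march overrides the architecture part of -mcpu.
    arch_source = "-march" if march else ("-mcpu" if mcpu else "default")
    # -mtune overrides the tuning part of -mcpu.
    tune_source = "-mtune" if mtune else ("-mcpu" if mcpu else "default")
    return arch_source, tune_source
```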

-mcpu=neoverse-512tvb is special in that it does not refer to a specific core, but instead refers to all Neoverse cores that (a) implement SVE and (b) have a total vector bandwidth of 512 bits a cycle. Unless overridden by -march, -mcpu=neoverse-512tvb generates code that can run on a Neoverse V1 core, since Neoverse V1 is the first Neoverse core with these properties. Unless overridden by -mtune, -mcpu=neoverse-512tvb tunes code in the same way as for -mtune=neoverse-512tvb.

-moverride=string

Override tuning decisions made by the back-end in response to a -mtune= switch. The syntax, semantics, and accepted values for string in this option are not guaranteed to be consistent across releases.

This option is only intended to be useful when developing GCC.

-mverbose-cost-dump

Enable verbose cost model dumping in the debug dump files. This option is provided for use in debugging the compiler.

-mpc-relative-literal-loads

-mno-pc-relative-literal-loads

Enable or disable PC-relative literal loads. With this option literal pools are accessed using a single instruction and emitted after each function. This limits the maximum size of functions to 1MB. This is enabled by default for -mcmodel=tiny.

-msign-return-address=scope

Select the function scope on which return address signing will be applied. Permissible values are none, which disables return address signing, non-leaf, which enables pointer signing for functions which are not leaf functions, and all, which enables pointer signing for all functions. The default value is none. This option has been deprecated by -mbranch-protection.

-mbranch-protection=none|standard|pac-ret[+leaf+b-key]|bti|gcs

Select the branch protection features to use. none is the default and turns off all types of branch protection. standard turns on all types of branch protection features. If a feature has additional tuning options, then standard sets it to its standard level. pac-ret[+leaf] turns on return address signing to its standard level: signing functions that save the return address to memory (non-leaf functions will practically always do this) using the A-key. The optional argument leaf can be used to extend the signing to include leaf functions. The optional argument b-key can be used to sign the functions with the B-key instead of the A-key. bti turns on the branch target identification mechanism. gcs turns on guarded control stack compatible code generation.
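
As a reading aid, the following Python sketch decodes a -mbranch-protection value into the protections it selects. It is not GCC's parser and the function name is hypothetical. It assumes that several protection types may be joined with +, as in pac-ret+leaf+bti, which the synopsis above does not show explicitly.

```python
def parse_branch_protection(value):
    """Return a dict describing the protections selected by value."""
    result = {"pac_ret": False, "leaf": False, "key": None, "bti": False, "gcs": False}
    if value == "none":
        return result
    if value == "standard":
        # Every type on, each at its standard level (non-leaf signing, A-key).
        return {"pac_ret": True, "leaf": False, "key": "a", "bti": True, "gcs": True}
    current = None
    for token in value.split("+"):
        if token == "pac-ret":
            result["pac_ret"], result["key"] = True, "a"
            current = token
        elif token in ("bti", "gcs"):
            result[token] = True
            current = token
        elif token == "leaf" and current == "pac-ret":
            result["leaf"] = True
        elif token == "b-key" and current == "pac-ret":
            result["key"] = "b"
        else:
            raise ValueError(f"unexpected token {token!r} in {value!r}")
    return result
```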

-mharden-sls=opts

Enable compiler hardening against straight line speculation (SLS). opts is a comma-separated list of the following options:

retbr
blr

In addition, -mharden-sls=all enables all SLS hardening while -mharden-sls=none disables all SLS hardening.
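
The accepted forms of opts can be illustrated with a small Python sketch. The function name is hypothetical, and rejecting unknown names is an assumption about error handling rather than documented behavior.

```python
SLS_OPTIONS = ("retbr", "blr")


def parse_harden_sls(opts):
    """Return the set of SLS hardening options selected by -mharden-sls=opts."""
    if opts == "all":
        return set(SLS_OPTIONS)
    if opts == "none":
        return set()
    chosen = set(opts.split(","))
    unknown = chosen - set(SLS_OPTIONS)
    if unknown:
        raise ValueError(f"unknown SLS option(s): {sorted(unknown)}")
    return chosen
```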

-mearly-ra=scope

Determine when to enable an early register allocation pass. This pass runs before instruction scheduling and tries to find a spill-free allocation of floating-point and vector code. It also tries to make use of strided multi-register instructions, such as SME2's strided LD1 and ST1.

The possible values of scope are: all, which runs the pass on all functions; strided, which runs the pass on functions that have access to strided multi-register instructions; and none, which disables the pass.

-mearly-ra=all is the default for -O2 and above, and for -Os. -mearly-ra=none is the default otherwise.
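
The default can be written as a small Python function of the optimization level. This sketch uses a hypothetical name and only models the numeric levels and -Os; other levels such as -Og are deliberately left out because this section does not cover them.

```python
def default_early_ra(opt_level):
    """Default -mearly-ra scope for -O<opt_level>, where opt_level is '0'..'3' or 's'."""
    if opt_level == "s":
        return "all"
    # int() raises ValueError for levels this sketch does not model.
    return "all" if int(opt_level) >= 2 else "none"
```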

-mearly-ldp-fusion

Enable the copy of the AArch64 load/store pair fusion pass that runs before register allocation. Enabled by default at -O and above.

-mlate-ldp-fusion

Enable the copy of the AArch64 load/store pair fusion pass that runs after register allocation. Enabled by default at -O and above.

-msve-vector-bits=bits

Specify the number of bits in an SVE vector register. This option only has an effect when SVE is enabled.

GCC supports two forms of SVE code generation: vector-length agnostic output that works with any size of vector register and vector-length specific output that allows GCC to make assumptions about the vector length when it is useful for optimization reasons. The possible values of bits are: scalable, 128, 256, 512, 1024 and 2048. Specifying scalable selects vector-length agnostic output. At present -msve-vector-bits=128 also generates vector-length agnostic output for big-endian targets. All other values generate vector-length specific code. The behavior of these values may change in future releases and no value except scalable should be relied on for producing code that is portable across different hardware SVE vector lengths.

The default is -msve-vector-bits=scalable, which produces vector-length agnostic code.
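
The mapping from bits to the form of code generation can be sketched in Python as follows. The function name is hypothetical, and the big-endian special case reflects present behavior, which the text above says may change.

```python
VALID_BITS = ("scalable", "128", "256", "512", "1024", "2048")


def sve_codegen(bits="scalable", big_endian=False):
    """Return the form of SVE output selected by -msve-vector-bits=bits."""
    bits = str(bits)
    if bits not in VALID_BITS:
        raise ValueError(f"-msve-vector-bits must be one of {VALID_BITS}")
    # 128 is currently also vector-length agnostic on big-endian targets.
    if bits == "scalable" or (bits == "128" and big_endian):
        return "vector-length agnostic"
    return "vector-length specific"
```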

-Wexperimental-fmv-target

Warn about use of experimental Function Multi Versioning (FMV). The Arm C Language Extension specification for Function Multi Versioning is in beta and subject to change. Any use of FMV carries the caveat that future behavior changes and incompatibilities are likely.

3.20.1.1 `-march` and `-mcpu` Feature Modifiers

Feature modifiers used with -march and -mcpu can be any of the following and their inverses nofeature:

crc: Enable CRC extension. This is on by default for -march=armv8.1-a.
crypto: Enable Crypto extension. This also enables Advanced SIMD and floating-point instructions.
fp: Enable floating-point instructions. This is on by default for all possible values for options -march and -mcpu.
simd: Enable Advanced SIMD instructions. This also enables floating-point instructions. This is on by default for all possible values for options -march and -mcpu.
sve: Enable Scalable Vector Extension instructions. This also enables Advanced SIMD and floating-point instructions.
lse: Enable Large System Extension instructions. This is on by default for -march=armv8.1-a.
rdma: Enable Rounding Double Multiply Accumulate instructions. This is on by default for -march=armv8.1-a.
fp16: Enable FP16 extension. This also enables floating-point instructions.
fp16fml: Enable FP16 fmla extension. This also enables FP16 extensions and floating-point instructions. This option is enabled by default for -march=armv8.4-a. Use of this option with architectures prior to Armv8.2-A is not supported.
rcpc: Enable the RCpc extension. This enables the use of the LDAPR instructions for load-acquire atomic semantics, and passes it on to the assembler, enabling inline asm statements to use instructions from the RCpc extension.
dotprod: Enable the Dot Product extension. This also enables Advanced SIMD instructions.
aes: Enable the Armv8-a aes and pmull crypto extension. This also enables Advanced SIMD instructions.
sha2: Enable the Armv8-a sha2 crypto extension. This also enables Advanced SIMD instructions.
sha3: Enable the sha512 and sha3 crypto extension. This also enables Advanced SIMD instructions. Use of this option with architectures prior to Armv8.2-A is not supported.
sm4: Enable the sm3 and sm4 crypto extension. This also enables Advanced SIMD instructions. Use of this option with architectures prior to Armv8.2-A is not supported.
profile: Enable the Statistical Profiling extension. This option is only to enable the extension at the assembler level and does not affect code generation.
rng: Enable the Armv8.5-a Random Number instructions. This option is only to enable the extension at the assembler level and does not affect code generation.
memtag: Enable the Armv8.5-a Memory Tagging Extensions. Use of this option with architectures prior to Armv8.5-A is not supported.
sb: Enable the Armv8-a Speculation Barrier instruction. This option is only to enable the extension at the assembler level and does not affect code generation. This option is enabled by default for -march=armv8.5-a.
ssbs: Enable the Armv8-a Speculative Store Bypass Safe instruction. This option is only to enable the extension at the assembler level and does not affect code generation. This option is enabled by default for -march=armv8.5-a.
predres: Enable the Armv8-a Execution and Data Prediction Restriction instructions. This option is only to enable the extension at the assembler level and does not affect code generation. This option is enabled by default for -march=armv8.5-a.
sve2: Enable the Armv8-a Scalable Vector Extension 2. This also enables SVE instructions.
sve2-bitperm: Enable SVE2 bitperm instructions. This also enables SVE2 instructions.
sve2-sm4: Enable SVE2 sm4 instructions. This also enables SVE2 instructions.
sve2-aes: Enable SVE2 aes instructions. This also enables SVE2 instructions.
sve2-sha3: Enable SVE2 sha3 instructions. This also enables SVE2 instructions.
sve2p1: Enable SVE2.1 instructions. This also enables SVE2 instructions.
tme: Enable the Transactional Memory Extension.
i8mm: Enable 8-bit Integer Matrix Multiply instructions. This also enables Advanced SIMD and floating-point instructions. This option is enabled by default for -march=armv8.6-a. Use of this option with architectures prior to Armv8.2-A is not supported.
f32mm: Enable 32-bit Floating point Matrix Multiply instructions. This also enables SVE instructions. Use of this option with architectures prior to Armv8.2-A is not supported.
f64mm: Enable 64-bit Floating point Matrix Multiply instructions. This also enables SVE instructions. Use of this option with architectures prior to Armv8.2-A is not supported.
bf16: Enable brain half-precision floating-point instructions. This also enables Advanced SIMD and floating-point instructions. This option is enabled by default for -march=armv8.6-a. Use of this option with architectures prior to Armv8.2-A is not supported.
ls64: Enable the 64-byte atomic load and store instructions for accelerators.
mops: Enable the instructions to accelerate memory operations like memcpy, memmove, memset. This option is enabled by default for -march=armv8.8-a.
flagm: Enable the Flag Manipulation instructions Extension.
flagm2: Enable the FlagM2 flag conversion instructions.
pauth: Enable the Pointer Authentication Extension.
cssc: Enable the Common Short Sequence Compression instructions.
sme: Enable the Scalable Matrix Extension. This is only supported when SVE2 is also enabled.
sme-i16i64: Enable the FEAT_SME_I16I64 extension to SME. This also enables SME instructions.
sme-f64f64: Enable the FEAT_SME_F64F64 extension to SME. This also enables SME instructions.
sme2: Enable the Scalable Matrix Extension 2. This also enables SME instructions.
sme-b16b16: Enable the FEAT_SME_B16B16 extension to SME. This also enables SME2 and SVE_B16B16 instructions.
sme-f16f16: Enable the FEAT_SME_F16F16 extension to SME. This also enables SME2 instructions.
sme2p1: Enable the Scalable Matrix Extension version 2.1. This also enables SME2 instructions.
fcma: Enable the complex number SIMD extensions.
jscvt: Enable the fjcvtzs JavaScript conversion instruction.
frintts: Enable floating-point round to integral value instructions.
wfxt: Enable wfet and wfit instructions.
xs: Enable the XS memory attribute extension.
lse128: Enable the LSE128 128-bit atomic instructions extension. This also enables LSE instructions.
d128: Enable support for 128-bit system register read/write instructions. This also enables the LSE128 extension.
gcs: Enable support for Armv9.4-a Guarded Control Stack extension.
the: Enable support for Armv8.9-a/9.4-a translation hardening extension.
rcpc2: Enable the RCpc2 extension.
rcpc3: Enable the RCpc3 (Release Consistency) extension.
fp8: Enable the fp8 (8-bit floating point) extension.
fp8fma: Enable the fp8 (8-bit floating point) multiply accumulate extension.
ssve-fp8fma: Enable the fp8 (8-bit floating point) multiply accumulate extension in streaming mode.
fp8dot4: Enable the fp8 (8-bit floating point) to single-precision 4-way dot product extension.
ssve-fp8dot4: Enable the fp8 (8-bit floating point) to single-precision 4-way dot product extension in streaming mode.
fp8dot2: Enable the fp8 (8-bit floating point) to half-precision 2-way dot product extension.
ssve-fp8dot2: Enable the fp8 (8-bit floating point) to half-precision 2-way dot product extension in streaming mode.
faminmax: Enable the Floating Point Absolute Maximum/Minimum extension.
lut: Enable the Lookup Table extension.
cpa: Enable the Checked Pointer Arithmetic instructions.
sve-b16b16: Enable the SVE non-widening brain floating-point (bf16) extension. This only has an effect when sve2 or sme2 is also enabled.

Feature crypto implies aes, sha2, and simd, which implies fp. Conversely, nofp implies nosimd, which implies nocrypto, noaes and nosha2.
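
The implication chain can be modeled in Python as a small dependency table. This is a simplified sketch, not GCC's implementation: the names are hypothetical, the table covers only the five features named above, and dropping crypto when aes or sha2 alone is disabled is an assumption.

```python
# feature -> features it also enables (from the text above and the aes/sha2 entries).
IMPLIES = {
    "crypto": ["aes", "sha2", "simd"],
    "aes": ["simd"],
    "sha2": ["simd"],
    "simd": ["fp"],
}


def enable(feature, features):
    """Return features plus feature and everything it implies."""
    features = set(features) | {feature}
    for implied in IMPLIES.get(feature, []):
        features = enable(implied, features)
    return features


def disable(feature, features):
    """Return features without feature and without anything that implies it."""
    features = set(features) - {feature}
    for parent, children in IMPLIES.items():
        if feature in children and parent in features:
            features = disable(parent, features)
    return features
```

Enabling crypto on an empty set yields crypto, aes, sha2, simd and fp; disabling fp on that result removes all five.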
