x86 Function Attributes (Using the GNU Compiler Collection (GCC))

6.33.33 x86 Function Attributes

These function attributes are supported by the x86 back end:

cdecl

On x86-32 targets, the cdecl attribute causes the compiler to assume that the calling function pops the stack space used to pass arguments. This is useful to override the effects of the -mrtd switch.

fastcall

On x86-32 targets, the fastcall attribute causes the compiler to pass the first argument (if of integral type) in the register ECX and the second argument (if of integral type) in the register EDX. Subsequent arguments and arguments of other types are passed on the stack. The called function pops the arguments off the stack. If the number of arguments is variable, all arguments are pushed on the stack.
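As a minimal sketch of how the attribute might be applied, the following declares and defines a three-argument function. The FASTCALL macro and the function name add3 are hypothetical and exist only for illustration; the macro expands to nothing outside x86-32 so that the same source compiles cleanly on other targets.

```c
/* Hypothetical helper macro: fastcall only has meaning on x86-32, so it
   expands to nothing elsewhere to avoid "attribute ignored" warnings.  */
#if defined (__i386__)
# define FASTCALL __attribute__ ((fastcall))
#else
# define FASTCALL
#endif

/* On x86-32: a is passed in ECX, b in EDX, and c on the stack.
   add3 itself pops c before returning.  */
FASTCALL int add3 (int a, int b, int c);

FASTCALL int
add3 (int a, int b, int c)
{
  return a + b + c;
}
```

Callers need to see the attributed declaration; otherwise they would set up the call with the default convention.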

thiscall

On x86-32 targets, the thiscall attribute causes the compiler to pass the first argument (if of integral type) in the register ECX. Subsequent arguments and arguments of other types are passed on the stack. The called function pops the arguments off the stack. If the number of arguments is variable, all arguments are pushed on the stack. The thiscall attribute is intended for C++ non-static member functions. As a GCC extension, this calling convention can be used for C functions and for static member methods.

ms_abi

sysv_abi

On 32-bit and 64-bit x86 targets, you can use an ABI attribute to indicate which calling convention should be used for a function. The ms_abi attribute tells the compiler to use the Microsoft ABI, while the sysv_abi attribute tells the compiler to use the System V ELF ABI, which is used on GNU/Linux and other systems. The default is to use the Microsoft ABI when targeting Windows. On all other systems, the default is the System V ELF ABI.

Note, the ms_abi attribute for Microsoft Windows 64-bit targets currently requires the -maccumulate-outgoing-args option.
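The following sketch shows one way the two ABI attributes could be mixed in a single translation unit. The macro names MS_ABI and SYSV_ABI and the functions ms_add, sysv_add and add_both are hypothetical; the macros are enabled only for x86-64 here, which is an assumption made to keep the example portable.

```c
/* Hypothetical helper macros, enabled only on x86-64 in this sketch.  */
#if defined (__x86_64__)
# define MS_ABI   __attribute__ ((ms_abi))
# define SYSV_ABI __attribute__ ((sysv_abi))
#else
# define MS_ABI
# define SYSV_ABI
#endif

/* Uses the Microsoft convention regardless of the target default.  */
MS_ABI long
ms_add (long a, long b)
{
  return a + b;
}

/* Uses the System V convention regardless of the target default.  */
SYSV_ABI long
sysv_add (long a, long b)
{
  return a + b;
}

/* Both attributed declarations are visible here, so each call is set
   up with the convention its callee expects.  */
long
add_both (long a, long b)
{
  return ms_add (a, b) + sysv_add (a, b);
}
```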

callee_pop_aggregate_return (number)

On x86-32 targets, you can use this attribute to control how aggregates are returned in memory. If the caller is responsible for popping the hidden pointer together with the rest of the arguments, specify number equal to zero. If the callee is responsible for popping the hidden pointer, specify number equal to one.

The default x86-32 ABI assumes that the callee pops the stack for the hidden pointer. However, on x86-32 Microsoft Windows targets, the compiler assumes that the caller pops the stack for the hidden pointer.

ms_hook_prologue

On 32-bit and 64-bit x86 targets, you can use this function attribute to make GCC generate the hot-patching function prologue used in Win32 API functions in Microsoft Windows XP Service Pack 2 and newer.

naked

This attribute allows the compiler to construct the requisite function declaration, while allowing the body of the function to be assembly code. The specified function will not have prologue/epilogue sequences generated by the compiler. Only basic asm statements can safely be included in naked functions (see Basic Asm). While using extended asm or a mixture of basic asm and C code may appear to work, they cannot be depended upon to work reliably and are not supported.
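A naked function whose body is a single basic asm statement might look like the sketch below. The function name forty_two is hypothetical, and the plain C fallback for non-x86 targets is an addition made only so the example compiles elsewhere.

```c
#if defined (__i386__) || defined (__x86_64__)
/* No prologue or epilogue is generated, so the asm must return by
   itself.  The result is left in EAX, where an int return value is
   expected on both x86-32 and x86-64.  */
__attribute__ ((naked)) int
forty_two (void)
{
  __asm__ ("movl $42, %eax\n\t"
           "ret");
}
#else
/* Fallback for targets where this sketch does not apply.  */
int
forty_two (void)
{
  return 42;
}
#endif
```

The body contains no C statements and no extended asm operands, in line with the restriction described above.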

regparm (number)

On x86-32 targets, the regparm attribute causes the compiler to pass the first number arguments, if they are of integral type, in registers EAX, EDX, and ECX instead of on the stack. Functions that take a variable number of arguments continue to be passed all of their arguments on the stack.

Beware that on some ELF systems this attribute is unsuitable for global functions in shared libraries with lazy binding (which is the default). Lazy binding sends the first call via resolving code in the loader, which might assume EAX, EDX and ECX can be clobbered, as per the standard calling conventions. Solaris 8 is affected by this. Systems with the GNU C Library version 2.1 or higher and FreeBSD are believed to be safe since the loaders there save EAX, EDX and ECX. (Lazy binding can be disabled with the linker or the loader if desired, to avoid the problem.)
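As an illustration, the sketch below applies regparm to a file-local helper. The REGPARM macro and the functions mul_add and eval are hypothetical names. Keeping the attributed function static is one way to sidestep the lazy-binding caveat, since calls to it are never routed through the loader.

```c
/* Hypothetical helper macro: regparm only has meaning on x86-32.  */
#if defined (__i386__)
# define REGPARM(n) __attribute__ ((regparm (n)))
#else
# define REGPARM(n)
#endif

/* On x86-32: x is passed in EAX, y in EDX, and z in ECX.  */
REGPARM (3) static int
mul_add (int x, int y, int z)
{
  return x * y + z;
}

/* Externally visible wrapper that keeps the default convention.  */
int
eval (int x, int y, int z)
{
  return mul_add (x, y, z);
}
```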

sseregparm

On x86-32 targets with SSE support, the sseregparm attribute causes the compiler to pass up to 3 floating-point arguments in SSE registers instead of on the stack. Functions that take a variable number of arguments continue to pass all of their floating-point arguments on the stack.

force_align_arg_pointer

On x86 targets, the force_align_arg_pointer attribute may be applied to individual function definitions, generating an alternate prologue and epilogue that realigns the run-time stack if necessary. This supports mixing legacy code that runs with a 4-byte aligned stack with modern code that keeps a 16-byte aligned stack for SSE compatibility.

stdcall

On x86-32 targets, the stdcall attribute causes the compiler to assume that the called function pops off the stack space used to pass arguments, unless it takes a variable number of arguments.
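Because the convention is part of the function type, it also has to appear on function pointer types used for callbacks. The sketch below assumes hypothetical names (STDCALL, binary_op, subtract, apply) and disables the attribute outside x86-32.

```c
/* Hypothetical helper macro: stdcall only has meaning on x86-32.  */
#if defined (__i386__)
# define STDCALL __attribute__ ((stdcall))
#else
# define STDCALL
#endif

/* A callback type whose callee pops its own arguments on x86-32.  */
typedef int (STDCALL *binary_op) (int, int);

STDCALL int
subtract (int a, int b)
{
  return a - b;
}

/* Calls through the pointer use the convention recorded in its type.  */
int
apply (binary_op op, int a, int b)
{
  return op (a, b);
}
```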

no_callee_saved_registers

Use this attribute to indicate that the specified function has no callee-saved registers. That is, all registers can be used as scratch registers. For example, this attribute can be used for a function called from the interrupt handler assembly stub which will preserve all registers and return from interrupt.

no_caller_saved_registers

Use this attribute to indicate that the specified function has no caller-saved registers. That is, all registers are callee-saved. For example, this attribute can be used for a function called from an interrupt handler. The compiler generates proper function entry and exit sequences to save and restore any modified registers, except for the EFLAGS register. Since GCC doesn't preserve SSE, MMX, or x87 states, the GCC option -mgeneral-regs-only should be used to compile functions with the no_caller_saved_registers attribute.

interrupt

Use this attribute to indicate that the specified function is an interrupt handler or an exception handler (depending on the parameters passed to the function, as explained below). The compiler generates function entry and exit sequences suitable for use in an interrupt handler when this attribute is present. The IRET instruction, instead of the RET instruction, is used to return from interrupt handlers. All registers, except for the EFLAGS register which is restored by the IRET instruction, are preserved by the compiler. Since GCC doesn't preserve SSE, MMX, or x87 states, the GCC option -mgeneral-regs-only should be used to compile interrupt and exception handlers.

Any interruptible-without-stack-switch code must be compiled with -mno-red-zone since interrupt handlers can and will, because of the hardware design, touch the red zone.

An interrupt handler must be declared with a mandatory pointer argument:

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame)
{
}

and you must define struct interrupt_frame as described in the processor's manual.
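One possible definition is sketched below. It assumes x86-64 long mode, where the processor pushes SS, RSP, RFLAGS, CS and RIP in that order; the field names are hypothetical, and the layout for other modes differs, so the processor manual remains the authority.

```c
#include <stdint.h>

/* Assumed layout for x86-64 long mode, lowest address first.  */
struct interrupt_frame
{
  uint64_t ip;     /* RIP at which execution resumes */
  uint64_t cs;     /* code segment selector (low 16 bits) */
  uint64_t flags;  /* RFLAGS */
  uint64_t sp;     /* RSP at the time of the interrupt */
  uint64_t ss;     /* stack segment selector (low 16 bits) */
};
```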

Exception handlers differ from interrupt handlers because the system pushes an error code on the stack. An exception handler declaration is similar to that for an interrupt handler, but with a different mandatory function signature. The compiler arranges to pop the error code off the stack before the IRET instruction.

#ifdef __x86_64__
typedef unsigned long long int uword_t;
#else
typedef unsigned int uword_t;
#endif

struct interrupt_frame;

__attribute__ ((interrupt))
void
f (struct interrupt_frame *frame, uword_t error_code)
{
  ...
}

Exception handlers should only be used for exceptions that push an error code; you should use an interrupt handler in other cases. The system will crash if the wrong kind of handler is used.

target (options)

As discussed in Common Function Attributes, this attribute allows specification of target-specific compilation options.

On the x86, the following options are allowed:

3dnow
no-3dnow: Enable/disable the generation of the 3DNow! instructions.
3dnowa
no-3dnowa: Enable/disable the generation of the enhanced 3DNow! instructions.
abm
no-abm: Enable/disable the generation of the advanced bit instructions.
adx
no-adx: Enable/disable the generation of the ADX instructions.
aes
no-aes: Enable/disable the generation of the AES instructions.
avx
no-avx: Enable/disable the generation of the AVX instructions.
avx2
no-avx2: Enable/disable the generation of the AVX2 instructions.
avx5124fmaps
no-avx5124fmaps: Enable/disable the generation of the AVX5124FMAPS instructions.
avx5124vnniw
no-avx5124vnniw: Enable/disable the generation of the AVX5124VNNIW instructions.
avx512bitalg
no-avx512bitalg: Enable/disable the generation of the AVX512BITALG instructions.
avx512bw
no-avx512bw: Enable/disable the generation of the AVX512BW instructions.
avx512cd
no-avx512cd: Enable/disable the generation of the AVX512CD instructions.
avx512dq
no-avx512dq: Enable/disable the generation of the AVX512DQ instructions.
avx512er
no-avx512er: Enable/disable the generation of the AVX512ER instructions.
avx512f
no-avx512f: Enable/disable the generation of the AVX512F instructions.
avx512ifma
no-avx512ifma: Enable/disable the generation of the AVX512IFMA instructions.
avx512pf
no-avx512pf: Enable/disable the generation of the AVX512PF instructions.
avx512vbmi
no-avx512vbmi: Enable/disable the generation of the AVX512VBMI instructions.
avx512vbmi2
no-avx512vbmi2: Enable/disable the generation of the AVX512VBMI2 instructions.
avx512vl
no-avx512vl: Enable/disable the generation of the AVX512VL instructions.
avx512vnni
no-avx512vnni: Enable/disable the generation of the AVX512VNNI instructions.
avx512vpopcntdq
no-avx512vpopcntdq: Enable/disable the generation of the AVX512VPOPCNTDQ instructions.
bmi
no-bmi: Enable/disable the generation of the BMI instructions.
bmi2
no-bmi2: Enable/disable the generation of the BMI2 instructions.
cldemote
no-cldemote: Enable/disable the generation of the CLDEMOTE instructions.
clflushopt
no-clflushopt: Enable/disable the generation of the CLFLUSHOPT instructions.
clwb
no-clwb: Enable/disable the generation of the CLWB instructions.
clzero
no-clzero: Enable/disable the generation of the CLZERO instructions.
crc32
no-crc32: Enable/disable the generation of the CRC32 instructions.
cx16
no-cx16: Enable/disable the generation of the CMPXCHG16B instructions.
default: See Function Multiversioning, where it is used to specify the default function version.
f16c
no-f16c: Enable/disable the generation of the F16C instructions.
fma
no-fma: Enable/disable the generation of the FMA instructions.
fma4
no-fma4: Enable/disable the generation of the FMA4 instructions.
fsgsbase
no-fsgsbase: Enable/disable the generation of the FSGSBASE instructions.
fxsr
no-fxsr: Enable/disable the generation of the FXSR instructions.
gfni
no-gfni: Enable/disable the generation of the GFNI instructions.
hle
no-hle: Enable/disable the generation of the HLE instruction prefixes.
lwp
no-lwp: Enable/disable the generation of the LWP instructions.
lzcnt
no-lzcnt: Enable/disable the generation of the LZCNT instructions.
mmx
no-mmx: Enable/disable the generation of the MMX instructions.
movbe
no-movbe: Enable/disable the generation of the MOVBE instructions.
movdir64b
no-movdir64b: Enable/disable the generation of the MOVDIR64B instructions.
movdiri
no-movdiri: Enable/disable the generation of the MOVDIRI instructions.
mwait
no-mwait: Enable/disable the generation of the MWAIT and MONITOR instructions.
mwaitx
no-mwaitx: Enable/disable the generation of the MWAITX instructions.
pclmul
no-pclmul: Enable/disable the generation of the PCLMUL instructions.
pconfig
no-pconfig: Enable/disable the generation of the PCONFIG instructions.
pku
no-pku: Enable/disable the generation of the PKU instructions.
popcnt
no-popcnt: Enable/disable the generation of the POPCNT instruction.
prefetchwt1
no-prefetchwt1: Enable/disable the generation of the PREFETCHWT1 instructions.
prfchw
no-prfchw: Enable/disable the generation of the PREFETCHW instruction.
ptwrite
no-ptwrite: Enable/disable the generation of the PTWRITE instructions.
rdpid
no-rdpid: Enable/disable the generation of the RDPID instructions.
rdrnd
no-rdrnd: Enable/disable the generation of the RDRND instructions.
rdseed
no-rdseed: Enable/disable the generation of the RDSEED instructions.
rtm
no-rtm: Enable/disable the generation of the RTM instructions.
sahf
no-sahf: Enable/disable the generation of the SAHF instructions.
sgx
no-sgx: Enable/disable the generation of the SGX instructions.
sha
no-sha: Enable/disable the generation of the SHA instructions.
shstk
no-shstk: Enable/disable the shadow stack built-in functions from CET.
sse
no-sse: Enable/disable the generation of the SSE instructions.
sse2
no-sse2: Enable/disable the generation of the SSE2 instructions.
sse3
no-sse3: Enable/disable the generation of the SSE3 instructions.
sse4
no-sse4: Enable/disable the generation of the SSE4 instructions (both SSE4.1 and SSE4.2).
sse4.1
no-sse4.1: Enable/disable the generation of the SSE4.1 instructions.
sse4.2
no-sse4.2: Enable/disable the generation of the SSE4.2 instructions.
sse4a
no-sse4a: Enable/disable the generation of the SSE4A instructions.
ssse3
no-ssse3: Enable/disable the generation of the SSSE3 instructions.
tbm
no-tbm: Enable/disable the generation of the TBM instructions.
vaes
no-vaes: Enable/disable the generation of the VAES instructions.
vpclmulqdq
no-vpclmulqdq: Enable/disable the generation of the VPCLMULQDQ instructions.
waitpkg
no-waitpkg: Enable/disable the generation of the WAITPKG instructions.
wbnoinvd
no-wbnoinvd: Enable/disable the generation of the WBNOINVD instructions.
xop
no-xop: Enable/disable the generation of the XOP instructions.
xsave
no-xsave: Enable/disable the generation of the XSAVE instructions.
xsavec
no-xsavec: Enable/disable the generation of the XSAVEC instructions.
xsaveopt
no-xsaveopt: Enable/disable the generation of the XSAVEOPT instructions.
xsaves
no-xsaves: Enable/disable the generation of the XSAVES instructions.
amx-tile
no-amx-tile: Enable/disable the generation of the AMX-TILE instructions.
amx-int8
no-amx-int8: Enable/disable the generation of the AMX-INT8 instructions.
amx-bf16
no-amx-bf16: Enable/disable the generation of the AMX-BF16 instructions.
uintr
no-uintr: Enable/disable the generation of the UINTR instructions.
hreset
no-hreset: Enable/disable the generation of the HRESET instruction.
kl
no-kl: Enable/disable the generation of the KEYLOCKER instructions.
widekl
no-widekl: Enable/disable the generation of the WIDEKL instructions.
avxvnni
no-avxvnni: Enable/disable the generation of the AVXVNNI instructions.
avxifma
no-avxifma: Enable/disable the generation of the AVXIFMA instructions.
avxvnniint8
no-avxvnniint8: Enable/disable the generation of the AVXVNNIINT8 instructions.
avxneconvert
no-avxneconvert: Enable/disable the generation of the AVXNECONVERT instructions.
cmpccxadd
no-cmpccxadd: Enable/disable the generation of the CMPccXADD instructions.
amx-fp16
no-amx-fp16: Enable/disable the generation of the AMX-FP16 instructions.
prefetchi
no-prefetchi: Enable/disable the generation of the PREFETCHI instructions.
raoint
no-raoint: Enable/disable the generation of the RAOINT instructions.
amx-complex
no-amx-complex: Enable/disable the generation of the AMX-COMPLEX instructions.
avxvnniint16
no-avxvnniint16: Enable/disable the generation of the AVXVNNIINT16 instructions.
sm3
no-sm3: Enable/disable the generation of the SM3 instructions.
sha512
no-sha512: Enable/disable the generation of the SHA512 instructions.
sm4
no-sm4: Enable/disable the generation of the SM4 instructions.
usermsr
no-usermsr: Enable/disable the generation of the USER_MSR instructions.
apxf
no-apxf: Enable/disable the generation of the APX features, including EGPR, PUSH2POP2, NDD and PPX.
avx10.1-256
no-avx10.1-256: Enable the generation of the AVX10.1 instructions with 256-bit support. Disable the generation of the AVX10.1 instructions.
avx10.1-512
no-avx10.1-512: Enable the generation of the AVX10.1 instructions with 512-bit support. Disable the generation of the AVX10.1 instructions.
avx10.1
no-avx10.1: Enable the generation of the AVX10.1 instructions with 512-bit support. Disable the generation of the AVX10.1 instructions.
cld
no-cld: Enable/disable the generation of the CLD before string moves.
fancy-math-387
no-fancy-math-387: Enable/disable the generation of the sin, cos, and sqrt instructions on the 387 floating-point unit.
ieee-fp
no-ieee-fp: Enable/disable the generation of floating point that depends on IEEE arithmetic.
inline-all-stringops
no-inline-all-stringops: Enable/disable inlining of string operations.
inline-stringops-dynamically
no-inline-stringops-dynamically: Enable/disable the generation of the inline code to do small string operations and calling the library routines for large operations.
align-stringops
no-align-stringops: Do/do not align destination of inlined string operations.
recip
no-recip: Enable/disable the generation of RCPSS, RCPPS, RSQRTSS and RSQRTPS instructions followed by an additional Newton-Raphson step instead of doing a floating-point division.
general-regs-only: Generate code which uses only the general registers.
arch=ARCH: Specify the architecture to generate code for in compiling the function.
tune=TUNE: Specify the architecture to tune for in compiling the function.
fpmath=FPMATH: Specify which floating-point unit to use. You must specify the target("fpmath=sse,387") option as target("fpmath=sse+387") because the comma would separate different options.
prefer-vector-width=OPT: On x86 targets, the prefer-vector-width attribute informs the compiler to use OPT-bit vector width in instructions instead of the default on the selected platform.

Valid OPT values are:

none

No extra limitations are applied to GCC other than those defined by the selected platform.

128

Prefer 128-bit vector width for instructions.

256

Prefer 256-bit vector width for instructions.

512

Prefer 512-bit vector width for instructions.
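As a rough sketch of how the target attribute might be combined with a run-time check, the following compiles one copy of a loop with AVX2 enabled and another with the file's default options, then picks between them using the GCC built-in __builtin_cpu_supports. The function names sum_avx2, sum_baseline and sum_ints are hypothetical.

```c
#include <stddef.h>

#if defined (__i386__) || defined (__x86_64__)
/* Compiled with AVX2 enabled, whatever -m options the file uses.  */
__attribute__ ((target ("avx2")))
static long
sum_avx2 (const int *v, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}
#endif

/* Compiled with the file's default target options.  */
static long
sum_baseline (const int *v, size_t n)
{
  long total = 0;
  for (size_t i = 0; i < n; i++)
    total += v[i];
  return total;
}

/* Call the AVX2 copy only when the running processor supports it.  */
long
sum_ints (const int *v, size_t n)
{
#if defined (__i386__) || defined (__x86_64__)
  if (__builtin_cpu_supports ("avx2"))
    return sum_avx2 (v, n);
#endif
  return sum_baseline (v, n);
}
```

Both paths return the same value; only the instructions the compiler is allowed to use differ.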

indirect_branch("choice")

On x86 targets, the indirect_branch attribute causes the compiler to convert indirect calls and jumps as specified by choice. The value keep leaves indirect calls and jumps unmodified. thunk converts indirect calls and jumps to a call and return thunk. thunk-inline converts indirect calls and jumps to an inlined call and return thunk. thunk-extern converts indirect calls and jumps to an external call and return thunk provided in a separate object file.

function_return("choice")

On x86 targets, the function_return attribute causes the compiler to convert function returns as specified by choice. The value keep leaves function returns unmodified. thunk converts a function return to a call and return thunk. thunk-inline converts a function return to an inlined call and return thunk. thunk-extern converts a function return to an external call and return thunk provided in a separate object file.

nocf_check

The nocf_check attribute on a function is used to inform the compiler that the function's prologue should not be instrumented when compiled with the -fcf-protection=branch option. The compiler assumes that the function's address is a valid target for a control-flow transfer.

The nocf_check attribute on a type of pointer to function is used to inform the compiler that a call through the pointer should not be instrumented when compiled with the -fcf-protection=branch option. The compiler assumes that the function's address from the pointer is a valid target for a control-flow transfer. A direct function call through a function name is assumed to be a safe call, so direct calls are not instrumented by the compiler.

The nocf_check attribute is applied to an object's type. In case of assignment of a function address or a function pointer to another pointer, the attribute is not carried over from the right-hand object's type; the type of the left-hand object stays unchanged. The compiler checks for nocf_check attribute mismatch and reports a warning in case of mismatch.

int foo (void) __attribute__((nocf_check));
void (*foo1)(void) __attribute__((nocf_check));
void (*foo2)(void);

/* foo's address is assumed to be valid.  */
int
foo (void)
{
  /* This call site is not checked for control-flow
     validity.  */
  (*foo1)();

  /* A warning is issued about attribute mismatch.  */
  foo1 = foo2;

  /* This call site is still not checked.  */
  (*foo1)();

  /* This call site is checked.  */
  (*foo2)();

  /* A warning is issued about attribute mismatch.  */
  foo2 = foo1;

  /* This call site is still checked.  */
  (*foo2)();

  return 0;
}

cf_check

The cf_check attribute on a function is used to inform the compiler that an ENDBR instruction should be placed at the function entry when -fcf-protection=branch is enabled.

indirect_return

The indirect_return attribute can be applied to a function, as well as to a variable or type of function pointer, to inform the compiler that the function may return via an indirect branch.

fentry_name("name")

On x86 targets, the fentry_name attribute sets the function to call on function entry when function instrumentation is enabled with -pg -mfentry. When name is nop, a 5-byte nop sequence is generated instead.

fentry_section("name")

On x86 targets, the fentry_section attribute sets the name of the section in which to record function entry instrumentation calls when enabled with -pg -mrecord-mcount.

nodirect_extern_access

This attribute, attached to a global variable or function, is the counterpart to option -mno-direct-extern-access.

6.33.33.1 Inlining rules

On the x86, the inliner does not inline a function that has different target options than the caller, unless the callee has a subset of the target options of the caller. For example, a function declared with target("sse3") can inline a function with target("sse2"), since -msse3 implies -msse2.
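The subset rule can be sketched with the pair of functions below. The TARGET macro and the names square and sum_of_squares are hypothetical; the macro expands to nothing on non-x86 targets so the example still compiles there. Whether the call is actually inlined remains up to the optimizer; the attributes only determine whether it is permitted.

```c
/* Hypothetical helper macro: the target options used here are
   x86-specific, so the attribute is dropped on other targets.  */
#if defined (__i386__) || defined (__x86_64__)
# define TARGET(opts) __attribute__ ((target (opts)))
#else
# define TARGET(opts)
#endif

/* Callee: requires only SSE2.  */
TARGET ("sse2") static inline int
square (int x)
{
  return x * x;
}

/* Caller: SSE3 implies SSE2, so the callee's options are a subset of
   the caller's and square is a candidate for inlining here.  */
TARGET ("sse3") int
sum_of_squares (int a, int b)
{
  return square (a) + square (b);
}
```

Swapping the two attribute strings would give the callee options the caller lacks, and the inliner would then leave the call in place.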

Besides the basic rule, a different inlining rule applies when a function specifies a target("arch=ARCH") or target("tune=TUNE") attribute. Such a function can inline a function compiled with the default -march=x86-64 and -mtune=generic, or a function that has a subset of its ISA features and is marked with always_inline.