6.3 `ld` and the ARM family

For the ARM, ld will generate code stubs to allow function calls between ARM and Thumb code. These stubs only work with code that has been compiled and assembled with the -mthumb-interwork command-line option. If it is necessary to link with old ARM object files or libraries that have not been compiled with the -mthumb-interwork option, then the --support-old-code command-line switch should be given to the linker. This will make it generate larger stub functions which will work with non-interworking-aware ARM code. Note, however, that the linker does not support generating stubs for function calls to non-interworking-aware Thumb code.

The --thumb-entry switch is a duplicate of the generic --entry switch, in that it sets the program's starting address. It also sets the bottom bit of the address, so that it can be branched to using a BX instruction and the program will start executing in Thumb mode straight away.
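The effect on the entry address can be sketched in a few lines of Python. This is an illustrative model only, not linker code; the function name `thumb_entry_address` and the sample address are hypothetical.

```python
def thumb_entry_address(symbol_address: int) -> int:
    """Return the address with bit 0 set, marking it as a Thumb target for BX."""
    return symbol_address | 1


# Hypothetical entry symbol located at 0x8000.
print(hex(thumb_entry_address(0x8000)))
```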

The --use-nul-prefixed-import-tables switch specifies that the import tables idata4 and idata5 have to be generated with a zero element prefix for import libraries. This is the old style of generating import tables. By default this option is turned off.

The --be8 switch instructs ld to generate BE8 format executables. This option is only valid when linking big-endian objects, i.e. ones which have been assembled with the -EB option. The resulting image will contain big-endian data and little-endian code.
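As a rough illustration of what "little-endian code" means here, the following Python sketch reverses the bytes of each 32-bit ARM instruction word while leaving data alone. It assumes the input holds only 32-bit ARM instructions (Thumb code uses 16-bit units, and a real link has to tell code from data); the function name `be8_swap_arm_code` is hypothetical.

```python
import struct


def be8_swap_arm_code(code: bytes) -> bytes:
    """Re-encode big-endian 32-bit instruction words as little-endian."""
    if len(code) % 4 != 0:
        raise ValueError("ARM code length must be a multiple of 4 bytes")
    count = len(code) // 4
    words = struct.unpack(f">{count}I", code)   # read as big-endian words
    return struct.pack(f"<{count}I", *words)    # write back little-endian


# "BX LR" (0xE12FFF1E) as stored in a big-endian object file.
print(be8_swap_arm_code(bytes.fromhex("e12fff1e")).hex())
```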

The R_ARM_TARGET1 relocation is typically used for entries in the .init_array section. It is interpreted as either R_ARM_REL32 or R_ARM_ABS32, depending on the target. The --target1-rel and --target1-abs switches override the default.

The --target2=type switch overrides the default definition of the R_ARM_TARGET2 relocation. Valid values for type, their meanings, and target defaults are as follows:

rel: R_ARM_REL32 (arm*-*-elf, arm*-*-eabi)
abs: R_ARM_ABS32
got-rel: R_ARM_GOT_PREL (arm*-*-linux, arm*-*-*bsd)
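The mapping above can be written down as a small lookup table. The sketch below is only a restatement of the list in Python; the names `TARGET2_RELOCS` and `resolve_target2` are hypothetical and the per-target defaults are not modelled.

```python
# --target2=type values and the relocation each one selects.
TARGET2_RELOCS = {
    "rel": "R_ARM_REL32",
    "abs": "R_ARM_ABS32",
    "got-rel": "R_ARM_GOT_PREL",
}


def resolve_target2(kind: str) -> str:
    """Return the relocation that R_ARM_TARGET2 is treated as for a given type."""
    try:
        return TARGET2_RELOCS[kind]
    except KeyError:
        raise ValueError(f"invalid --target2 type: {kind!r}") from None


print(resolve_target2("got-rel"))
```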

The R_ARM_V4BX relocation (defined by the ARM AAELF specification) enables objects compiled for the ARMv4 architecture to be interworking-safe when linked with other objects compiled for ARMv4t, but also allows pure ARMv4 binaries to be built from the same ARMv4 objects.

In the latter case, the switch --fix-v4bx must be passed to the linker, which causes v4t BX rM instructions to be rewritten as MOV PC,rM, since v4 processors do not have a BX instruction.

In the former case, the switch should not be used, and R_ARM_V4BX relocations are ignored.
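The BX to MOV rewrite can be illustrated at the level of instruction encodings. The sketch below assumes the standard ARM encodings (BX rM is `cond 0001 0010 1111 1111 1111 0001 Rm`, MOV PC,rM is `cond 0001 1010 0000 1111 0000 0000 Rm`) and keeps the condition and register fields; the function name `fix_v4bx` is hypothetical and this is not the linker's own code.

```python
BX_MASK = 0x0FFFFFF0      # everything except condition and Rm
BX_PATTERN = 0x012FFF10   # BX rM with condition and Rm cleared
MOV_PC_PATTERN = 0x01A0F000  # MOV PC, rM with condition and Rm cleared


def fix_v4bx(insn: int) -> int:
    """Rewrite an ARM 'BX rM' encoding as 'MOV PC, rM'."""
    if insn & BX_MASK != BX_PATTERN:
        raise ValueError(f"not a BX instruction: {insn:#010x}")
    # Preserve the condition (top 4 bits) and Rm (bottom 4 bits).
    return (insn & 0xF000000F) | MOV_PC_PATTERN


print(hex(fix_v4bx(0xE12FFF1E)))  # BX LR
```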

The --fix-v4bx-interworking switch replaces BX rM instructions identified by R_ARM_V4BX relocations with a branch to the following veneer:

TST rM, #1
MOVEQ PC, rM
BX rM

This allows generation of libraries/applications that work on ARMv4 cores and are still interworking safe. Note that the above veneer clobbers the condition flags, so may cause incorrect program behavior in rare cases.
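The control flow of the veneer can be modelled as follows. This is a behavioural sketch only, assuming the usual convention that bit 0 of a branch target selects Thumb state; the function name `v4bx_veneer` is hypothetical.

```python
from typing import Tuple


def v4bx_veneer(target: int) -> Tuple[str, int]:
    """Model the veneer: return (instruction set state, address jumped to)."""
    if target & 1 == 0:
        # TST sets the Z flag, so MOVEQ PC, rM executes: a plain ARM-state
        # jump that never reaches the BX, which ARMv4 cores lack.
        return ("ARM", target)
    # Bit 0 is set, so MOVEQ is skipped and BX switches to Thumb state.
    return ("Thumb", target & ~1)


print(v4bx_veneer(0x8000))
print(v4bx_veneer(0x8001))
```

On a pure ARMv4 system every target has bit 0 clear, so only the first path is taken.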

The --use-blx switch enables the linker to use ARM/Thumb BLX instructions (available on ARMv5t and above) in various situations. Currently it is used to perform calls via the PLT from Thumb code using BLX rather than using BX and a mode-switching stub before each PLT entry. This should lead to such calls executing slightly faster.

The --vfp11-denorm-fix switch enables a link-time workaround for a bug in certain VFP11 coprocessor hardware, which sometimes allows instructions with denorm operands (which must be handled by support code) to have those operands overwritten by subsequent instructions before the support code can read the intended values.

The bug may be avoided in scalar mode if you allow at least one intervening instruction between a VFP11 instruction which uses a register and another instruction which writes to the same register, or at least two intervening instructions if vector mode is in use. The bug only affects full-compliance floating-point mode: you do not need this workaround if you are using "runfast" mode. Please contact ARM for further details.

If you know you are using buggy VFP11 hardware, you can enable this workaround by specifying the linker option --vfp11-denorm-fix=scalar if you are using the VFP11 scalar mode only, or --vfp11-denorm-fix=vector if you are using vector mode (the latter also works for scalar code). The default is --vfp11-denorm-fix=none.

If the workaround is enabled, instructions are scanned for potentially-troublesome sequences, and a veneer is created for each such sequence which may trigger the erratum. The veneer consists of the first instruction of the sequence and a branch back to the subsequent instruction. The original instruction is then replaced with a branch to the veneer. The extra cycles required to call and return from the veneer are sufficient to avoid the erratum in both the scalar and vector cases.
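The spacing rule behind the scan (at least one intervening instruction in scalar mode, at least two in vector mode) can be sketched as below. This is a deliberately simplified model: instructions are hypothetical `(is_vfp11, uses, writes)` tuples rather than decoded machine code, and the real linker scan is more selective than this.

```python
REQUIRED_GAP = {"scalar": 1, "vector": 2}  # intervening instructions needed


def find_hazards(instrs, mode):
    """Return (i, j) pairs where instruction j overwrites a register used by
    VFP11 instruction i with too few instructions in between."""
    gap = REQUIRED_GAP[mode]
    hazards = []
    for i, (is_vfp11, uses, _writes) in enumerate(instrs):
        if not is_vfp11:
            continue
        # Instructions i+1 .. i+gap have fewer than `gap` instructions between
        # themselves and instruction i.
        for j in range(i + 1, min(i + 1 + gap, len(instrs))):
            if uses & instrs[j][2]:
                hazards.append((i, j))
    return hazards


sequence = [
    (True, {"s0"}, {"s1"}),   # VFP11 instruction reading s0
    (False, set(), set()),    # unrelated instruction
    (False, set(), {"s0"}),   # overwrites s0
]
print(find_hazards(sequence, "scalar"))
print(find_hazards(sequence, "vector"))
```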

The --fix-arm1176 switch enables a link-time workaround for an erratum in certain ARM1176 processors. The workaround is enabled by default if you are targeting ARM v6 (excluding ARM v6T2) or earlier. It can be disabled unconditionally by specifying --no-fix-arm1176.

Further information is available in ARM1176JZ-S and ARM1176JZF-S Programmer Advice Notice (UAN0002) available on the Arm documentation website at https://developer.arm.com/.

The --fix-stm32l4xx-629360 switch enables a link-time workaround for a bug in the bus matrix / memory controller for some of the STM32 Cortex-M4 based products (STM32L4xx). When accessing off-chip memory via the affected bus for bus reads of 9 words or more, the bus can generate corrupt data and/or abort. These are only core-initiated accesses (not DMA), and might affect any access: integer loads such as LDM, POP and floating-point loads such as VLDM, VPOP. Stores are not affected.

The bug can be avoided by splitting memory accesses into chunks small enough to keep each bus read to 8 words or fewer.
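The splitting idea can be sketched for a multi-register load. The example takes the 9-word threshold stated above as the limit, so each chunk holds at most 8 words; that limit, the register names and the function name `split_register_list` are assumptions made for illustration, not the linker's replacement sequence.

```python
MAX_WORDS_PER_READ = 8  # reads of 9 words or more can trigger the erratum


def split_register_list(registers, limit=MAX_WORDS_PER_READ):
    """Split one long register list into consecutive chunks of at most `limit`."""
    return [registers[i:i + limit] for i in range(0, len(registers), limit)]


# A hypothetical 10-register load becomes two shorter loads.
for chunk in split_register_list([f"r{i}" for i in range(10)]):
    print(chunk)
```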

The workaround is not enabled by default; this is equivalent to using --fix-stm32l4xx-629360=none. If you know you are using buggy STM32L4xx hardware, you can enable the workaround by specifying the linker option --fix-stm32l4xx-629360, or the equivalent --fix-stm32l4xx-629360=default.

If the workaround is enabled, instructions are scanned for potentially-troublesome sequences, and a veneer is created for each such sequence which may trigger the erratum. The veneer consists of a replacement sequence emulating the behaviour of the original one and a branch back to the subsequent instruction. The original instruction is then replaced with a branch to the veneer.

The workaround does not always preserve the memory access order for the LDMDB instruction when the instruction loads the PC.

The workaround is not able to handle problematic instructions when they are in the middle of an IT block, since a branch is not allowed there. In that case, the linker reports a warning and no replacement occurs.

The workaround is not able to replace problematic instructions with a PC-relative branch instruction if the .text section is too large. In that case, when the branch that replaces the original code cannot be encoded, the linker reports a warning and no replacement occurs.

The --no-enum-size-warning switch prevents the linker from warning when linking object files that specify incompatible EABI enumeration size attributes. For example, with this switch enabled, linking of an object file using 32-bit enumeration values with another using enumeration values fitted into the smallest possible space will not be diagnosed.

The --no-wchar-size-warning switch prevents the linker from warning when linking object files that specify incompatible EABI wchar_t size attributes. For example, with this switch enabled, linking of an object file using 32-bit wchar_t values with another using 16-bit wchar_t values will not be diagnosed.

The --pic-veneer switch makes the linker use PIC sequences for ARM/Thumb interworking veneers, even if the rest of the binary is not PIC. This avoids problems on uClinux targets where --emit-relocs is used to generate relocatable binaries.

The linker will automatically generate and insert small sequences of code into a linked ARM ELF executable whenever an attempt is made to perform a function call to a symbol that is too far away. The placement of these sequences of instructions - called stubs - is controlled by the command-line option --stub-group-size=N. The placement is important because a poor choice can create a need for duplicate stubs, increasing the code size. The linker will try to group stubs together in order to reduce interruptions to the flow of code, but it needs guidance as to how big these groups should be and where they should be placed.

The value of N, the parameter to the --stub-group-size= option, controls where the stub groups are placed. If it is negative then all stubs are placed after the first branch that needs them. If it is positive then the stubs can be placed either before or after the branches that need them. If the value of N is 1 (either +1 or -1) then the linker will choose exactly where to place groups of stubs, using its built-in heuristics. A value of N greater than 1 (or smaller than -1) tells the linker that a single group of stubs can service at most N bytes from the input sections.

The default, if --stub-group-size= is not specified, is N = +1.
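The rules for N can be summarised as a small decision function. This is a reading aid written from the description above, not linker code; the function name and dictionary keys are hypothetical, and rejecting N = 0 is an assumption because the text does not describe that value.

```python
def interpret_stub_group_size(n: int = 1) -> dict:
    """Summarise what a --stub-group-size=N value asks of the linker."""
    if n == 0:
        raise ValueError("N = 0 is not described in the text")
    return {
        # Positive N: stubs may go before or after the branches needing them.
        "stubs_may_precede_branches": n > 0,
        # +1 or -1: placement is left to the linker's built-in heuristics.
        "linker_chooses_placement": abs(n) == 1,
        # Otherwise |N| caps the input-section bytes served by one stub group.
        "max_group_bytes": None if abs(n) == 1 else abs(n),
    }


print(interpret_stub_group_size())         # the default, N = +1
print(interpret_stub_group_size(-65536))
```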

Far-call stub insertion is fully supported for the ARM-EABI target only, because it relies on object file properties not present otherwise.

The --fix-cortex-a8 switch enables a link-time workaround for an erratum in certain Cortex-A8 processors. The workaround is enabled by default if you are targeting the ARM v7-A architecture profile. It can be enabled otherwise by specifying --fix-cortex-a8, or disabled unconditionally by specifying --no-fix-cortex-a8.

The erratum only affects Thumb-2 code. Please contact ARM for further details.

The --fix-cortex-a53-835769 switch enables a link-time workaround for erratum 835769 present on certain early revisions of Cortex-A53 processors. The workaround is disabled by default. It can be enabled by specifying --fix-cortex-a53-835769, or disabled unconditionally by specifying --no-fix-cortex-a53-835769.

Please contact ARM for further details.

The --no-merge-exidx-entries switch disables the merging of adjacent exidx entries in debuginfo.

The --long-plt option enables the use of 16-byte PLT entries which support up to 4GB of code. The default is to use 12-byte PLT entries which only support 512MB of code.

The --no-apply-dynamic-relocs option makes the AArch64 linker not apply link-time values for dynamic relocations.

All SG veneers are placed in the special output section .gnu.sgstubs. Its start address must be set, either with the command-line option --section-start or in a linker script, to indicate where to place these veneers in memory.

The --cmse-implib option requests that the import libraries specified by the --out-implib and --in-implib options are secure gateway import libraries, suitable for linking a non-secure executable against secure code as per ARMv8-M Security Extensions.

The --in-implib=file option specifies an input import library whose symbols must keep the same address in the executable being produced. A warning is given if no --out-implib is given but new symbols have been introduced in the executable that should be listed in its import library. Otherwise, if --out-implib is specified, the symbols are added to the output import library. A warning is also given if some symbols present in the input import library have disappeared from the executable. This option is only effective for Secure Gateway import libraries, i.e. when --cmse-implib is specified.
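The two warning conditions amount to a pair of set differences. The sketch below models them with plain symbol-name sets; the function name `check_implib`, the symbol names and the warning wording are all hypothetical, and address checking is not modelled.

```python
def check_implib(in_implib_syms, executable_syms, out_implib_given):
    """Return (added, removed, warnings) for an --in-implib comparison."""
    added = sorted(set(executable_syms) - set(in_implib_syms))
    removed = sorted(set(in_implib_syms) - set(executable_syms))
    warnings = []
    if added and not out_implib_given:
        # New symbols have nowhere to be recorded without --out-implib.
        warnings.append("new symbols not listed in an import library: "
                        + ", ".join(added))
    if removed:
        warnings.append("symbols missing from the executable: "
                        + ", ".join(removed))
    return added, removed, warnings


added, removed, warnings = check_implib({"entry_a", "entry_b"},
                                        {"entry_b", "entry_c"},
                                        out_implib_given=False)
print(added, removed)
for w in warnings:
    print(w)
```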

The -z force-bti option turns on the verification of Branch Target Identification (BTI) in input objects, generates PLTs with BTI, and marks the output with BTI. If this option is omitted, but all input objects belonging to the link unit have the BTI marking, the linker implicitly generates PLTs with BTI, and marks the output with BTI.
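The marking rule in this paragraph reduces to a single boolean expression. The sketch below is a model under that reading, with a hypothetical function name and a list of per-object flags standing in for the real property notes.

```python
def output_has_bti(inputs_have_bti, force_bti=False):
    """Decide whether the output is marked with BTI (and gets BTI PLTs)."""
    # Forced by -z force-bti, or implied when every input carries the marking.
    return force_bti or all(inputs_have_bti)


print(output_has_bti([True, True]))
print(output_has_bti([True, False]))
print(output_has_bti([True, False], force_bti=True))
```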

The -z bti-report[=none|warning|error] option specifies how to report missing BTI markings on inputs, i.e. the GNU_PROPERTY_AARCH64_FEATURE_1_BTI property. By default, if the option is omitted and -z force-bti is provided, warnings are emitted.

none disables any warning messages.
warning (the default value) emits warning messages when input objects composing the link unit are missing BTI markings.
error turns the warning messages into errors.

If issues are found, a maximum of 20 messages will be emitted, and then a summary with the total number of issues will be displayed at the end.
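The reporting behaviour (three levels, at most 20 messages, then a summary) can be sketched as follows. The function name, object names and message wording are hypothetical; only the level names and the 20-message cap come from the text.

```python
MAX_MESSAGES = 20


def report_missing_markings(objects_missing, mode="warning", limit=MAX_MESSAGES):
    """Return the diagnostic lines for inputs lacking the BTI marking."""
    if mode not in ("none", "warning", "error"):
        raise ValueError(f"invalid report mode: {mode!r}")
    if mode == "none" or not objects_missing:
        return []
    lines = [f"{mode}: {name}: missing BTI marking"
             for name in objects_missing[:limit]]
    # The summary always states the full count, even when messages were capped.
    lines.append(f"{len(objects_missing)} issue(s) found in total")
    return lines


lines = report_missing_markings([f"obj{i}.o" for i in range(25)])
print(len(lines))
print(lines[-1])
```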

The -z pac-plt option enables the usage of pointer authentication in PLTs.

The -z gcs option controls the verification of Guarded Control Stack (GCS) markings on input objects and marks the output with GCS if all conditions are validated.

implicit (default if -z gcs is omitted) enables GCS marking on the output if, and only if, all input objects composing the link unit are marked with GCS.
always forces the marking of the output with GCS.
never ignores any GCS marking on the input objects, and does not mark the output with GCS.
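The three values can be read as a small decision function. This is a model of the description above, with a hypothetical function name and a list of per-object flags standing in for the real GCS property notes.

```python
def output_has_gcs(mode, inputs_have_gcs):
    """Decide whether the output is marked with GCS for a given -z gcs value."""
    if mode == "always":
        return True
    if mode == "never":
        return False
    if mode == "implicit":
        # Marked only when every input object carries the GCS marking.
        return all(inputs_have_gcs)
    raise ValueError(f"invalid gcs mode: {mode!r}")


print(output_has_gcs("implicit", [True, False]))
print(output_has_gcs("always", [True, False]))
```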

The -z gcs-report[=none|warning|error] option specifies how to report the missing GCS markings on inputs, i.e. the GNU_PROPERTY_AARCH64_FEATURE_1_GCS property. By default, if the option is omitted and -z gcs is provided, warnings are emitted.

none disables any warning messages.
warning (the default value) emits warning messages when input objects composing the link unit are missing GCS markings.
error turns the warning messages into errors.

If issues are found, a maximum of 20 messages will be emitted, and then a summary with the total number of issues will be displayed at the end.

The -z gcs-report-dynamic=none|warning|error option specifies how to report the missing GCS markings on dynamic input objects, i.e. the GNU_PROPERTY_AARCH64_FEATURE_1_GCS property. By default, if the option is omitted, it inherits the value of -z gcs-report. However, the inherited value is capped to warning, as some users might want to report errors only in the currently built module and not in the shared dependencies. It is therefore necessary to use an explicit -z gcs-report-dynamic=error option if you want the linker to error on GCS issues in the shared libraries.

none disables any warning messages.
warning emits warning messages when dynamic objects are missing GCS markings.
error turns the warning messages into errors.

If issues are found, a maximum of 20 messages will be emitted, and then a summary with the total number of issues will be displayed at the end.
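The inheritance rule for -z gcs-report-dynamic, including the cap at warning, can be sketched as below. The function name is hypothetical; `None` stands for the option being omitted.

```python
def effective_gcs_report_dynamic(gcs_report, gcs_report_dynamic=None):
    """Return the level applied to dynamic objects."""
    if gcs_report_dynamic is not None:
        return gcs_report_dynamic          # an explicit value always wins
    # Inherited from -z gcs-report, but never stronger than "warning".
    return "warning" if gcs_report == "error" else gcs_report


print(effective_gcs_report_dynamic("error"))
print(effective_gcs_report_dynamic("error", "error"))
print(effective_gcs_report_dynamic("none"))
```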

The -z memtag-mode=mode option specifies the MTE mode of operation. The value of mode can be one of none, sync or async. The specified mode determines the value of the DT_AARCH64_MEMTAG_MODE dynamic tag. The sync mode implies precise exceptions, with the runtime providing the exact instruction where the fault occurred and the exact faulting address. The async mode implies imprecise exceptions.

The -z memtag-stack option specifies that the output object uses MTE instructions for stack memory usage.
