feat(newlib): riscv: add CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS config option

This option replaces implementations of functions from ROM:
  - memcpy
  - memcmp
  - memmove
  - str[n]cpy
  - str[n]cmp

The functions used in the firmware will be better optimized for misaligned
memory. Here are some measurements in CPU cycles for 4096-byte buffers:

  memcpy:  28676 -> 4128
  memcmp:  49147 -> 14259
  memmove: 33896 -> 8086
  strcpy:  32771 -> 17313
  strcmp:  32775 -> 13191
This commit is contained in:
Alexey Lapshin
2025-03-26 15:18:54 +07:00
parent f3625b0fb0
commit ec68cb3300
46 changed files with 1049 additions and 58 deletions

View File

@@ -87,6 +87,7 @@ The following optimizations improve the execution of nearly all code, including
:SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. Even though {IDF_TARGET_NAME} has a single precision hardware floating point unit, floating point calculations are always slower than integer calculations. If possible then use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
:not SOC_CPU_HAS_FPU: - Avoid using floating point arithmetic ``float``. On {IDF_TARGET_NAME} these calculations are emulated in software and are very slow. If possible, use fixed point representations, a different method of integer representation, or convert part of the calculation to be integer only before switching to floating point.
- Avoid using double precision floating point arithmetic ``double``. These calculations are emulated in software and are very slow. If possible then use an integer-based representation, or single-precision floating point.
:CONFIG_ESP_ROM_HAS_SUBOPTIMAL_NEWLIB_ON_MISALIGNED_MEMORY: - Avoid misaligned 4-byte memory accesses in performance-critical code sections. For potential performance improvements, consider enabling :ref:`CONFIG_LIBC_OPTIMIZED_MISALIGNED_ACCESS`. Note that properly aligned memory operations will always execute at full speed without performance penalties. Requires additional ~190 bytes of IRAM and ~870 bytes of flash memory.
.. only:: esp32s2 or esp32s3 or esp32p4