STM32H7&PX4 | LiamY Blog

Starting with the bus structure and memory

The bus layout is unusual. The first thing that stands out is the extra AXI bus matrix, which is a 64-bit bus matrix.

Because the H7 makes heavy use of SPI, this section covers the protocol in detail:

The SPI protocol

Reference articles:

Quad-SPI, Everything You Need To Know!

In this article, let us look at the Quad-SPI interface, and learn how it works! If you have just mastered this SPI interface, then looking at Dual and Quad SPI can be overwhelming. I had to read several pdfs to wrap my mind around this concept.

https://embeddedinventor.com/quad-spi-everything-you-need-to-know/

Basics of the SPI Communication Protocol

SPI is a communication protocol used to interface a variety of sensors and modules to microcontrollers. This easy to understand guide will explain how it works.

https://www.circuitbasics.com/basics-of-the-spi-communication-protocol/

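Understanding the SPI Protocol in One Article (一文看懂SPI协议) - Wang Chao's independent blog for electronics, microcontroller, embedded, Qt, IoT, and smart-hardware enthusiasts

Wang Chao's independent blog (王超的独立博客)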

https://www.wangchaochao.top/2021/02/25/about-spi/

SPI usually uses four wires: SCLK, CS, MOSI, and MISO. Some documents call them SCK, SS, SDI, and SDO instead.

SPI: Serial Peripheral Interface

Devices communicating via SPI are in a master-slave relationship. The master is the controlling device (usually a microcontroller), while the slave (usually a sensor, display, or memory chip) takes instruction from the master. The simplest configuration of SPI is a single master, single slave system, but one master can control more than one slave (more on this below).

MOSI (Master Output/Slave Input) – Line for the master to send data to the slave.

MISO (Master Input/Slave Output) – Line for the slave to send data to the master.

SCLK (Clock) – Line for the clock signal.

SS/CS (Slave Select/Chip Select) – Line for the master to select which slave to send data to.

The clock signal synchronizes the output of data bits from the master to the sampling of bits by the slave. One bit of data is transferred in each clock cycle, so the speed of data transfer is determined by the frequency of the clock signal. SPI communication is always initiated by the master since the master configures and generates the clock signal.
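The one-bit-per-clock exchange described above can be modelled on a host PC as two 8-bit shift registers wired in a ring. This is only a minimal sketch: the names `spi_result` and `spi_exchange_byte` are hypothetical, and MSB-first shifting with both sides sampling on the same edge is an assumption rather than something the quoted article specifies.

```c
#include <stdint.h>

/* Hypothetical result type: what each side holds after 8 clocks. */
struct spi_result {
    uint8_t master_rx;
    uint8_t slave_rx;
};

/* Software model of one full-duplex 8-bit SPI exchange.
 * Assumption: both sides shift MSB first and sample on the same edge. */
struct spi_result spi_exchange_byte(uint8_t master_tx, uint8_t slave_tx)
{
    uint8_t master_shift = master_tx;
    uint8_t slave_shift = slave_tx;

    for (int clock = 0; clock < 8; ++clock) {
        /* Each side drives the current MSB of its shift register. */
        uint8_t mosi = (uint8_t)((master_shift >> 7) & 1u);
        uint8_t miso = (uint8_t)((slave_shift >> 7) & 1u);

        /* One clock cycle moves exactly one bit in each direction. */
        master_shift = (uint8_t)((master_shift << 1) | miso);
        slave_shift = (uint8_t)((slave_shift << 1) | mosi);
    }

    struct spi_result result = { master_shift, slave_shift };
    return result;
}
```

After eight clocks the two registers have swapped contents, which is why sending a byte on SPI always receives one at the same time.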

Any communication protocol where devices share a clock signal is known as synchronous. SPI is a synchronous communication protocol. There are also asynchronous methods that don’t use a clock signal. For example, in UART communication, both sides are set to a pre-configured baud rate that dictates the speed and timing of data transmission.

In practice, the number of slaves is limited by the load capacitance of the system, which reduces the ability of the master to accurately switch between voltage levels.

Now for the multi-slave case, which relies on the SS/CS signal line.

If the master has multiple slave select pins, the slaves can be wired in parallel like this:

If only one slave select pin is available, the slaves can be daisy-chained like this:

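What does daisy-chaining mean? Look at the green lines: each slave's MISO feeds the next slave's MOSI, and the last slave's MISO (the final blue line) returns to the master's MISO.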
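A daisy chain behaves like one long shift register. The sketch below models it on a host PC, assuming each slave is a plain 8-bit shift register whose output feeds the next slave; `daisy_chain_result` and `MAX_CHAIN` are hypothetical names, and real devices may latch or interpret the data differently.

```c
#include <stdint.h>

#define MAX_CHAIN 4u /* hypothetical limit so the frame fits in 32 bits */

/* Shift `frame` (MSB first, 8 bits per slave) through `slave_count`
 * chained 8-bit registers and return the final content of slave `index`
 * (index 0 is the slave wired directly to the master's MOSI). */
uint8_t daisy_chain_result(uint32_t frame, unsigned slave_count, unsigned index)
{
    uint8_t regs[MAX_CHAIN] = {0};

    if (slave_count == 0u || slave_count > MAX_CHAIN || index >= slave_count) {
        return 0u;
    }

    unsigned total_bits = slave_count * 8u;
    for (unsigned clock = 0; clock < total_bits; ++clock) {
        /* Bit the master drives on this clock, most significant first. */
        uint8_t carry = (uint8_t)((frame >> (total_bits - 1u - clock)) & 1u);

        for (unsigned s = 0; s < slave_count; ++s) {
            /* The bit leaving this slave becomes the next slave's input. */
            uint8_t out = (uint8_t)((regs[s] >> 7) & 1u);
            regs[s] = (uint8_t)((regs[s] << 1) | carry);
            carry = out;
        }
    }
    return regs[index];
}
```

In this model the first byte the master sends ends up in the slave farthest from it, so the master has to order its frame accordingly.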

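There is also a convention for bit order during a transfer:

Data sent from the master to the slave is usually sent with the most significant bit first. (MSB first.)

The data sent from the slave back to the master is usually sent with the least significant bit first. (LSB first, according to the quoted article. Many devices use the same bit order, typically MSB first, in both directions, so check the datasheet of the device in question.)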
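If a device does send a byte in the opposite bit order from the one the SPI peripheral is configured for, the byte can be fixed up in software. A small sketch, with `reverse_bits8` as a hypothetical helper name:

```c
#include <stdint.h>

/* Reverse the bit order of one byte (bit 0 becomes bit 7, and so on). */
uint8_t reverse_bits8(uint8_t value)
{
    uint8_t result = 0;

    for (int i = 0; i < 8; ++i) {
        /* Move the current lowest bit of value into the result. */
        result = (uint8_t)((result << 1) | (value & 1u));
        value >>= 1;
    }
    return result;
}
```

Many SPI peripherals also offer a configuration bit for LSB-first or MSB-first transfers, which avoids the software step entirely.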

Transfer process:

ADVANTAGES

No start and stop bits, so the data can be streamed continuously without interruption

No complicated slave addressing system like I2C

Higher data transfer rate than I2C (almost twice as fast)

Separate MISO and MOSI lines, so data can be sent and received at the same time

DISADVANTAGES

Uses four wires (I2C and UARTs use two)

No acknowledgement that the data has been successfully received (I2C has this)

No form of error checking like the parity bit in UART

Only allows for a single master

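With plain SPI covered, there is not much more to it, so let's move on to Quad-SPI. (Dual-SPI changes little: it becomes half-duplex, with MOSI and MISO repurposed as IO0 and IO1. The master then transmits at double rate, and so does the slave, but the two cannot transmit at the same time. Quad-SPI has the same limitation.)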

Quad-SPI

Quad-SPI, also known as QSPI, is a peripheral that can be found in most modern microcontrollers. It has been specifically designed for talking to flash chips that support this interface. It is especially useful in applications that involve a lot of memory-intensive data like multimedia and on-chip memory is not enough. It can be also used to store code externally and it has the ability to make the external memory behave as fast as the internal memory through some special mechanisms.

It is faster than traditional SPI as quad-SPI uses 4 data lines (IO0, IO1, IO2, and IO3) as opposed to just 2 data lines (MOSI and MISO) on the traditional SPI.

Its purpose is to solve the slow transfer speed of external flash, where a single data line cannot deliver high throughput:

What solution did we use before quad SPI? Earlier before quad-SPI came, the solution was to use parallel memory where 8, 16 or 32 pins (depending on the address range) can be used to connect the external memory device with the microcontroller to achieve fast performance. But this approach had 2 major cons

It made the PCB design complicated.

It also meant that all these pins are fixed to one particular chip and cannot be used anymore for literally anything else.

Due to all of these problems, engineers needed to come up with a proper solution for making flash faster and the solution they came up with is to just modify the SPI protocols to use 2 more data lines and make all 4 data lines bidirectional and they named it Quad-SPI.

The figure shows the typical stages of a Quad-SPI exchange. First, the instruction is sent over the IO lines. Followed by the address and then comes the Alt field which can be implemented the way the manufacturer of the flash memory wants it to be. Then for a brief period, 2 clock cycles in the above figure, the transmission is paused to allow for changing the direction of the I/O line. Then the data is sent from the flash device to the microcontroller.
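The phases listed above (instruction, address, alt, dummy, data) translate directly into a clock-cycle budget. The sketch below adds them up. The function names are hypothetical, the instruction is assumed to be one byte, and the number of lines used per phase is left as a parameter because it depends on the flash command.

```c
#include <stdint.h>

/* Clock cycles needed to move `bytes` over `lines` data lines at one bit
 * per line per clock (single data rate). Returns 0 for an unused phase. */
static uint32_t phase_cycles(uint32_t bytes, unsigned lines)
{
    if (bytes == 0u || lines == 0u) {
        return 0u;
    }
    return (bytes * 8u + lines - 1u) / lines;
}

/* Total cycles for one read: instruction + address + alt + dummy + data.
 * Assumption: a 1-byte instruction; address, alt and data share io_lines. */
uint32_t qspi_read_cycles(unsigned instr_lines, unsigned io_lines,
                          uint32_t addr_bytes, uint32_t alt_bytes,
                          uint32_t dummy_cycles, uint32_t data_bytes)
{
    return phase_cycles(1u, instr_lines)
         + phase_cycles(addr_bytes, io_lines)
         + phase_cycles(alt_bytes, io_lines)
         + dummy_cycles
         + phase_cycles(data_bytes, io_lines);
}
```

For example, with every phase on four lines, a 3-byte address, one alt byte, 2 dummy cycles, and 4 data bytes cost 2 + 6 + 2 + 2 + 8 = 20 clocks.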

As you can see 4 bits are transferred every clock cycle. The bit order as you can see is IO0 sends bit0, IO1 sends bit1 and so on in the first clock cycle, and bits 4,5,6, and 7 are sent out in the 2nd clock cycle. Thus in just 2 clock cycles, the entire byte is transmitted!
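The nibble-per-clock packing can be sketched as below. The quoted text describes the low nibble going out on the first clock, whereas common NOR flash datasheets (Winbond's W25Q series, for example) show the high nibble first, so the order is left as a parameter here. `qspi_io_state` is a hypothetical name, and the mapping of IO3..IO0 to bits 3..0 of the return value is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value present on IO3..IO0 (returned as bits 3..0) during clock 0 or
 * clock 1 of a one-byte quad transfer.
 * Assumption: IOn carries bit n of the nibble being sent. */
uint8_t qspi_io_state(uint8_t byte, unsigned clock, bool high_nibble_first)
{
    /* Pick which half of the byte is on the lines for this clock. */
    bool send_high = (clock == 0u) ? high_nibble_first : !high_nibble_first;

    return send_high ? (uint8_t)(byte >> 4) : (uint8_t)(byte & 0x0Fu);
}
```

Either way, two clocks are enough for a full byte; only the datasheet of the specific flash settles which nibble comes first.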

The article then introduces Double Data Rate and XIP, both of which are quite interesting:

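Double Data Rate: instead of transferring one bit per line per clock cycle, the data changes on both the rising and falling edges of the clock, which doubles the data rate.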
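As a rough back-of-the-envelope check, peak throughput scales with the clock frequency, the number of data lines, and whether both clock edges are used. The helper below is a hypothetical sketch that ignores instruction, address, and dummy overhead, so real numbers will be lower.

```c
#include <stdbool.h>
#include <stdint.h>

/* Peak data-phase throughput in bytes per second.
 * Assumption: overhead from command, address and dummy cycles is ignored. */
uint32_t qspi_bytes_per_second(uint32_t clock_hz, unsigned lines, bool ddr)
{
    /* DDR transfers on both clock edges, so it moves 2 bits per line per cycle. */
    uint64_t bits_per_second = (uint64_t)clock_hz * lines * (ddr ? 2u : 1u);

    return (uint32_t)(bits_per_second / 8u);
}
```

At 100 MHz this gives 12.5 MB/s for single-line SPI, 50 MB/s for quad single data rate, and 100 MB/s for quad DDR.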

XIP:

XIP stands for eXecute In Place, it is a feature which allows the microcontroller to execute code straight from the external flash memory without copying it first. This allows for faster and more efficient execution of code.

When the code size gets too big to be stored in the on-chip storage, we usually go for external memory, but the problem with external memory used to be the fact that it was very slow to access. But using Quad-SPI mode and a prefetch mechanism, the data retrieval speed of external flash devices can be made comparable to the on-chip storage and hence can be used to not just store some databases and multimedia but it can be used to execute code too.

Back to the memory layout of the STM32H7:

At this point we need to look at its bootloader, which works differently from earlier parts.

Boot modes

At startup, the boot memory space is selected by the BOOT pin and BOOT_ADDx option bytes, allowing to program any boot memory address from 0x0000 0000 to 0x3FFF FFFF which includes:

• All flash address space

• All RAM address space: ITCM, DTCM RAMs and SRAMs

• The System memory bootloader

The boot loader is located in non-user System memory. It is used to reprogram the flash memory through a serial interface (USART, I2C, SPI, USB-DFU). Refer to STM32 microcontroller System memory Boot mode application note (AN2606) for details.
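The selection rule can be sketched as a small function. The sketch relies on three assumptions taken from my reading of the reference manual rather than from the excerpt: each BOOT_ADDx option holds the upper 16 bits of the boot address, BOOT_ADD1 applies when the BOOT pin is high, and the factory defaults are 0x0800 and 0x1FF0. The names `boot_address`, `BOOT_ADD0_DEFAULT`, and `BOOT_ADD1_DEFAULT` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed factory defaults: user flash and system memory. */
#define BOOT_ADD0_DEFAULT 0x0800u /* boots from 0x0800 0000 */
#define BOOT_ADD1_DEFAULT 0x1FF0u /* boots from 0x1FF0 0000 */

/* Compute the boot address from the BOOT pin level and the option bytes.
 * Assumption: BOOT_ADD0 is used when the pin is low, BOOT_ADD1 when high,
 * and each value supplies the upper 16 bits of the address. */
uint32_t boot_address(bool boot_pin_high, uint16_t boot_add0, uint16_t boot_add1)
{
    uint16_t msb = boot_pin_high ? boot_add1 : boot_add0;

    return (uint32_t)msb << 16;
}
```

With the assumed defaults, a low BOOT pin starts the user application in flash and a high BOOT pin enters the system bootloader.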

A point to come back to later: the definition of BOOT_ADD0/1 is important here: MSB (most significant bytes) of the Arm Cortex-M7 boot address when BOOT pin is low (respectively

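Here is a blog post that jumps into the STM32 system bootloader from application code, so that firmware can be flashed through the bootloader. The point I want to stress is the address it uses. It is not 0x1FF0 0000, the start of system memory where we would expect the bootloader to sit, but 0x1FF0 9800. Why? The answer is in AN2606.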

[STM32H7 Tutorial] Chapter 68: USB DFU firmware upgrade with the STM32H7 system bootloader - CSDN blog

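Full tutorial download: http://www.armbbs.cn/forum.php?mod=viewthread&tid=86980. Chapter 68, USB DFU firmware upgrade with the STM32H7 system bootloader: this chapter explains how to upgrade a program with the system bootloader, which makes upgrading convenient even without relying on the external boot pin. DFU stands for Device Firmware Upgrad...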

https://blog.csdn.net/Simon223/article/details/104657167

Now look at this together with the activation condition, Pattern 10:

So we know that we can either set BOOT_ADD0/1 to 0x1FF0

Flash

Key points (for me)

The embedded flash memory manages the automatic loading of non-volatile user option bytes at power-on reset, and implements the dynamic update of these options.

2 Mbytes of non-volatile memory divided into two banks of 1 Mbyte each

The dual-bank flash block here is

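The H7 no longer has the ART acceleration of the F1 and F4 series (written as "ART Chrome" in the source); the cache in the H7 provides the speed-up instead. The relationship between flash latency (wait states) and core clock frequency is as follows: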

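42. Reading and writing the internal flash - [野火 (Embedfire)] STM32 HAL Library Development Guide, based on the H743 Fanxing development board (documentation)

The H743 Fanxing is an advanced motor-control board built around the 176-pin STM32H743IIT6. It is a powerful, full-featured learning and evaluation board with rich on-board resources: 2 brushed/brushless DC motor ports, 4 stepper motor ports, 2 servo ports, 6 analog inputs, three 4-channel isolated inputs, one 4-channel isolated output, 1 encoder input, 1 CAN, 1 RS-485, 1 MAX232, EEPROM, SPI flash, Ethernet, USB host, USB-to-serial, an RTC battery holder, an SWD download/debug port, an LCD connector, and a serial display connector. It is suitable for student learning, product evaluation, and volume use.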

https://doc.embedfire.com/motor/h743fanxing/zh/latest/doc/chapter42/chapter42.html

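User storage area

When we talk about the internal flash of an STM32, we usually mean this user storage area. It holds the user application, and the "1M FLASH" or "2M FLASH" in a part description refers to its size. For the STM32H743XIH6 on the board used in the link above, the source describes the main memory as a single 1 MB block divided into 8 sectors of 128 KB each; with a 1 MB main storage area it contains only sectors 0 to 7 of the table. (The datasheet excerpt quoted earlier gives 2 Mbytes in two banks of 1 Mbyte each, so check this figure against the exact part.) As with other flash, a sector must be erased before data is written. Because it is sometimes useful to work with smaller units, STM32 products with 1 MB of flash also offer a dual-bank layout.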
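Since erasing works per sector, it helps to map an address to its bank and sector first. The sketch below uses the layout from the datasheet excerpt (two 1-Mbyte banks) with 8 sectors of 128 KB per bank. The base address 0x0800 0000 and all names are assumptions of this sketch, not values taken from the quoted text.

```c
#include <stdint.h>

#define FLASH_START      0x08000000u      /* assumed start of user flash */
#define SECTOR_SIZE      (128u * 1024u)   /* 128 KB per sector */
#define SECTORS_PER_BANK 8u
#define BANK_COUNT       2u

/* Linear sector index 0..15 across both banks, or -1 if out of range. */
static int flash_sector_index(uint32_t addr)
{
    if (addr < FLASH_START) {
        return -1;
    }
    uint32_t index = (addr - FLASH_START) / SECTOR_SIZE;
    return (index < SECTORS_PER_BANK * BANK_COUNT) ? (int)index : -1;
}

/* Bank number (1 or 2) containing addr, or -1 if addr is not in flash. */
int flash_bank_of(uint32_t addr)
{
    int index = flash_sector_index(addr);
    return (index < 0) ? -1 : index / (int)SECTORS_PER_BANK + 1;
}

/* Sector number (0..7) within its bank, or -1 if addr is not in flash. */
int flash_sector_of(uint32_t addr)
{
    int index = flash_sector_index(addr);
    return (index < 0) ? -1 : index % (int)SECTORS_PER_BANK;
}
```

An erase routine would call these two helpers to decide which bank and sector to pass to the flash driver before programming.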

In this figure there is an ITCM region at 0x0000 0000-0x0000 FFFF, and I was curious what it is.

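The suffix TCM stands for Tightly Coupled Memory. It is a fast memory (the source calls it a cache, but it is directly addressed RAM rather than a cache) that is "said to be" integrated directly into the CPU. Judging from the figure below, it can just about be counted as inside the CPU because it sits very close to the core, though not as close as the I-Cache and D-Cache. The STM32 has two kinds of TCM: ITCM (Instruction TCM) and DTCM (Data TCM), the latter placed at 0x2000 0000 - 0x2001 FFFF in the figure.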

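ITCM is the instruction bus of the Cortex core and DTCM is its data bus. They are the channels through which the CPU core exchanges instructions and data with flash and SRAM. Instruction fetch and execution differ from data reads and writes in performance and management, so the two are kept separate.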

TCM bus interface The TCM (tightly-coupled memory) is provided to connect the Cortex®-M7 to an internal RAM. The TCM interface has a Harvard architecture with ITCM (instruction TCM) and DTCM (data TCM) interfaces. The ITCM has one 64-bit memory interface while the DTCM is split into two 32-bit wide ports, D0TCM and D1TCM. The Cortex®-M7 CPU uses the 64-bit ITCM bus for fetching instructions from the ITCM and to access the data (literal pool) located in the ITCM-RAM. The ITCM is accessed by the Cortex®-M7 at CPU clock speed with zero wait state. The DTCM interface can also fetch instructions. In the STM32H72x/73x/74x/75x architecture, only the CPU and the MDMA can have access to memories connected to the ITCM and DTCM interfaces. ——from an4891-stm32h72x-stm32h73x-and-singlecore-stm32h74x75x-system-architecture-and-performance-stmicroelectronics.pdf
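The access restriction is easy to check in software, for instance before handing a buffer to a general-purpose DMA. The sketch below uses the address ranges mentioned in these notes (ITCM at 0x0000 0000-0x0000 FFFF, DTCM at 0x2000 0000-0x2001 FFFF); the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum mem_region { REGION_ITCM, REGION_DTCM, REGION_OTHER };

/* Classify an address using the TCM ranges from the memory map figure. */
enum mem_region classify_address(uint32_t addr)
{
    if (addr <= 0x0000FFFFu) {
        return REGION_ITCM;
    }
    if (addr >= 0x20000000u && addr <= 0x2001FFFFu) {
        return REGION_DTCM;
    }
    return REGION_OTHER;
}

/* Per the quoted note, only the CPU and the MDMA reach the TCMs, so any
 * other DMA master must be given a buffer outside those ranges. */
bool general_dma_can_reach(uint32_t addr)
{
    return classify_address(addr) == REGION_OTHER;
}
```

A driver could assert `general_dma_can_reach()` on its buffer address during initialization to catch a stack or DTCM buffer early.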

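Because TCM is fast memory, these two regions are reserved for special uses. Time-critical code, for example, can be placed in ITCM so that it runs faster, and frequently accessed data can be placed in DTCM to cut access time.

How do you place code in ITCM? There are two ways. One is to use a GCC-specific attribute to tag the chosen code as "ITCM", so that it is loaded into ITCM and executed there. The other is to rename the .c source file to .itcm.c, so that the whole file is compiled into an object that runs from ITCM.

DTCM is much more convenient. Both TCMs are remappable, that is, their addresses are not inherently fixed, but in practice each is mapped to a fixed address, which makes access straightforward. Still, as noted above, both regions have special purposes, so accessing them directly is not recommended.

Compared with ITCM, DTCM matters more, because it holds a very important object: the stack. Local variables and function call arguments are passed through the stack. Since DMA cannot access TCM, it cannot access the stack either, and because local variables live on the stack, DMA cannot transfer local variables. (On the STM32H7, the application note quoted above states that only the CPU and the MDMA can access the TCMs.)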
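A minimal sketch of the attribute approach follows. The section name `.itcm_text` and the macro `ITCM_CODE` are hypothetical: the attribute only has an effect if the project's linker script maps that section into ITCM and the startup code copies it there. On a non-Arm host the macro expands to nothing, so the function still builds and can be tested.

```c
#include <stdint.h>

/* ".itcm_text" is a hypothetical section name. It must match an output
 * section that the linker script places in ITCM. */
#if defined(__GNUC__) && defined(__arm__)
#define ITCM_CODE __attribute__((section(".itcm_text"), noinline))
#else
#define ITCM_CODE /* no-op on hosts so the sketch still compiles */
#endif

/* Example of a small time-critical routine: fixed-point Q16 scaling. */
ITCM_CODE uint32_t scale_q16(uint32_t value, uint32_t gain_q16)
{
    /* The 64-bit intermediate avoids overflow before the shift. */
    return (uint32_t)(((uint64_t)value * gain_q16) >> 16);
}
```

A gain of 0x8000 in Q16 format corresponds to 0.5, so `scale_q16(100, 0x8000)` returns 50.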

Understanding DMA, TCM (ITCM and DTCM), and Cache - CSDN blog
