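Let's start with the bus architecture and memory.
The bus architecture is unusual. The first thing that stands out is the additional AXI bus matrix, which is 64 bits wide.
Since the H7 makes heavy use of the SPI protocol, let's go over this protocol in detail first:
SPI protocol
Reference article:
SPI usually consists of four lines: SCLK, CS, MOSI, and MISO. Some sources use other names, such as SCK, SS, SDI, and SDO.
SPI: Serial Peripheral Interface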
Devices communicating via SPI are in a master-slave relationship. The master is the controlling device (usually a microcontroller), while the slave (usually a sensor, display, or memory chip) takes instruction from the master. The simplest configuration of SPI is a single master, single slave system, but one master can control more than one slave (more on this below).
MOSI (Master Output/Slave Input) – Line for the master to send data to the slave.
MISO (Master Input/Slave Output) – Line for the slave to send data to the master.
SCLK (Clock) – Line for the clock signal.
SS/CS (Slave Select/Chip Select) – Line for the master to select which slave to send data to.
The clock signal synchronizes the output of data bits from the master to the sampling of bits by the slave. One bit of data is transferred in each clock cycle, so the speed of data transfer is determined by the frequency of the clock signal. SPI communication is always initiated by the master since the master configures and generates the clock signal.
Any communication protocol where devices share a clock signal is known as synchronous. SPI is a synchronous communication protocol. There are also asynchronous methods that don’t use a clock signal. For example, in UART communication, both sides are set to a pre-configured baud rate that dictates the speed and timing of data transmission.
Now let's look at SPI with multiple slaves, which is made possible by the SS/CS line.
SPI can be set up to operate with a single master and a single slave, and it can be set up with multiple slaves controlled by a single master. There are two ways to connect multiple slaves to the master.
If the master has multiple slave select pins, the slaves can be wired in parallel like this:
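If only one slave select pin is available, the slaves can be daisy-chained like this:
What does "daisy chain" mean here? Follow the green line in the figure: each slave's MISO is connected to the next slave's MOSI, and the last (blue) line runs from the final slave's MISO back to the master's MISO.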
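Another point is the conventional bit order of SPI data:
Data sent from the master to the slave is usually sent with the most significant bit first (MSB first).
Data sent back from the slave to the master is sometimes described as going least significant bit first (LSB first), but in practice bit order is a configuration option (most SPI peripherals, including the STM32's, offer an LSB-first setting) and both directions normally use the same order, MSB first being the most common. What matters is that master and slave agree.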
Transfer process:
ADVANTAGES
- No start and stop bits, so the data can be streamed continuously without interruption
- No complicated slave addressing system like I2C
- Higher data transfer rate than I2C (almost twice as fast)
- Separate MISO and MOSI lines, so data can be sent and received at the same time
DISADVANTAGES
- Uses four wires (I2C and UARTs use two)
- No acknowledgement that the data has been successfully received (I2C has this)
- No form of error checking like the parity bit in UART
- Only allows for a single master
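That's about all there is to SPI, so let's move on to Quad-SPI. (Dual-SPI changes little: it becomes half-duplex and turns MOSI and MISO into IO0 and IO1, so data moves at twice the rate whether the master or the slave is sending, but the master and slave can no longer transmit at the same time. Quad-SPI has the same limitation.)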
Quad-SPI
Quad-SPI, also known as QSPI, is a peripheral found in most modern microcontrollers. It was designed specifically for talking to flash chips that support this interface. It is especially useful in applications that handle a lot of memory-intensive data, such as multimedia, when on-chip memory is not enough. It can also be used to store code externally, and through special mechanisms such as prefetching it can make external memory nearly as fast to access as internal memory.
It is faster than traditional SPI because Quad-SPI uses 4 data lines (IO0, IO1, IO2, and IO3) instead of just the 2 data lines (MOSI and MISO) of traditional SPI.
Its purpose is to fix the slow transfer speed of external flash, since a single data line cannot deliver high-speed transfers:
What did we use before Quad-SPI? Before Quad-SPI came along, the solution was parallel memory, where 8, 16, or 32 pins (depending on the address range) connect the external memory device to the microcontroller to achieve fast performance. But this approach had two major drawbacks:
- It made the PCB design complicated.
- It also meant that all these pins are fixed to one particular chip and cannot be used anymore for literally anything else.
Because of these problems, engineers needed a proper way to make flash faster. Their solution was simply to modify the SPI protocol to use 2 more data lines and make all 4 data lines bidirectional, and they named it Quad-SPI.
The figure shows the typical stages of a Quad-SPI exchange. First the instruction is sent over the IO lines, followed by the address and then the Alt (alternate bytes) field, whose use is defined by the flash manufacturer. Next, for a brief period (2 clock cycles in the figure above), the transmission pauses so that the IO lines can change direction. Finally, the data is sent from the flash device to the microcontroller.
As you can see, 4 bits are transferred every clock cycle, so an entire byte takes just 2 clock cycles. Note that standard quad-SPI flash devices send the most significant nibble first: in the first clock cycle IO3, IO2, IO1, and IO0 carry bits 7, 6, 5, and 4, and in the second cycle they carry bits 3, 2, 1, and 0 (so IOn carries bit n+4 and then bit n).
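The article goes on to introduce Double Data Rate (DDR) and XIP, both of which are quite interesting:
DDR: instead of each data line carrying one bit per clock cycle, data can change on both the rising and the falling edge of the clock, which doubles the transfer rate.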
XIP:
XIP stands for eXecute In Place, it is a feature which allows the microcontroller to execute code straight from the external flash memory without copying it first. This allows for faster and more efficient execution of code.
When the code is too large for on-chip storage, we usually turn to external memory, but external memory used to be very slow to access. With Quad-SPI and a prefetch mechanism, external flash can be read at speeds comparable to on-chip storage, so it can be used not only to store databases and multimedia but also to execute code.
Back to the STM32H7 memory layout:
At this point we need to understand its different bootloader.
Boot modes
At startup, the boot memory space is selected by the BOOT pin and BOOT_ADDx option bytes, allowing to program any boot memory address from 0x0000 0000 to 0x3FFF FFFF
which includes:
• All flash address space
• All RAM address space: ITCM, DTCM RAMs and SRAMs
• The System memory bootloader
The boot loader is located in non-user System memory. It is used to reprogram the flash memory through a serial interface (USART, I2C, SPI, USB-DFU). Refer to STM32 microcontroller System memory Boot mode application note (AN2606) for details
A note to revisit later, since the definition of BOOT_ADD0/1 matters here: MSB (most significant bytes) of the Arm Cortex-M7 boot address when BOOT pin is low (respectively …)
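Here I link a blog post whose code jumps into the STM32's built-in bootloader from the running program, so that firmware can be flashed through the bootloader. What I want to highlight is the address it uses: not the start of the system memory where the bootloader lives, 0x1FF0 0000, as we might expect, but 0x1FF0 9800. Why? We need to look at AN2606 to find the answer.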
Let's also look at the bootloader activation condition, Pattern 10:
From this we know that we can either set BOOT_ADD0/1 to 0x1FF0
Flash
Excerpts of the information that matters (to me):
The embedded flash memory manages the automatic loading of non-volatile user option bytes at power-on reset, and implements the dynamic update of these options.
2 Mbytes of non-volatile memory divided into two banks of 1 Mbyte each
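Here is the dual-bank flash layout:
The H7 no longer has the ART Accelerator found in series such as the F4; instead, flash accesses are accelerated by the H7's caches. The relationship between flash latency (wait states) and clock frequency is as follows: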
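- User storage area (main memory)
When we talk about an STM32's internal flash, we usually mean this user storage area, the space that holds the user application; the "1M FLASH" or "2M FLASH" in a part's description refers to the size of this area. For example, the STM32H743XIH6 on the experiment board in the link above has 2 MB of main memory split into two banks of 1 MB, and each bank is divided into 8 sectors of 128 KB (sectors 0 to 7 in the table). As with other flash memories, data must be erased one sector at a time before it is written, and since we sometimes want to work with smaller storage units, STM32 also offers a dual-bank storage format for its 1 MB flash products.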
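In this figure there is an ITCM region at 0x0000 0000 to 0x0000 FFFF, and I was curious what it is.
TCM stands for Tightly Coupled Memory. It is a fast on-chip memory (strictly speaking not a cache, since it is addressed directly rather than caching another memory) that is "said" to be integrated directly into the CPU. Judging from the figure below, it can arguably count as being inside the CPU because it sits very close to the core, though not as close as the I-Cache and D-Cache. The STM32 has two kinds of TCM: ITCM (Instruction TCM) and DTCM (Data TCM), the latter placed at 0x2000 0000 to 0x2001 FFFF in the figure.
The names ITCM and DTCM also refer to the Cortex-M7 core's instruction and data TCM interfaces, the paths the core uses to fetch instructions and to read and write data in the tightly coupled RAM. Instruction fetch/execution and data reads/writes differ in performance and management, which is why the two are kept separate.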
TCM bus interface
The TCM (tightly-coupled memory) is provided to connect the Cortex®-M7 to an internal RAM. The TCM interface has a Harvard architecture with ITCM (instruction TCM) and DTCM (data TCM) interfaces. The ITCM has one 64-bit memory interface while the DTCM is split into two 32-bit wide ports, D0TCM and D1TCM.
The Cortex®-M7 CPU uses the 64-bit ITCM bus for fetching instructions from the ITCM and to access the data (literal pool) located in the ITCM-RAM. The ITCM is accessed by the Cortex®-M7 at CPU clock speed with zero wait state. The DTCM interface can also fetch instructions.
In the STM32H72x/73x/74x/75x architecture, only the CPU and the MDMA can have access to memories connected to the ITCM and DTCM interfaces.
Source: an4891-stm32h72x-stm32h73x-and-singlecore-stm32h74x75x-system-architecture-and-performance-stmicroelectronics.pdf
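Because they are so fast, these two memory regions are reserved for special purposes. For example, time-critical code can be placed in ITCM and executed from there, which can noticeably improve execution speed, and frequently accessed data can be placed in DTCM to save access time.
How do you put code into ITCM? There are two ways. One is to use a GCC-specific attribute to tag the chosen code for ITCM, so that it is loaded into ITCM and executed there. The other is to rename the .c source file to .itcm.c, so that the whole file is compiled into an object that runs from ITCM; note that this relies on a linker script or build rules that recognize the .itcm.c naming, and is not a built-in GCC feature.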
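DTCM is much easier to use. On some ARM cores both TCMs are remappable, meaning their addresses are not fixed, but they are normally mapped to fixed addresses, and on the Cortex-M7 in the STM32H7 they sit at fixed addresses (ITCM at 0x0000 0000, DTCM at 0x2000 0000). With fixed addresses, they are easy to access. However, as mentioned above, both memory regions serve special purposes, so accessing them directly is not recommended. DTCM matters even more than ITCM, because with a typical linker script it holds a very important object: the stack. Local variables and function-call arguments are passed on the stack. Since the general-purpose DMA controllers cannot access the TCM (only the CPU and the MDMA can, per the AN4891 excerpt above), they cannot access the stack either, and because local variables live on the stack, a local variable cannot be used as a DMA buffer.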
- Author: liamY
- Link: https://liamy.clovy.top/article/linux/embeded/stm32H7
- License: This article is licensed under CC BY-NC-SA 4.0. Please credit the source when reposting.