This application note contains a list of examples covering 2 distinct topics:
- Receiving data with UART and DMA when the application does not know in advance the number of bytes to be received
- Transmitting data with UART and DMA to avoid stalling the CPU and to free it for other purposes

Abbreviations used in this note:
- DMA: Direct Memory Access controller in STM32
- UART: Universal Asynchronous Receiver Transmitter
- USART: Universal Synchronous Asynchronous Receiver Transmitter
- TX: Transmit
- RX: Receive
- HT: Half-Transfer Complete flag for DMA
- TC: Transfer Complete flag for DMA
- RTO: Receiver Timeout
STM32 devices have peripherals named USART and UART. The difference is that a USART also offers advanced features such as synchronous communication, which are not available in a UART. For the sake of this application note we use the term UART; exactly the same applies to USART peripherals.
UART in STM32 allows customers to configure it using different transmit(TX)/receive(RX) modes:
- Polling mode (no DMA, no IRQ)
    - Application polls status bits to check whether a character has been transmitted/received and must read it fast enough not to miss any byte
    - P: Easy to implement
    - C: Easy to miss received characters in bursts
    - C: Works only for low baudrates, `9600` or lower
    - C: Application must periodically (with high frequency) check for new characters, which is often not possible in complex systems
- Interrupt mode (no DMA)
    - UART triggers an interrupt and the CPU jumps to the service routine to handle each received byte separately
    - P: Commonly used approach in embedded applications
    - P: Works well with common baudrates, `115200`, up to `~921600` bauds
    - C: Interrupt service routine is executed for every received character
    - C: May stall other tasks in high-performance MCUs if interrupts are triggered for every character
    - C: May stall the operating system when receiving a burst of data; interrupt priority must be higher than the operating system maximum
- DMA mode
    - DMA is used to transfer data from the USART RX data register to user memory on hardware level. No application interaction is needed at this point, except for processing the received data once necessary
    - P: Transfer from USART peripheral to memory is done on hardware level without CPU interaction
    - P: Can work very easily with operating systems
    - P: Optimized for highest baudrates `> 1Mbps` and low-power applications
    - P: In case of big bursts of data, increasing data buffer size can improve functionality
    - C: Number of bytes to transfer must be known in advance by DMA hardware
    - C: If communication fails, DMA may not notify application about all bytes transferred
For RX mode, this article focuses only on DMA mode, to receive an unknown number of bytes.
Every STM32 has at least one (1) UART IP and at least one (1) DMA controller available.
For transmitting data, no special features beyond the basic ones are necessary, except DMA availability. We will use default features to implement a very efficient transmit system using DMA.
This is not the case for the receive operation. When implementing DMA receive, the application needs to know when a (possible) burst of received data has finished, and react immediately. This is especially true when UART is used for system communication that has to respond without delay. Some (not all) STM32 UARTs can detect when the RX line has not been active for a period of time. This is achieved using one of 2 available features:
- IDLE LINE: This is an event, triggered when the RX line has been in idle state (normally high state) for `1` frame time after the last received byte. Frame time is based on baudrate: the higher the baudrate, the shorter the frame time for a single byte.
- RTO (Receiver Timeout): This event is triggered when the line has been in idle state for a programmable time. It is configured entirely in the UART peripheral.
Both events can trigger an interrupt.
Not all STM32 devices have the IDLE LINE or RTO feature available. When a feature is not available, examples relying on it cannot be used.
An example: To transmit 1 byte at 115200 bauds, it takes approximately (for easier estimation) ~100us; for 3 bytes it would be ~300us in total. The IDLE line event triggers an interrupt for the application when the line has been in idle state for 1 frame time (in this case ~100us) after the third byte has been received.
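The timing estimate can be reproduced with a few lines of C. This is a minimal sketch, assuming 10 bits on the wire per byte (1 start bit, 8 data bits, 1 stop bit, no parity); the function name `uart_time_us` is hypothetical and not part of the examples.

```c
#include <stdint.h>

/**
 * \brief           Estimate the time needed to shift a number of bits over UART
 * \param[in]       baudrate: UART baudrate in bits per second, must not be 0
 * \param[in]       bits: Number of bits on the wire (assumed 10 per byte for 8N1)
 * \return          Time in microseconds, rounded to the nearest integer
 */
uint32_t
uart_time_us(uint32_t baudrate, uint32_t bits) {
    uint64_t scaled = (uint64_t)bits * 1000000u;    /* bits * 1e6, 64-bit to avoid overflow */
    return (uint32_t)((scaled + baudrate / 2u) / baudrate);
}
```

With these assumptions, `uart_time_us(115200, 10)` evaluates to `87` and `uart_time_us(115200, 30)` to `260`.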
This is a real experiment demo using STM32F4 and IDLE line. After IDLE line is triggered, data are echoed back (loopback mode):
- Application receives `3` bytes, takes approx `260us` at `115200` bauds
- RX goes to high state (yellow rectangle) and UART RX detects that it has been idle for at least `1` frame
    - Width of yellow rectangle represents time of `1` frame
- IDLE line interrupt is triggered at green arrow
- Application echoes data back from interrupt
DMA in STM32 can be configured in normal or circular mode. For each mode, it requires number of elements to transfer before events (such as transfer complete) are triggered.
- Normal mode: DMA starts the data transfer; once it has transferred all elements, it stops and sets its enable bit to `0`.
    - Application uses this mode when transmitting data
- Circular mode: DMA starts the transfer; once it has transferred all elements (as written in the corresponding length register), it starts again from the beginning of memory and continues transferring
    - Application uses this mode when receiving data
While a transfer is active, 2 (among others) interrupts may be triggered:
- Half-Transfer complete `HT`: Triggers when DMA has transferred half of the elements
- Transfer-Complete `TC`: Triggers when DMA has transferred all elements

When DMA operates in circular mode, these interrupts are triggered periodically.

The number of elements to transfer must be written to the relevant DMA register before the transfer starts.
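The periodic `HT`/`TC` behavior of circular mode can be modelled on a host PC. The sketch below is a software simulation only, not STM32 driver code: the `dma_sim_*` names and `DMA_EVT_*` constants are hypothetical, and an even transfer length is assumed so that "half" is unambiguous.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_EVT_NONE 0u
#define DMA_EVT_HT   1u
#define DMA_EVT_TC   2u

typedef struct {
    uint8_t* mem;       /* Destination memory */
    size_t len;         /* Number of elements written before start */
    size_t remaining;   /* Down-counter, models the DMA data length register */
} dma_sim_t;

/* Program memory address and number of elements, as done before enabling DMA */
void
dma_sim_start(dma_sim_t* d, uint8_t* mem, size_t len) {
    d->mem = mem;
    d->len = len;
    d->remaining = len;
}

/* Store one received byte and return the event raised by this transfer */
unsigned
dma_sim_write(dma_sim_t* d, uint8_t byte) {
    d->mem[d->len - d->remaining] = byte;
    --d->remaining;
    if (d->remaining == 0) {
        d->remaining = d->len;  /* Circular mode: reload counter, wrap to start */
        return DMA_EVT_TC;
    }
    if (d->remaining == d->len / 2u) {
        return DMA_EVT_HT;
    }
    return DMA_EVT_NONE;
}
```

Feeding 40 bytes into a 20-element model raises `HT` after bytes 10 and 30 and `TC` after bytes 20 and 40, and the second 20 bytes overwrite the first 20.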
Now it is time to understand which features to use to receive data with UART and DMA to offload CPU.
For the sake of this example, we use a memory buffer array of `20` bytes. DMA will transfer data received from UART to this buffer.
The steps to begin are listed below. The initial assumption is that UART and the basic DMA setup have been initialized prior to this step; the rest is:
- Application writes `20` to the relevant DMA register for data length
- Application writes memory & peripheral addresses to relevant DMA registers
- Application sets DMA direction to peripheral-to-memory mode
- Application puts DMA into circular mode. This ensures DMA does not stop transferring data after it reaches the end of memory. Instead, it rolls over and continues transferring any further data from UART to memory
- Application enables DMA & UART in reception mode. Reception does not start yet: DMA waits for UART to receive the first character and then transfers it to the array. This is repeated for every received byte
- Application is notified by DMA `HT` event (or interrupt) after the first `10` bytes have been transferred from UART to memory
- Application is notified by DMA `TC` event (or interrupt) after `20` bytes are transferred from UART to memory
- Application is notified by UART IDLE line (or RTO) when an IDLE line or timeout is detected on the RX line
- Application needs to react to all of these events for the most efficient receive
This configuration is important because the length is not known in advance. The application needs to assume that an endless number of bytes may be received, therefore DMA must stay operational indefinitely.
We have used a `20` bytes long array for demonstration purposes. In a real application this size may need to be increased. It all depends on the UART baudrate (the higher the speed, the more data may be received in a fixed window) and on how fast the application can process the received data (using interrupt notification, RTOS, or polling mode).
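A rough way to reason about buffer size is to count how many bytes can arrive while the application is busy. The sketch below is a rule of thumb and not taken from the examples: it assumes 10 bits per byte on the wire and that each half of the buffer (between `HT` and `TC`) must absorb one worst-case processing latency. Both function names are hypothetical.

```c
#include <stdint.h>

/* Number of bytes that can arrive within window_us, rounded up.
 * bits_per_frame must not be 0 (assumed 10 for 8N1). */
uint32_t
uart_bytes_in_window(uint32_t baudrate, uint32_t bits_per_frame, uint32_t window_us) {
    uint64_t num = (uint64_t)baudrate * window_us;          /* bits * 1e6 */
    uint64_t den = (uint64_t)bits_per_frame * 1000000u;
    return (uint32_t)((num + den - 1u) / den);
}

/* Suggested minimum RX buffer length: while one half is being processed,
 * the other half has to hold everything received during latency_us. */
uint32_t
uart_rx_buffer_len_hint(uint32_t baudrate, uint32_t bits_per_frame, uint32_t latency_us) {
    return 2u * uart_bytes_in_window(baudrate, bits_per_frame, latency_us);
}
```

For example, at `115200` bauds and a worst-case latency of `1000us`, up to `12` bytes can arrive in the window, so this heuristic suggests a buffer of at least `24` bytes.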
Everything gets simpler when the application transmits data: the length of the data is known in advance and the memory to transmit is ready.
For the sake of this example, we use memory holding a Hello world message. In C language it would be:

    const char hello_world_arr[] = "HelloWorld";

- Application writes the number of bytes to transmit to the relevant DMA register, that would be `strlen(hello_world_arr)` or `10`
- Application writes memory & peripheral addresses to relevant DMA registers
- Application sets DMA direction to memory-to-peripheral mode
- Application sets DMA to normal mode. This effectively disables DMA once all the bytes are successfully transferred
- Application enables DMA & UART in transmitter mode. Transmit starts immediately when UART requests the first byte via DMA to be transferred from memory to the UART TX data register
- Application is notified by `TC` event (or interrupt) after all bytes have been transmitted from memory to UART via DMA.
- DMA is now stopped and application may prepare the next transfer
Please note that the `TC` event is triggered before the last UART byte has been fully transmitted over UART. That's because the `TC` event is part of DMA and not part of UART; it is triggered when DMA has transferred all the bytes from point A to point B. Here, point A for DMA is memory and point B is the UART data register. It is then up to UART to clock out the byte to the GPIO pin.
This section describes 4 possible cases, plus one additional case which explains why the application needs HT/TC events.
Abbreviations used on the image:
- `R`: Read pointer, used by application to read data from memory. Later also used as `old_ptr`
- `W`: Write pointer, used by DMA to write next byte to. Increased every time DMA writes new byte. Later also used as `new_ptr`
- `HT`: Half-Transfer Complete event triggered by DMA
- `TC`: Transfer-Complete event triggered by DMA
- `I`: IDLE line detection event triggered by USART

DMA configuration:
- Circular mode
- `20` bytes length memory
- `HT` event triggers at `10` bytes
- `TC` event triggers at `20` bytes

Possible cases:
- Case A: DMA transfers `10` bytes. Application gets notification by `HT` event and may process received data
- Case B: DMA transfers next `10` bytes. Application gets notification by `TC` event. Processing now starts from last known position until the end of memory
    - DMA is in circular mode, thus it will continue from beginning of buffer to transfer next byte
- Case C: DMA transfers `10` bytes, but not aligned with `HT` nor `TC` events
    - Application gets notification by `HT` event when first `6` bytes are transferred. Processing may start from last known read location
    - Application gets `IDLE` event after next `4` bytes are successfully transferred
- Case D: DMA transfers `10` bytes in overflow mode, but not aligned with `HT` nor `TC` events
    - Application gets notification by `TC` event when first `4` bytes are transferred. Processing may start from last known read location
    - Application gets notification by `IDLE` event after next `6` bytes are transferred. Processing may start from beginning of buffer
- Case E: Example of what may happen when application relies only on `IDLE` event
    - If application receives `30` bytes in burst, `10` bytes get overwritten by DMA as application did not process them quickly enough
    - Application gets `IDLE` line event once the RX line has been steady for `1` byte timeframe
    - Red part of data represents the first `10` received bytes from the burst, which were overwritten by the last `10` bytes of the burst
    - Option to avoid such a scenario is to poll for DMA changes more quickly than a burst of `20` bytes takes; or to use `TC` and `HT` events
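The numbers in case E follow from simple arithmetic on the burst and buffer lengths. The sketch below is an illustration only, assuming the read pointer sits at the start of the buffer and nothing is processed until the burst ends; both function names are hypothetical.

```c
#include <stddef.h>

/* Bytes overwritten by DMA when a whole burst arrives before any processing */
size_t
dma_rx_overwritten(size_t burst_len, size_t buf_len) {
    return burst_len > buf_len ? burst_len - buf_len : 0;
}

/* Distance between write and read position once the burst ends.
 * This is all a position-based check can see; buf_len must not be 0. */
size_t
dma_rx_position_delta(size_t burst_len, size_t buf_len) {
    return burst_len % buf_len;
}
```

For a `30` byte burst into a `20` byte buffer, `10` bytes are overwritten and the write position ends `10` bytes ahead of the read position. Note that for a burst of exactly `20` bytes the delta is `0`: nothing is overwritten, but a check that only compares positions would see no change, which is another reason to react to `HT` and `TC` as well.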
Example code to read data from memory and process it, for cases A-D:

    /**
     * \brief           Check for new data received with DMA
     * \note            This function must be called from DMA HT/TC and USART IDLE events
     * \note            Full source code is available in examples
     */
    void
    usart_rx_check(void) {
        static size_t old_pos;
        size_t pos;

        /* Calculate current position in buffer */
        pos = ARRAY_LEN(usart_rx_dma_buffer) - LL_DMA_GetDataLength(DMA1, LL_DMA_STREAM_1);
        if (pos != old_pos) {                       /* Check change in received data */
            if (pos > old_pos) {                    /* Current position is over previous one */
                /* We are in "linear" mode: new data is one contiguous block, e.g. cases A and C */
                /* Process data directly by subtracting "pointers" */
                usart_process_data(&usart_rx_dma_buffer[old_pos], pos - old_pos);
            } else {
                /* We are in "overflow" mode: write position wrapped past end of buffer, e.g. case D */
                /* First process data to the end of buffer */
                usart_process_data(&usart_rx_dma_buffer[old_pos], ARRAY_LEN(usart_rx_dma_buffer) - old_pos);
                /* Check and continue with beginning of buffer */
                if (pos > 0) {
                    usart_process_data(&usart_rx_dma_buffer[0], pos);
                }
            }
        }
        old_pos = pos;                              /* Save current position as old */

        /* Check and manually update if we reached end of buffer */
        if (old_pos == ARRAY_LEN(usart_rx_dma_buffer)) {
            old_pos = 0;
        }
    }

Examples provide reference code to implement RX and TX functionality using DMA transfers. There are 2 sets of examples:
- Examples for RX only
    - Available in `projects` folder with `usart_rx_` prefix
    - DMA is used to receive data, polling is used to echo data back
- Examples for RX & TX
    - Available in `projects` folder with `usart_tx_` prefix
    - DMA is used to receive data and to transmit data back
    - It uses ring buffer to copy data from DMA buffer to application buffer first

Common for all examples:
- Developed in STM32CubeIDE for easier evaluation on STM32 boards
- Fully developed using LL drivers for various STM32 families
- UART common configuration: `115200` bauds, `1` stop bit, no-parity
- DMA RX common configuration: Circular mode, `TC` and `HT` events enabled
- DMA TX common configuration: Normal mode, `TC` event enabled
- All RX examples implement loop-back with polling. Every character received by UART and transferred by DMA is sent back to the same UART

| STM32 family | Board name | USART | STM32 TX | STM32 RX | RX DMA settings | TX DMA settings |
|---|---|---|---|---|---|---|
| STM32F1xx | BluePill-F103C8 | `USART1` | `PA9` | `PA10` | `DMA1, Channel 5` | |
| STM32F4xx | NUCLEO-F413ZH | `USART3` | `PD8` | `PD9` | `DMA1, Stream 1, Channel 4` | `DMA1, Stream 3, Channel 4` |
| STM32G0xx | NUCLEO-G071RB | `USART2` | `PA2` | `PA3` | `DMA1, Channel 1` | |
| STM32G4xx | NUCLEO-G474RE | `LPUART1` | `PA2` | `PA3` | `DMA1, Channel 1` | |
| STM32L4xx | NUCLEO-L432KC | `USART2` | `PA2` | `PA15` | `DMA1, Channel 6, Request 2` | |

Examples demonstrate different use cases for RX only or RX&TX combined.
- DMA hardware takes care to transfer received data to memory
- Application must constantly poll for changes in DMA registers and read received data quickly enough to make sure DMA does not overwrite data in the buffer
- Processing of received data is in thread mode (not in interrupt)
- P: Easy to implement
- P: No interrupts, no consideration of priority and race conditions
- P: Fits for devices without USART IDLE line detection
- C: Application takes care of data periodically
- C: Not possible to put application to low-power mode (sleep mode)
- Same as polling for changes but with dedicated thread in operating system to process data
- P: Easy to implement to RTOS systems, uses single thread without additional RTOS features (no mutexes, semaphores, memory queues)
- P: No interrupts, no consideration of priority and race conditions
- P: Data processing always on-time with maximum delay given by thread delay, thus with known maximum latency between received character and processed time
- Unless system has higher priority threads
- P: Fits for devices without UART IDLE line detection
- C: Application takes care of data periodically
- C: Uses memory resources dedicated for separate thread for data processing
- C: Not possible to put application to low-power mode (sleep mode)
- Application gets notification by IDLE line detection or DMA TC/HT events
- Application has to process data only when it receives any of the `3` interrupts
- P: Application does not need to poll for new changes
- P: Application receives interrupts on events
- P: Application may enter low-power modes to increase battery life (if operated on battery)
- C: Data are read (processed) in the interrupt. We strive to execute the interrupt routine as fast as possible
- C: Long interrupt execution may break other functionality in the application

Processing of incoming data is done from 2 interrupt vectors, hence it is important that they do not preempt each other. Set both to the same preemption priority!
- Application gets notification by IDLE line detection or DMA TC/HT events
- Application uses separate thread to process the data only when notified in one of interrupts
- P: Processing is not in the interrupt but in separate thread
- P: Interrupt only informs processing thread to process (or to wakeup)
- P: Operating system may put processing thread to blocked state while waiting for event
- C: Memory usage for separate thread + message queue (or semaphore)

This is the preferred way to receive and process UART characters.
- Application is using DMA in normal mode to transfer data
- Application is always using ringbuffer between high-level write and low-level transmit operation
- DMA TC interrupt is triggered when transfer has finished. Application can then send more data
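
The ring buffer between the high-level write and the low-level transmit has to hand DMA one contiguous block at a time, because a normal-mode transfer cannot wrap around the end of memory. The following is a minimal sketch of that idea, not the ring buffer used in the examples; the type, the function names and the size `TX_RB_SIZE` are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TX_RB_SIZE 16u  /* Hypothetical size; one slot is always kept empty */

typedef struct {
    uint8_t data[TX_RB_SIZE];
    size_t w;           /* Next write index, advanced by application */
    size_t r;           /* Next read index, advanced after DMA finishes */
} tx_rb_t;

/* Copy up to len bytes into the buffer, return number of bytes accepted */
size_t
tx_rb_write(tx_rb_t* rb, const void* src, size_t len) {
    const uint8_t* p = (const uint8_t*)src;
    size_t n = 0;
    while (n < len && (rb->w + 1u) % TX_RB_SIZE != rb->r) {
        rb->data[rb->w] = p[n++];
        rb->w = (rb->w + 1u) % TX_RB_SIZE;
    }
    return n;
}

/* Longest contiguous readable block: start address and length for one DMA transfer */
size_t
tx_rb_linear_read(const tx_rb_t* rb, const uint8_t** start) {
    *start = &rb->data[rb->r];
    return rb->w >= rb->r ? rb->w - rb->r : TX_RB_SIZE - rb->r;
}

/* Mark len bytes as sent, typically called from the DMA TC handler */
void
tx_rb_skip(tx_rb_t* rb, size_t len) {
    rb->r = (rb->r + len) % TX_RB_SIZE;
}
```

When the stored data wraps around the end of the array, it is sent as two DMA transfers: first the block up to the end of the array, then, after `TC`, the block from the beginning.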
This is a demo application available in the `projects` folder.
Its purpose is to show how an application can output debug messages without drastically affecting CPU performance.
It uses DMA to transfer data (the CPU does not wait for UART flags) and works at very high as well as very low data rates.
- All debug messages from application are written to intermediate ringbuffer
- Application will try to start & configure DMA after every successful write to ringbuffer
- If transfer is on-going, next start is configured from DMA TC interrupt
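
The start logic described above can be sketched as a small state machine. This is a host-side model under stated assumptions, not the code from the demo: the names are hypothetical, the ring buffer is reduced to a byte counter, and the place where real code would program and enable DMA is only marked by a comment.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t pending;     /* Bytes waiting in the ring buffer */
    size_t in_flight;   /* Length of active DMA transfer, 0 when DMA is idle */
    size_t started;     /* Number of DMA transfers started so far */
} tx_state_t;

/* Start a transfer only when DMA is idle and data is waiting */
bool
tx_try_start(tx_state_t* s) {
    if (s->in_flight != 0 || s->pending == 0) {
        return false;
    }
    s->in_flight = s->pending;  /* Real code: set DMA length and address, enable DMA */
    s->pending = 0;
    ++s->started;
    return true;
}

/* Application-level write: queue data, then try to start DMA */
void
tx_write(tx_state_t* s, size_t len) {
    s->pending += len;
    tx_try_start(s);
}

/* DMA TC interrupt: transfer finished, start the next one if data is waiting */
void
tx_on_dma_tc(tx_state_t* s) {
    s->in_flight = 0;
    tx_try_start(s);
}
```

Because `tx_try_start` is reached from both thread context and the DMA interrupt, a real implementation would need to protect it against concurrent access, for example by briefly disabling interrupts around it.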
As a result of this demo application for the STM32F413-Nucleo board, observations are as follows:
- Demo code sends `1581` bytes every second at `115200` bauds, which takes approx `142ms`.
- With DMA disabled, CPU load was `14%`, in-line with time to transmit the data
- With DMA enabled, CPU load was `0%`
- DMA can be enabled/disabled with `USE_DMA_TX` macro configuration in `main.c`

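
The load without DMA can be cross-checked with a short calculation. This is a rough sketch, assuming 10 bits per byte on the wire and a CPU that busy-waits for the whole time the bytes are being sent; the function name is hypothetical.

```c
#include <stdint.h>

/* Share of each second (in percent, rounded) spent transmitting in blocking mode.
 * baudrate must not be 0. */
uint32_t
uart_blocking_tx_load_percent(uint32_t bytes_per_second, uint32_t bits_per_frame, uint32_t baudrate) {
    uint64_t bits_x100 = (uint64_t)bytes_per_second * bits_per_frame * 100u;
    return (uint32_t)((bits_x100 + baudrate / 2u) / baudrate);
}
```

`uart_blocking_tx_load_percent(1581, 10, 115200)` evaluates to `14`, which agrees with the CPU load measured with DMA disabled.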
- run `git clone --recurse-submodules https://github.com/MaJerle/STM32_USART_DMA_RX` to clone repository including submodules
- run examples from `projects` directory using Atollic TrueSTUDIO
