
General • RP2350: delicate details about PIO instructions

Based on my own study and experiments, here are some details which might help others. They are mainly addressed to the RP2350 design team, as a "wish list" for a few more (or modified) PIO instructions.

Using clock edges:
It is not really possible to sample in sync with a clock edge! There is always a delay (you sample a bit later than the clock edge).
Typically, you would use code like this to wait for a falling edge and then sample the input signals:

Code:

wait(1, gpio, 15)    # make sure the clock on GPIO15 is high, in order to wait for the falling edge
wait(0, gpio, 15)    # now wait for the falling edge
in_(pins, 4)         # sample QSPI QD0..QD3 at the falling edge
As you know, any instruction needs one PIO clock cycle (NOT a SYSCLK cycle!) to be fetched and executed (to take effect).
This means here:
Once "wait(0, gpio, 15)" has been fetched and is executing, it really does wait for the falling edge, and the PIO continues "immediately" once the falling edge has been seen.
But the following "in_(pins, 4)" needs ONE PIO clock cycle before it actually samples (one cycle for instruction fetch and decode).
This means:
  • You sample the input signals ONE PIO clock cycle later, not really at the falling edge.
  • If your PIO clock divider is large, the sampling delay is large (the time to fetch and decode the instruction).
  • If you have an external chip which has a very short "hold time" after a falling edge, you would sample too late!
There is no real solution, except to write the sampling state machine so that it runs at SYSCLK (e.g. 150 MHz).
Then the delay is shorter (but it is still ONE PIO clock cycle).
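To put numbers on the sampling delay described above, here is a minimal sketch in plain Python (it runs on a PC, not on the PIO). It assumes the 150 MHz SYSCLK mentioned in the post and that the extra delay is exactly one PIO clock cycle, as argued above. The divider values and the helper name pio_cycle_ns are made up for illustration.

```python
# Sketch: extra sampling delay caused by the one-cycle fetch/execute
# of in_() after wait() has seen the edge.
# Assumption: SYSCLK is 150 MHz (example figure from the post).

SYS_CLK_HZ = 150_000_000


def pio_cycle_ns(divider: float, sys_clk_hz: int = SYS_CLK_HZ) -> float:
    """Length of one PIO clock cycle in nanoseconds (hypothetical helper)."""
    if divider < 1:
        raise ValueError("PIO clock divider must be >= 1")
    return divider * 1e9 / sys_clk_hz


# Hypothetical divider values, chosen only to show the trend.
for div in (1, 2, 15, 150):
    print(f"divider {div:>3}: in_() samples about "
          f"{pio_cycle_ns(div):8.2f} ns after the edge is seen")
```

If the hold time of the external chip is shorter than the value printed for your divider, the sample would come too late by this reasoning.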

My wish:
Have an "IN_(pins, 4)" instruction which can wait for a clock edge and sample in sync with it, e.g. "INfalling(pins, 4)": it waits for a falling edge by itself and samples at the same moment the external input clock signal transitions ("in sync").

Wait for a Clock Edge:
In order to really wait for a clock edge, you always need two instructions (see above).

Solution:
If the rest of your sampling code, e.g. using loops, can make sure that the external clock has already changed to the opposite state (e.g. high), then you do not need to waste one instruction just to make sure the clock is high before you wait for the important falling edge.
You can just wait for the falling edge: the clock "must" already be high by then (so much time has passed executing other instructions that this "constraint" can be assumed to hold). You save one instruction just by "assumption" (constraint).

My wish:
Have a "WAIT(0, gpio, 15)" instruction which waits for a clock edge automatically, e.g. "WAITfalling(gpio, 15)": it is triggered only when it has seen a clock edge (this would save one instruction).

Input signal Clock Synchronizer:
From the datasheet it is not really clear what is considered a PIO input signal.
The PIO input signals go through a clock synchronizer (see datasheet, page 926: "There is a 2-flipflop synchronizer on each GPIO input, which protects PIO logic from metastabilities.").
But what if such a GPIO is also used "directly" as an input signal? Is it affected in the same way by the clock synchronizer? To me it does not look like it (it seems to be faster internally).

My wish:
How can I use an external clock signal which is delayed internally in the same way, by the same amount, so that I can deal with external delays by delaying the sampling clock as well?

Needing a MicroDelay:
You can use delays in PIO instructions, e.g. via "delay()" or via "[N]". But these delays have the granularity of the PIO clock (after the PIO clock divider), not of SYSCLK (e.g. 150 MHz, the main clock before the PIO divider).
If your PIO runs slowly, an additional delay can be huge, even with just "[1]". There is no way to add just a tiny bit of delay (e.g. when I want to measure the hold time after a clock edge for signals generated by an external chip, or when I want to fine-tune the sampling timing relative to clock edges in order to meet the hold time spec of external chips).
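As a rough illustration of this granularity problem, the following plain-Python sketch picks the "[N]" delay closest to a wanted delay and reports the remaining error. It assumes SYSCLK = 150 MHz (the example figure used here) and an N of at most 31 (the delay field is 5 bits wide, fewer when side-set is in use). The function name nearest_delay and the sample numbers are hypothetical.

```python
# Sketch: how closely can "[N]" approximate a wanted delay?
# Assumptions: SYSCLK = 150 MHz, N limited to 0..31 (no side-set bits used).

SYS_CLK_HZ = 150_000_000
MAX_DELAY = 31


def nearest_delay(target_ns: float, divider: float,
                  sys_clk_hz: int = SYS_CLK_HZ):
    """Return (n, actual_ns, error_ns) for the [n] closest to target_ns."""
    cycle_ns = divider * 1e9 / sys_clk_hz      # one PIO clock cycle
    n = min(MAX_DELAY, max(0, round(target_ns / cycle_ns)))
    actual_ns = n * cycle_ns
    return n, actual_ns, actual_ns - target_ns


# Hypothetical case: 60 ns of extra delay wanted.
for div in (1, 15):
    n, actual, err = nearest_delay(60.0, div)
    print(f"divider {div:>2}: use [{n}] -> {actual:.1f} ns (error {err:+.1f} ns)")
```

With a divider of 1 the step size is one SYSCLK period. With a slow PIO clock the nearest available step can be far from the wanted delay, which is the gap a SYSCLK-based micro delay would close.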

My wish:
Have an instruction like "MICRODELAY(n)" which can delay a little, based on SYSCLK (not the PIO clock). Or even a delay chain made of gates, where I can tap the signal between gates. Other MCUs have such a feature, e.g. for SDIO or QSPI, where a DLYB can be used to slightly delay the moment when an external signal is sampled.
This is often necessary to deal with "round trip delays", i.e. delayed responses from external chips (e.g. QD0...QD3 on QSPI are delayed relative to the internal master clock). Adding a fine-tuning delay would be great.

The PIO is really great and I love it. Sometimes, though, I struggle with having just 32 instructions (max.) per PIO, and it seems I have to find tricks to free instruction space just to accomplish something (e.g. waiting for a clock edge).

Loading a 32bit value to X or Y register:
There is no instruction which can load a full 32bit value into the scratch registers X or Y. That is obvious: instructions are just 16 bits wide, so there is no space to encode a 32bit value as part of an instruction.
So the "only" way is to send the value through the TX FIFO, pull() it into the OSR and move it from there into X or Y:

Code:

pull()
mov(y, osr)
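To see why a full 32bit immediate cannot fit, here is a small plain-Python sketch that encodes a SET instruction by hand. The bit layout (opcode in bits 15..13, delay/side-set in bits 12..8, destination in bits 7..5, data in bits 4..0) is my reading of the datasheet's instruction encoding, so treat it as an assumption. The helper name encode_set is hypothetical.

```python
# Sketch: hand-encoding the PIO SET instruction to show the 5-bit immediate.
# Assumed layout: [15:13] opcode, [12:8] delay/side-set, [7:5] dest, [4:0] data.

SET_OPCODE = 0b111
SET_DEST = {"pins": 0b000, "x": 0b001, "y": 0b010, "pindirs": 0b100}


def encode_set(dest: str, value: int, delay: int = 0) -> int:
    """Return the 16-bit encoding of set(dest, value) with an optional delay."""
    if not 0 <= value <= 31:
        raise ValueError("SET immediate is only 5 bits wide (0..31)")
    if not 0 <= delay <= 31:
        raise ValueError("delay field is only 5 bits wide (0..31)")
    return (SET_OPCODE << 13) | (delay << 8) | (SET_DEST[dest] << 5) | value


print(f"set(x, 31) -> 0x{encode_set('x', 31):04X}")
try:
    encode_set("y", 100_000)      # e.g. a word count for a long transfer
except ValueError as exc:
    print("set(y, 100000) ->", exc)
```

Anything larger than 31 therefore has to reach X or Y some other way, such as the pull() and mov() pair shown above.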
My wish:
Is there a way, even with EXEC, to force an instruction to be executed (from the outside), so that I can load a 32bit value into a scratch register?
(Here I need the number of words for a transaction, e.g. how many words to sample and read.)
It would free my Read state machine from needing a "pull()" just for this purpose. Without this "needed just once" pull() for transferring a 32bit value into Y, I could use "FIFO joining" (my Read process currently has to fetch the number of read cycles first).

An EXEC instruction, or an "external" access to scratch registers X and Y would be great.
If I could load X or Y via PIO config registers - it would be helpful.

Statistics: Posted by tjaekel — Wed Oct 16, 2024 6:47 am


