My impression is that a 32-bit data bus, along with contention for open pages, plays a significant role in Raspberry Pi performance. In the end, however, the specification of each individual part doesn't matter as much as how the system behaves as a whole. From a certain point of view, practical testing should be performed as close to the application environment as possible.
That's a big part of why, back about 40 years ago, I used to pull the pseudo-assembly listing when compiling COBOL programs. I could check that the compiled code was doing what I thought it was.
You're right, I am probably also benchmarking the append operation. And that is my point: I have no clue what these lines of code mean from a hardware point of view, but I know that programmers write in C, C++, Python, etc., and very rarely in assembly language. So, there is certainly a "simple" answer to my issue.
On the primary topic... you might start by looking up (a) the specs on LPDDR4X (which is what the Pi 5 uses), and (b) anything you can find on the memory bus clock for the Pi 5. If you can't find that data, or you don't understand and/or trust it, then write your memory speed tests as close to bare metal as you can to eliminate OS, interpreter, and compilation overhead.
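As a rough starting point before going all the way to bare metal, here is a minimal sketch of a bulk-copy timing test in Python. It is not bare metal: it runs under Linux and the interpreter. The single slice assignment between two memoryview objects is one large buffer copy, though, so nearly all of the measured time should be spent moving bytes rather than executing bytecode. The function name copy_bandwidth, the 64 MiB buffer size and the repeat count are assumptions chosen for illustration, not anything from the thread.

```python
import time


def copy_bandwidth(size_bytes=64 * 1024 * 1024, repeats=5):
    """Return the best observed copy rate in bytes copied per second.

    Illustrative sketch only: buffer size and repeat count are arbitrary.
    """
    # bytearray(n) creates n zero bytes, so both buffers are already
    # backed by real memory before the timed loop starts.
    src = bytearray(size_bytes)
    dst = bytearray(size_bytes)
    src_view = memoryview(src)
    dst_view = memoryview(dst)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst_view[:] = src_view  # one bulk copy of size_bytes bytes
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)

    # Each copied byte is read once and written once, so the traffic on
    # the memory bus is roughly twice the figure returned here.
    return size_bytes / best


if __name__ == "__main__":
    rate = copy_bandwidth()
    print(f"{rate / 1e9:.2f} GB/s copied, {1e9 / rate:.3f} ns per byte copied")
```

The buffers should be much larger than the CPU caches, otherwise the test measures cache speed instead of DRAM speed. Taking the best of several runs reduces the influence of other processes on the result.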
Application-level testing indicates the Pi 5 can achieve about 10 GB/sec or equivalently 0.09 ns/byte in Linux.
viewtopic.php?p=2159018#p2159018
I think there have been firmware updates that change memory timings related to refresh that may also affect the observed memory bandwidth. Interestingly, a recent NUMA-emulation patch to the way Linux allocates memory on the Pi 5 has also led to performance improvements.
https://lore.kernel.org/lkml/2024062512 ... galia.com/
I'm not sure what the status of this last optimization is.
At any rate, since 0.09 is much less than 330, there should be plenty of room for improvement in the speed of your current loop.
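The arithmetic behind these figures can be checked with a short sketch. It assumes that "GB" may mean either 10^9 bytes or 2^30 bytes (the post does not say which), and that the 330 figure is in the same ns/byte units as the 0.09 figure, as the direct comparison suggests. The function name ns_per_byte is made up for illustration.

```python
def ns_per_byte(gb_per_sec, bytes_per_gb=1e9):
    """Convert a bandwidth in GB/s into a cost in nanoseconds per byte."""
    bytes_per_sec = gb_per_sec * bytes_per_gb
    return 1e9 / bytes_per_sec  # 1e9 nanoseconds in one second


if __name__ == "__main__":
    decimal = ns_per_byte(10.0)                      # GB taken as 10**9 bytes
    binary = ns_per_byte(10.0, bytes_per_gb=2**30)   # GB taken as 2**30 bytes
    print(f"10 GB/s  -> {decimal:.3f} ns/byte")
    print(f"10 GiB/s -> {binary:.3f} ns/byte")

    # Assumption: the loop's 330 is measured in ns/byte as well.
    print(f"330 / 0.09 = {330 / 0.09:.0f}x slower than the memory limit")
```

Either reading of "GB" gives a value between 0.09 and 0.1 ns/byte, so the conclusion does not depend on that choice: the loop is thousands of times slower than the memory system allows.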
Statistics: Posted by ejolson — Sat Jul 27, 2024 9:03 pm