I'm planning to make my code open source, but at the moment it's wrapped up in a proprietary application, built outside the SDK and with an obsolete version of LWIP, so it needs a bit of work before I can release it. Yes, you're right; thanks for the summary.
What maximum rx/tx data rates can be achieved with the proposed solution, and is there an example available?
Performance is tricky to put a number on; in my application I'm using LWIP for the TCP/IP layer and that is almost certainly the limiting factor if I tried, for example, to get maximum TCP throughput out of it. But for an end-application it's not typically the network's maximum throughput that's critical - more important is how much of the machine is left over to run the application.
In terms of resource usage, my implementation uses 3 PIO SMs (25 instructions, so 7 left for the 4th SM to do something else), 3 DMA channels and 3 interrupts. About 6K of RAM is used for the MAC layer (plus whatever you allocate to LWIP for protocol buffers). 10 GPIO pins are needed for the RMII interface to the PHY chip.
To estimate the CPU/memory overhead, in my implementation:
- There's no overhead at all if the network is idle (nothing is polled, and no DMA runs unless there are packets actually in transit).
- Packets that you are deliberately sending or receiving need to pass through DMA twice: once with 32-bit wide transfers as they come to or from the Ethernet wire, then again with 8-bit wide transfers to do the CRC calculation and scatter/gather into the LWIP protocol buffers. So if you were trying to transmit and receive simultaneously at the full 100Mbit/sec, that's 5 DMA transfers for every 4 bytes and 12.5Mbytes/sec in each direction; two directions makes 25Mbytes/sec, or about 31M DMA transfers/sec. So with a 100MHz clk_sys, the DMA would be busy about 1/3 of the time and using about 1/6 of the total memory/bus bandwidth (given that it does a read and a write in each cycle, but the main RAM is striped 4 ways, so only two of those ways are used).
- In terms of CPU usage, it has to take an interrupt per packet, match the MAC address to decide whether or not to accept the packet, and if so program the second DMA to move it into protocol buffers. This overhead is obviously worse for small packets, but the smallest possible Ethernet packet is 64 bytes = 512 bits, so the worst case is one interrupt every 5.12us. The address match is only a few instructions unless you are using a large multicast address table (and if there's a lot of multicast going on, it's usually big packets). On packets you are actually accepting, there are further interrupts for scatter/gather into protocol buffers; LWIP is normally configured with rather small protocol buffers, so the net effect is to eat up the advantage of the large packets - but it still works, and you could change the buffer sizes if that aspect of performance is critical.
In my application, what I care about is the ability to throw away unwanted traffic: I use it in a heavy multicast environment, so it may be getting hosed with 100Mbit/sec of junk that I don't want, and it needs to extract the relatively small number of packets that I actually want to receive without dropping any - and it does that satisfactorily. If you used it as, say, a webserver, then the overhead from the MAC layer would be minimal and the performance would be limited by the rate at which you can process the packets (LWIP's TCP and then the application itself); it should perform better than similar applications on a PicoW over WiFi. If you wanted to use it as some kind of pass-through packet filter/firewall device, the MAC layer could do the full 100Mbit/sec, but the RP2040 doesn't really give you many CPU cycles per byte to do anything with that torrent of data passing through.
Statistics: Posted by arg001 — Sat Aug 10, 2024 9:52 pm