I seem to recall a VMware complaint similar to this too, and there was a ring buffer tuning to do to fix it… But that error message doesn’t seem quite right to match it.
TX queue timeouts can be caused by several things, but I wonder if you’re not seeing an end result of a spammy Ethernet flow control implementation where the device can’t transmit because the link is continuously paused.
If so, there may be rx_xoff counters viewable from within proxmox, or “ethtool -s enp1s0f0” would tell you where the device is seeing pause frames from the switch on a regular Linux host.
The link down tends to be a reaction by the driver to recover from a hung queue, so if it’s not flow control, there could be a driver/firmware upgrade possible, or a series of tunables if there’s a bug somewhere in packet handling land resulting in the NIC itself hanging.
I seem to recall a VMware complaint similar to this too, and there was a ring buffer tuning to do to fix it… But that error message doesn’t seem quite right to match it.
TX queue timeouts can be caused by several things, but I wonder if you’re not seeing an end result of a spammy Ethernet flow control implementation where the device can’t transmit because the link is continuously paused.
If so, there may be rx_xoff counters viewable from within proxmox, or “ethtool -s enp1s0f0” would tell you where the device is seeing pause frames from the switch on a regular Linux host.
The link down tends to be a reaction by the driver to recover from a hung queue, so if it’s not flow control, there could be a driver/firmware upgrade possible, or a series of tunables if there’s a bug somewhere in packet handling land resulting in the NIC itself hanging.