Random stack traces bringing down entire network (Anniversary NSFW Build with Unraid)

Hi all! Having some weird issues with my build. Everything worked fine for quite a while, but recently I’ve started getting random kernel panics: Example 1, Example 2

To make things even weirder - this also kills my network. Seriously. If this stack trace is up, none of my ethernet devices work and my network switch is a giant turd until I reboot the server.

Here are the server stats:

I used JDM’s O.G. Anniversary NSFW build with the following stats:

GA-7PESH2 GIGABYTE Intel Rev1.0
32GB DDR3 1600MHz ECC
2x INTEL XEON E5-2660
HP 24 Bay 3GB SAS Expander Card

and a hodgepodge of hard drives:

1 x 8TB SAS (parity)
3 x 4TB SAS
4 x 2TB SAS
1 x 6TB SAS
256GB SSD (cache)

Using Unraid Plus 6.8.3

The server is hard wired to a network switch, which then sends Ethernet throughout my house (and to some Eeros)

I can’t really reproduce it since it happens randomly, but here is stuff I’ve tried when I can:

  • Ran Memtest (which passed) and tried running the server with 2 of my 8GB chips only, and then swapping out with the other two when it panicked again to see if it was a certain chip.

  • Messed with a whole slew of network settings in Docker. Here is my current setup if that might help:

  • Turned on syslogs which… doesn’t show anything until the computer restarts :man_facepalming:
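On the syslog point: since the local log only turns up after a reboot, one option is to ship the messages to a second machine so the last lines before a panic land somewhere that stays up. Below is a minimal sketch of a UDP receiver in Python, assuming the server can be told to forward syslog to a remote host over UDP. The function names and port 5514 are made up for illustration (the standard syslog port, 514, usually needs root to bind).

```python
import socket


def open_listener(host="0.0.0.0", port=5514):
    """Open a UDP socket for incoming syslog datagrams (port is a hypothetical choice)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_syslog(sock, max_messages):
    """Read up to max_messages datagrams and return (sender_ip, text) pairs."""
    messages = []
    for _ in range(max_messages):
        data, addr = sock.recvfrom(4096)
        text = data.decode("utf-8", errors="replace").rstrip()
        messages.append((addr[0], text))
    return messages


# Example use on the second machine (blocks until 100 messages arrive):
#   sock = open_listener()
#   for sender, line in receive_syslog(sock, 100):
#       print(sender, line)
```

UDP is used here because a panicking box cannot finish a TCP handshake, but it also means messages can be dropped, so treat whatever arrives as a hint rather than a complete log.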

I appreciate you making it this far! I offer my dog with his head stuck in a tree as payment:

Is your IPMI plugged in and assigned an unused address?

It’s actually not plugged in at all. I think I unplugged it during a troubleshooting session and never re-plugged it in. Should it be?

Anyone have any ideas? I can deliver more goods!

You should plug it in, and assign it an unused IP in your BIOS and in your router.

Gotcha - is that to get more information when it crashes again? Or would that prevent a network lockup?

Your IPMI interface is likely trying to use the same IP as your router.

My god… that’s genius! Thank you! I’ll try that!

So JDM’s comment above worked for a while, but the server still seems to be doing it.

I assigned the IPMI MAC address a static IP, then I blocked it from the network, then (I think) I disabled it in the BIOS… it still occasionally does it. :confused:

@JDM_WAAAT / Anyone - any other suggestions?

Check the BIOS for something like IPMI failover. I think there is a setting that lets the IPMI interface fail over to your normal LAN port if its own port is disconnected. Mine used to do that, and even after the IPMI port recovered it would not fail back until reset.

Ah-ha! That would make sense.

I can’t find anything in the motherboard manual though. Do they refer to it as BMC here? There are a few BMC settings, but nothing that looks like it would do what you mention.

Anyone dealt with this with the GA-7PESH2 rev 1? Can’t seem to find the right toggle…

(puppy tax)

Is your local subnet 192.168.1.1?

@JDM_WAAAT Nope, 10.0.0.1

It’s most likely a packet storm of some sort, and the panicked machine isn’t necessarily the cause (though it could be). It could be a broadcast or multicast storm caused by a faulty or misconfigured device on the network. On this machine, do you have any non-standard network config (e.g. bridged)?

Monitoring the traffic on the switch with another machine using tcpdump/wireshark would be your best bet for diagnosis.
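As a rough idea of what to look for in a capture: during a storm, almost every frame is addressed to the broadcast address or to a multicast group rather than to one device. Here is a minimal Python sketch that sorts destination MAC addresses into those buckets. It assumes you have already pulled the destination MACs out of a capture (for example from the link-level headers that tcpdump -e prints); the function names and sample addresses are made up for illustration.

```python
from collections import Counter


def classify_mac(dst_mac):
    """Classify a destination MAC as 'broadcast', 'multicast', or 'unicast'."""
    mac = dst_mac.lower().replace("-", ":")
    if mac == "ff:ff:ff:ff:ff:ff":
        return "broadcast"
    first_octet = int(mac.split(":")[0], 16)
    # The lowest bit of the first octet marks a group (multicast) address.
    return "multicast" if first_octet & 1 else "unicast"


def storm_ratio(dst_macs):
    """Return per-class counts and the fraction of frames that are broadcast or multicast."""
    counts = Counter(classify_mac(m) for m in dst_macs)
    total = sum(counts.values())
    flooded = counts["broadcast"] + counts["multicast"]
    return counts, (flooded / total if total else 0.0)


# Hypothetical sample: 8 broadcast frames, 1 multicast, 1 unicast.
sample = ["ff:ff:ff:ff:ff:ff"] * 8 + ["01:00:5e:00:00:fb", "00:25:90:aa:bb:cc"]
counts, ratio = storm_ratio(sample)
print(dict(counts), ratio)
```

There is no fixed ratio that proves a storm, so compare a capture taken during the outage against one taken while the network is healthy. The source MACs of the flooded frames are the next thing to check, since they point at the device doing the sending.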

@Riggi Ah! Interesting… I have an eero setup, so the “main” eero goes to a 24 port switch which connects everything else in the house (including the server).

I do notice whenever I get the network issue, the eero status screen shows devices that are normally hard wired with ethernet as “using wireless”. Restarting the Unraid server does fix it every time, though… that makes me think it’s still the culprit.

The packet storm theory sounds like a good lead! Any thoughts on where to start investigating that?

Checking in again on this - I managed to turn off everything IPMI related that I could find in the BIOS, but I’m still getting the same issue. I’m starting to wonder if it’s a hardware issue? Could a malfunctioning motherboard spew a bunch of garbage through the Ethernet jacks?

Sorry for the potential idiocy - I’m a programmer, my hardware skills are very limited :sweat_smile:

You don’t have to turn IPMI off. Did you set the address to one that’s not on your primary subnet?

@JDM_WAAAT Hmm, I believe it was set to just DHCP before. Should I set it to something not in 10.0.0.0/24? Or make sure to give it an IP that is inside that range?

I actually had it mapped by MAC address through my router to a static IP that nothing else was using as well.
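To make the subnet question concrete, here is a small Python sketch using the standard ipaddress module. It checks whether a candidate IPMI address falls inside the LAN range and whether it collides with the router. The 10.0.0.0/24 range and the 10.0.0.1 router address are taken from the posts above; the function name and the 10.0.0.250 address are hypothetical.

```python
import ipaddress


def describe_ipmi_address(candidate, lan_cidr, router_ip):
    """Say where a candidate IPMI address sits relative to the LAN and the router."""
    addr = ipaddress.ip_address(candidate)
    lan = ipaddress.ip_network(lan_cidr)
    if addr == ipaddress.ip_address(router_ip):
        return "conflicts with router"
    if addr in lan:
        return "inside LAN subnet"
    return "outside LAN subnet"


for ip in ["10.0.0.1", "10.0.0.250", "192.168.1.1"]:
    print(ip, "->", describe_ipmi_address(ip, "10.0.0.0/24", "10.0.0.1"))
# 10.0.0.1 -> conflicts with router
# 10.0.0.250 -> inside LAN subnet
# 192.168.1.1 -> outside LAN subnet
```

This only checks the arithmetic. It cannot tell whether some other device has already taken an address, so the router's DHCP lease table is still worth a look.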