ECC memory problems - Dell Precision T5610

I’ve had this Dell workstation for a couple of years and have never gotten it fully stable. This is my first system with ECC memory. It can run perfectly fine for a couple of weeks, or for only a couple of days, before it hard crashes, instantly reboots, and then lands on a memory error screen.

It has a pair of Intel Xeon E5-2670v2 10-core CPUs and 8 slots for DDR3 1866 MHz ECC RDIMM memory.

I bought a bunch of compatible DIMMs on eBay, about twice as many as I have slots for, so I could swap out any bad ones as they were found. However, I’ve never been able to get the system stable.

One problem is that the ECC error readout in the BIOS setup shows 4 slots as empty (they are not). Thinking that maybe some kind of config string was not being read from the DIMMs, I swapped the DIMMs around, but the last 4 slots always show as empty in the error screen. It’s also weird that it shows them as slots 9-12 instead of DIMM5-8_CPU2 like the ones at the top.

However, the setup screen that lists the installed hardware does show all the memory sticks as present and usable. When booted into Windows, the system sees the full 32GB.

In the error screen, whenever I saw an error count > 0, I swapped that DIMM out, which is why those all show 0 now. If there are errors on the bottom 4, I can’t really tell, so I swapped those out as well, but the ECC screen did not change for the bottom 4.

I’m kind of at a loss on what is going on or what to try next.

The problems I’m having:

  1. Why doesn’t the ECC screen ever recognize or show errors for the bottom 4 DIMM slots?

  2. Sometimes if I swap in different DIMMs, the system will boot but go into a state where it only seems to process for 3-4 seconds, then takes no input for 1 minute. This happens even while getting to the BIOS setup and inside the setup. No keyboard input registers except during that few-second window, so I have to keep pressing an arrow key trying to hit the window. It doesn’t buffer keystrokes during the dead period either. Through an arduous process I’ve managed to get into the ECC screen, and it doesn’t show anything different. I’ve also managed to get to the Windows boot screen, and the spinning circle only turns during those few seconds. Does anybody know what is going on here? This problem only started happening recently.

I know this is all pretty confusing, but I’d appreciate any tips or pointers on what to try next.

What happens if you drop down to the first four slots only? Or even just one DIMM?

Does the error screen distinguish single-bit errors from other errors?

In general, your sources of problems could be:
A DIMM
A slot
A CPU
The motherboard
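
To tell those apart, it helps to change only one thing per test run. Here is a rough sketch in Python of how such a test plan could be laid out. The slot labels and stick names are made-up placeholders, not what the T5610 actually prints, and keeping one stick in the first CPU1 slot during the slot tests is my assumption, on the guess that a dual-socket board will not boot with memory on CPU2 only.

```python
# Sketch of a one-variable-at-a-time memory test plan.
# All slot labels and stick names below are hypothetical placeholders.

def dimm_plan(dimms, reference_slot):
    """Phase 1: each DIMM alone in the same slot, so only the DIMM changes."""
    return [{reference_slot: dimm} for dimm in dimms]

def slot_plan(slots, reference_slot, good_a, good_b):
    """Phase 2: good_a stays in the reference slot while good_b visits
    every other slot, so only the slot changes."""
    return [
        {reference_slot: good_a, slot: good_b}
        for slot in slots
        if slot != reference_slot
    ]

# Hypothetical labels: 4 slots per CPU, 8 sticks on hand.
slots = [f"DIMM{i}_CPU1" for i in range(1, 5)] + [f"DIMM{i}_CPU2" for i in range(5, 9)]
dimms = [f"stick_{letter}" for letter in "ABCDEFGH"]

plan = dimm_plan(dimms, slots[0])
# Assumes stick_A and stick_B both passed phase 1; substitute whichever did.
plan += slot_plan(slots, slots[0], "stick_A", "stick_B")

for step, config in enumerate(plan, start=1):
    print(step, config)
```

Run each configuration long enough to trigger the crash (or a memory test) and note which step fails. Failures that follow a stick point at the DIMM, failures that follow a slot point at the slot, and failures confined to one CPU’s slots point at that CPU or the motherboard.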