For 7 years, I've been trying to solve this mystery. I've tried many things, and have now completely run out of ideas.
I bought two identical PCs: i7-3820, 64 GB DDR3 Corsair Vengeance GMZ64GX3M8A1600C9, Gigabyte GA-X79-UP4. Both systems are unstable: segmentation faults and memory corruption. Everything points to bad memory, but I can't believe both kits are bad, so I also suspect everything else. I could have returned the kits 4 years ago if I had been sure by then that it was the RAM (both kits) I should return, and not both CPUs or both mainboards, or that I wasn't simply doing something wrong... Segmentation faults can happen for a lot of reasons.
Is there anything I can do to make it stable? By now, I would be fine with lower speeds/timings, or with leaving out one or two sticks from each kit. The problem is, NONE of that helps: lower speeds/timings do not help even a little bit, and removing one or two sticks helps only a little. With only half the kit, it's *almost* stable, but still not stable enough for production (medium-load ERP servers). Some sticks show better "compatibility" than others. I think we can treat this as a "mixed kits" case for the purposes of system tuning.
- - -
Now, the details and the journey.
The test I'm using is building the Linux kernel with 8 threads on a "ramdisk". That's easy to script, it emulates the real workload, and it seems to catch errors much faster than OCCT. Each build takes 15 GB of "disk" space plus some RAM, so building it three times exercises over 45 GB - good for testing 6-8 sticks. I have "Channel Interleave" and "Rank Interleave" enabled, so that should be a pretty good test. I test on one of the two PCs; the other shows exactly the same behavior.
At first, I thought the kits might have been mixed, but then I found numbered warranty labels on each stick, and their numbering makes sense. All sticks are the same - batch (apparently) 130502368, serial 236xxx - so I'll refer to them by the last three digits of the serial.
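A loop like the one described above could be scripted roughly as follows. This is a minimal sketch, not the script actually used here: the function name `run_builds`, the directory names and the `make -j8` command in the usage note are assumptions for illustration.

```python
import subprocess
import time


def run_builds(cmd, build_dirs):
    """Run cmd once in each build directory; stop at the first failure.

    Returns (completed_builds, elapsed_seconds, last_returncode).
    """
    start = time.monotonic()
    completed = 0
    returncode = 0
    for build_dir in build_dirs:
        # A compiler that crashes (for example with a segmentation fault)
        # makes the build command exit with a non-zero status.
        returncode = subprocess.run(cmd, cwd=build_dir).returncode
        if returncode != 0:
            break
        completed += 1
    return completed, time.monotonic() - start, returncode
```

With three separate source trees on the ramdisk (hypothetical paths such as `/mnt/ramdisk/build1` to `/mnt/ramdisk/build3`), `run_builds(["make", "-j8"], dirs)` would cover the 3 x 15 GB = 45 GB mentioned above and report how long the system survived before the first failed build.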
Here are the kits - warranty stickers in the first column, serials in the second column:
Kit 1:
27 351
28 346
29 305
30 352
31 353
32 291
33 508
34 354
Kit 2:
35 262
36 506
37 507
38 226
39 289
40 227
41 225
42 288
I can't entirely dismiss the possibility that the shop employee was a complete idiot and went out of his way to mix up the kits before putting the stickers on them, but if we look at the first kit in sticker order, we see: 351-...-...-352---353-...-...-354. Whatever pattern they were using, that really does seem to be one kit. The numbers in the second kit follow a similar pattern: ...-506-507-...---289-...-...-288. Also, 225, 226 and 227 belong to one kit. Based on that, I think the kits are not mixed.
Of course, I've tried them in all kinds of other combinations, which did not help. If I put the two "better" half-kits together (351-352-353-354-506-507-288-289), it lasts for a few builds (2-3 hours), but the remaining sticks are completely unusable as a kit. Both kits fail in under 5 minutes in seemingly any stick order.
I once had an opportunity to test with a known good memory kit (another 64 GB Corsair Vengeance) and a known good system (i7-4930K, GA-X79S-UP5). The faulty system worked with the known good memory! (Though I do not remember if I tested it long enough and well enough...) And the faulty memory was "much less faulty" in the "good system", but that was before I noticed the warranty stickers, so I could not test the kits as intended - and it was still not good enough.
I once managed to get it to run seemingly stable with 7 sticks, but then decided to continue the pursuit of perfection and did not write that combination down. %)
After that, remembering that the i7-4930K had helped somewhat in the past, I tried this system with an i7-4820. It is said to have a better IMC - maybe that was the reason?.. But it did not make any difference at all.
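The serial-number reasoning can be checked mechanically. The sketch below, using only the serials listed above, groups them into runs of consecutive numbers and tags each with its kit; the helper name `consecutive_runs` is made up for illustration.

```python
# Last three digits of each serial, in warranty-sticker order.
KIT1 = [351, 346, 305, 352, 353, 291, 508, 354]
KIT2 = [262, 506, 507, 226, 289, 227, 225, 288]


def consecutive_runs(kits):
    """Group all serials into runs of consecutive numbers, tagged by kit."""
    tagged = sorted(
        (serial, name) for name, serials in kits.items() for serial in serials
    )
    runs = []
    for serial, name in tagged:
        if runs and serial == runs[-1][-1][0] + 1:
            runs[-1].append((serial, name))  # extends the current run
        else:
            runs.append([(serial, name)])  # starts a new run
    return runs


runs = consecutive_runs({"kit1": KIT1, "kit2": KIT2})
for run in runs:
    if len(run) > 1:
        print(" ".join(f"{serial}({kit})" for serial, kit in run))
```

On these serials it prints four runs: 225-227 and 288-289 (both kit 2), 351-354 (kit 1), and 506-507-508, where 508 carries a kit 1 sticker. That last run does not by itself show the kits were mixed, but it is the one place where the numbering crosses kits.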
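To avoid losing a combination like that again, each test could be appended to a small log. This is only a sketch of one way to do it; the function name `log_result`, the file name and the column layout are assumptions, not something I actually had.

```python
import csv
import os
from datetime import datetime, timezone


def log_result(path, sticks, outcome):
    """Append one tested stick combination and its outcome to a CSV file."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "sticks", "outcome"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            "-".join(str(s) for s in sticks),  # slot order is preserved
            outcome,
        ])
```

For example, `log_result("ram_tests.csv", [351, 352, 353, 354, 506, 507, 288, 289], "failed after 2-3 hours")` records the combined "better" half-kits run in slot order.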
I've tried a better power supply. That did not make any difference at all. I've tried another one (600W this time) - again, no difference. Or maybe a little bit - I'm still testing this. But surely 500W is enough? 150W for the CPU, 100W for the mainboard+memory, 30W for a GeForce GT610, and I'm testing with only one SSD attached.
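The power budget above works out as follows. The wattages are the rough estimates from this post, not measurements, and the SSD is left out because I have no figure for it.

```python
# Estimated draw in watts, taken from the rough figures in the post.
ESTIMATED_DRAW_W = {"cpu": 150, "mainboard_and_memory": 100, "gpu_gt610": 30}


def psu_load(draw_w, psu_rating_w):
    """Return (total draw, headroom, fraction of the PSU rating used)."""
    total = sum(draw_w.values())
    return total, psu_rating_w - total, total / psu_rating_w


total, headroom, fraction = psu_load(ESTIMATED_DRAW_W, 500)
print(f"{total} W estimated, {headroom} W headroom, {fraction:.0%} of rating")
# prints "280 W estimated, 220 W headroom, 56% of rating"
```

This only compares the total against the label rating; it says nothing about per-rail limits or voltage quality under load.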
I've tried raising voltages and lowering speeds and timings. No difference at all. But I've never done any overclocking other than simply raising the "turbo limit" and CPU core voltage, so maybe there is something else I can try?..
The results I have right now are roughly these: with each kit, 8 sticks fail in under 5 minutes, 7 sticks fail in under an hour, 6 sticks fail in under two or three hours, and 4 sticks are not enough to run the intended application (and, if I remember correctly, are not completely stable either).