Monday, July 16th 2018

QA Consultants Determines AMD's Most Stable Graphics Drivers in the Industry

An independent third-party expert in software quality assurance and testing for over 20 years, QA Consultants has conducted more than 5,000 mission-critical projects and has extensive testing experience across a range of industries. Based in Toronto, Ontario, QA Consultants is the largest on-shore software quality assurance company, with a 30,000 sq. ft., industry-grade facility called The Test Factory.
Commissioned by AMD, QA Consultants used Microsoft's Windows Hardware Lab Kit (HLK) to conduct a 12-day head-to-head test comparing AMD and Nvidia graphics drivers. The test compared 12 GPUs, six from AMD and six from NVIDIA, running on 12 identical machines. All machines were configured with the Windows 10 April 2018 Update, and gaming and professional GPUs were equally represented in the testing. After 12 days of round-the-clock stress testing, the AMD products in aggregate passed 93% of the scheduled tests, while the NVIDIA products in aggregate passed 82%. Based on the results of testing the 12 GPUs, QA Consultants concludes that AMD has the most stable graphics driver in the industry.
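For readers who want to see how a headline figure like "93% of scheduled tests" is derived, here is a minimal sketch in Python (the per-card failure counts below are made-up placeholders, not the report's data or tooling) of aggregating per-card results over a fixed schedule:

SCHEDULED_PER_CARD = 12 * 24          # 12 days of round-the-clock test slots per card

# Hypothetical failure counts per card; the real figures are in the report.
failures = {"card_1": 2, "card_2": 5, "card_3": 40,
            "card_4": 1, "card_5": 9, "card_6": 13}

total_scheduled = SCHEDULED_PER_CARD * len(failures)
total_passed = total_scheduled - sum(failures.values())
print(f"aggregate pass rate: {total_passed / total_scheduled:.0%}")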
About QA Consultants

QA Consultants is the North American leader in software quality assurance and testing services, having successfully delivered over 5,000 testing and consulting projects across a variety of sectors including automotive and transportation, advertising and marketing, banking and finance, construction, media and entertainment, US and Canadian federal, state, and local government, healthcare, insurance, retail, hospitality, and telecommunications.

The Test Factory is the next generation of software testing, providing a superior quality, cost, and service alternative to offshore providers and external contractors. The centre can handle any testing project of any size, with any application and for any industry. With full-time employees in Toronto, Ottawa and Dallas, QA Consultants supports customers by providing testing services such as accessibility testing, agile testing, test automation, data testing, functional testing, integration testing, mobility testing, performance testing, and security testing. Along with engagement models like Managed Consulting Services and On Demand Testing, QA Consultants is equipped to handle any client's request.
Sources: QA Consultants, Graphics Driver Stability Report

124 Comments on QA Consultants Determines AMD's Most Stable Graphics Drivers in the Industry

#101
Xzibit
efikkanIt's the standard for getting WHQL certification, both AMD and Nvidia have their own validation suite.

It doesn't matter if it's the standard or not when the test is not conducted correctly. When you have multiple cards running the same driver, but only one has major problems time after time, then there is something wrong with the card. If AMD or Nvidia saw similar results in their validation testing, they would replace the faulty cards.
So how is the test supposed to be conducted correctly?
Posted on Reply
#102
FordGT90Concept
"I go fast!1!11!1!"
If the cards were faulty, there would be a lot more hangs and failures.
Posted on Reply
#103
bug
XzibitSo how is the test supposed to be conducted correctly?
QA CONSULTANTS DETERMINES MOST STABLE GRAPHICS DRIVER IN THE INDUSTRY
-> QA Consultants Determines the most stable among four Windows drivers.

If you really want to see who's got the better driver, you need to look at way more cards than 12 GPUs. These drivers support probably more than 100 cards each. You also need to (obviously) look at more than one driver version.

What they did is like me asking an American and a European how many hours they work a week and, based on that, concluding that either Americans or Europeans work more.

And once more, they didn't say what failed in each case. That is really, really important.
Posted on Reply
#104
efikkan
XzibitSo how is the test supposed to be conducted correctly?
You would know if you followed the past two pages of discussion.

The first major problem is the sample size. All it takes in this test is a single bad card to tip the conclusion either way. Increase the sample size and the variations between individual cards and any operator-related errors will even out. They tested a sample size of 1 per bin, which is ridiculous; 10 samples should be the minimum for any "semi-scientific" test. When they are testing cards of the same generation but only one of them consistently has stability issues, that clearly points to a problem with the sample. No competent person would base a conclusion about driver stability on obvious hardware issues.

Secondly, they didn't provide details about why things failed. Seriously, most of the 125-page report is filler that could have been summarized, yet it includes no information about why things failed.

Thirdly, the WHQL certification tests should at best be one part of a larger test suite, especially since their coverage of heavy graphics loads and rendering artifacts is too limited, and both are very relevant for a stability test.
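To make the sample-size point concrete, here is a small Monte Carlo sketch in Python (all probabilities are invented assumptions, not figures from the report) showing how much wider the spread of aggregate pass rates becomes when only one physical card per model is tested:

import random

TESTS_PER_CARD = 12 * 24   # 12 days of round-the-clock test slots
MODELS = 6                 # six models per vendor, as in the study

def aggregate_pass_rate(samples_per_model, p_bad_sample=0.08,
                        good_fail_rate=0.02, bad_fail_rate=0.40):
    """One simulated vendor: each physical card has a small chance of being
    a defective sample; defective samples fail far more often than good ones."""
    passed = total = 0
    for _ in range(MODELS * samples_per_model):
        fail_rate = bad_fail_rate if random.random() < p_bad_sample else good_fail_rate
        for _ in range(TESTS_PER_CARD):
            total += 1
            passed += random.random() >= fail_rate
    return passed / total

for n in (1, 10):
    rates = [aggregate_pass_rate(n) for _ in range(300)]
    print(f"{n:2d} sample(s) per model: min {min(rates):.1%}, max {max(rates):.1%}")

With 1 sample per model, one defective card moves the whole aggregate; with 10, the outliers average out.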
Posted on Reply
#105
FordGT90Concept
"I go fast!1!11!1!"
bugAnd once more, they didn't say what failed in each case. That is really, really important.
"Failures represent crashes caught by Windows HLK and system hangs."

Hang = Controller didn't get a response from the machine--probably a BSOD.
Fail = Client caught a failure and submitted it to the controller.

The entire test was automated which is why there isn't much in the way of details.
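A toy sketch of that controller-side bookkeeping (the names, queue, and timeout are assumptions for illustration, not actual HLK internals): the controller either receives a pass/fail report from the client, or its wait times out and the run goes down as a hang.

import queue

def record_outcome(client_reports, timeout_s=300.0):
    """Wait for the client machine to report a result for one scheduled test.
    Returns 'pass' or 'fail' if the client answered in time, 'hang' otherwise."""
    try:
        return client_reports.get(timeout=timeout_s)  # client submitted a result
    except queue.Empty:
        return "hang"  # no response at all -- machine likely hung or blue-screened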
Posted on Reply
#106
efikkan
FordGT90ConceptThe entire test was automated which is why there isn't much in the way of details.
Which is why someone invented the concept of logging. :rolleyes:
Posted on Reply
#107
Xzibit
bug-> QA Consultants Determines the most stable among four Windows drivers.

If you really want to see who's got the better driver, you need to look at way more cards than 12 GPUs. These drivers support probably more than 100 cards each. You also need to (obviously) look at more than one driver version.

What they did is like me asking an American and a European how many hours they work a week and, based on that, concluding that either Americans or Europeans work more.

And once more, they didn't say what failed in each case. That is really, really important.
Again, you're complaining about what you wanted the test to do.

The paper explains the parameters of the test. This has proven to be way too difficult for people to understand.
efikkanYou would know if you followed the past two pages of discussion.

The first major problem is the sample size. All it takes in this test is a single bad card to tip the conclusion either way. Increase the sample size and the variations between individual cards and any operator-related errors will even out. They tested a sample size of 1 per bin, which is ridiculous; 10 samples should be the minimum for any "semi-scientific" test. When they are testing cards of the same generation but only one of them consistently has stability issues, that clearly points to a problem with the sample. No competent person would base a conclusion about driver stability on obvious hardware issues.

Secondly, they didn't provide details about why things failed. Seriously, most of the 125-page report is filler that could have been summarized, yet it includes no information about why things failed.

Thirdly, the WHQL certification tests should at best be one part of a larger test suite, especially since their coverage of heavy graphics loads and rendering artifacts is too limited, and both are very relevant for a stability test.
Same as Bug up there.

A bunch of I woulds and shoulds.

Failures are explained in the paper, which certain people seem to have a problem understanding.
QA ConsultantsFailures represent crashes caught by Windows HLK and system hangs.
WHQL submissions aren't shared with the public either, yet you want these test logs to be. I'll take a wild guess: it's because you disagree with the conclusion, or you want to kindly debug it.
Posted on Reply
#108
bug
FordGT90Concept"Failures represent crashes caught by Windows HLK and system hangs."

Hang = Controller didn't get a response from the machine--probably a BSOD.
Fail = Client caught a failure and submitted it to the controller.

The entire test was automated which is why there isn't much in the way of details.
Seriously, is that explanation enough for you?
XzibitAgain, you're complaining about what you wanted the test to do.

The paper explains the parameters of the test. This has proven to be way too difficult for people to understand.
Your question was how a satisfactory test would be conducted, so I have answered precisely that.

At this point, I believe a poll would be in order.
Question: Why do we need 5 pages to explain cherry-picking?
Answer 1: Because people on TPU are that dumb.
Answer 2: Because when given a chance to throw dirt at something, arguments take the back seat.
Posted on Reply
#109
Xzibit
bugYour question was how a satisfactory test would be conducted, so I have answered precisely that.
No, efikkan implied the test was done incorrectly
efikkanIt doesn't matter if it's the standard or not when the test is not conducted correctly.
As if it were a fault in the testing. Turns out, just as in the last few pages, the only issue is sample size.
Posted on Reply
#110
FordGT90Concept
"I go fast!1!11!1!"
bugSeriously, is that explanation enough for you?
QA Consultants was paid to test driver stability in a clear manner, not answer the question why or how this driver is more stable on that card.

AMD/NVIDIA need to take these results, attach a debugger, and hammer the test on problematical hardware to find out and fix what caused it. The specifics are above QA Consultants' pay grade (namely because it requires access to closed source code).
Posted on Reply
#111
bug
FordGT90ConceptQA Consultants was paid to test driver stability in a clear manner, not answer the question why or how this driver is more stable on that card.

AMD/NVIDIA need to take these results, attach a debugger, and hammer the test on problematical hardware to find out and fix what caused it. The specifics are above QA Consultants' pay grade (namely because it requires access to closed source code).
And again, is that enough for you? Can you draw a(ny) conclusion from that?
Posted on Reply
#112
efikkan
FordGT90ConceptQA Consultants was paid to test driver stability in a clear manner, not answer the question why or how this driver is more stable on that card.


AMD/NVIDIA need to take these results, attach a debugger, and hammer the test on problematical hardware to find out and fix what caused it. The specifics are above QA Consultants' pay grade (namely because it requires access to closed source code).
When one card has reproducible instability but other hardware of the same architecture suffers no such problems, and the driver is the same, how can anyone conclude it's the driver? If the driver were to blame, the symptoms would be similar across the board. No logical reasoning would conclude that driver quality is to blame. :facepalm:
Posted on Reply
#113
bug
XzibitNo, efikkan implied the test was done incorrectly
Yes, when you don't spill the beans on what you did, how, and what the results were, that's what people will believe.
They took one canned test, ran it on very few cards using just one driver version, and gave us the summary. If you were paid to determine the stability of a driver, is that how you would do it?
Posted on Reply
#114
Xzibit
bugYes, when you don't spill the beans on what you did, how, and what the results were, that's what people will believe.
They took one canned test, ran it on very few cards using just one driver version, and gave us the summary. If you were paid to determine the stability of a driver, is that how you would do it?
I would have to defer to a company that's been a trusted testing partner for businesses, government departments and institutions, and an award-winning provider of software testing and quality assurance solutions over the last 20 years, rather than to my time in the forums.
Posted on Reply
#115
mtcn77
efikkanWhen one card has reproducible instability but other hardware of the same architecture suffers no such problems, and the driver is the same, how can anyone conclude it's the driver? If the driver were to blame, the symptoms would be similar across the board. No logical reasoning would conclude that driver quality is to blame. :facepalm:
If I understand it correctly, you are wrong. It takes fewer than 4 tries to baseline a card that is faulty at the hardware level. This is the third time I have done so; the VRM controller reboot cycle cannot repudiate MTBF. Mind you, each test run period is 4 hours - I can do it in less than 15 minutes.
Posted on Reply
#116
FordGT90Concept
"I go fast!1!11!1!"
bugAnd again, is that enough for you? Can you draw a(ny) conclusion from that?
Yup and yup.
efikkanWhen one card has reproducible instability but other hardware of the same architecture suffers no such problems, and the driver is the same, how can anyone conclude it's the driver? If the driver were to blame, the symptoms would be similar across the board. No logical reasoning would conclude that driver quality is to blame. :facepalm:
www.amd.com/system/files/documents/graphics-driver-quality.pdf Page 110
RX 560: 1
RX 580: 1
Vega 64: 2
WX 3100: 13
WX 7100: 5
WX 9100: 9
Total: 31

GTX 1050: 3
GTX 1060: 10
GTX 1080 Ti: 2
P600: 31
P4000: 12
P5000: 18
Total: 76

RX 560/RX 580 errored 4 hours out of 288 or 1%
P600 errored 124 hours out of 288 or 43%

Every card had at least two full days with no errors, and every card had at least one error. The only card that may be suspect is the P600, but all the rest fall in line with a clear pattern of professional drivers/cards getting less driver love than gaming cards. Even if you completely omit the P600, NVIDIA still had far more errors than AMD did, even accounting for the 3-to-2 advantage NVIDIA had; hence, their conclusion didn't mince words: "...AMD has the most stable graphics driver in the industry." That statement is absolutely true given the parameters of the test.
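A quick re-tally of those page 110 counts (plain Python, just re-adding the numbers quoted above) shows the gap holds even with the P600 excluded:

amd    = {"RX 560": 1, "RX 580": 1, "Vega 64": 2,
          "WX 3100": 13, "WX 7100": 5, "WX 9100": 9}
nvidia = {"GTX 1050": 3, "GTX 1060": 10, "GTX 1080 Ti": 2,
          "P600": 31, "P4000": 12, "P5000": 18}

print("AMD total:", sum(amd.values()))                                # 31
print("NVIDIA total:", sum(nvidia.values()))                          # 76
print("NVIDIA total without the P600:", sum(nvidia.values()) - nvidia["P600"])  # 45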
Posted on Reply
#117
bug
XzibitI would have to defer to a company that's been a trusted testing partner for businesses, government departments and institutions, and an award-winning provider of software testing and quality assurance solutions over the last 20 years, rather than to my time in the forums.
I wouldn't. I looked it up; it's a no-name. Several former employees describe it as a QA sweatshop that regularly lies to its customers.
Now, I know better than to judge a company based on disgruntled (former) employees' testimonies. But that's all a Google search brings up.

Now, I'm outta here. You and Ford are either being stubborn or plain dumb. And I don't care to fix either.
Posted on Reply
#118
Xzibit
bugI wouldn't. I looked it up; it's a no-name. Several former employees describe it as a QA sweatshop that regularly lies to its customers.
Now, I know better than to judge a company based on disgruntled (former) employees' testimonies. But that's all a Google search brings up.

Now, I'm outta here. You and Ford are either being stubborn or plain dumb. And I don't care to fix either.
We aren't the ones saying it should have been this way. Which in and of itself is fine, but then you want everyone to see it your way. That's the strange thing. You have no acceptance of the material itself, nor of others' opinions if they don't conform to yours.

I like how you bring up that you're not going to judge them by it, but that's exactly what you just did.
K, but can you leave the ball if you go?
Posted on Reply
#119
londiste
FordGT90ConceptDrivers should never hang. It's just a different kind of failure.
You are right. But in this case the subsequent tests were counted as failed when they really did not run (due to a hung driver). Hangs early in the day happened a lot more on Nvidia cards than on AMD ones. And this is not a comment on driver quality, just the facts from the paper.

For example, look at this table:

With this methodology it matters when the driver hangs: it is much worse when it hangs at the beginning of the day (as the GTX 1060 does in this image) than when it happens later (as the WX 3100 or P5000 did that day).
Going over all the tables, there does not seem to be a pattern for the hangs, just pure luck.

Judging from the result tables:
AMD: 4 hangs in the middle of the day, resulting in 10 tests not run + 5 fails at the end of the day (that were potentially hangs)
Nvidia: 11 hangs in the middle of the day, resulting in 40 tests not run + 6 fails at the end of the day (that were potentially hangs)
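A toy model of that scheduling effect (assumed data shape in Python, not the report's actual scheduler): a hang partway through a day leaves the rest of that day's slots unrun, so an early hang costs far more scheduled tests than a late one.

def day_outcome(slot_results):
    """slot_results: outcomes ('pass'/'fail'/'hang') in scheduled order for one day.
    Returns the results that actually ran and how many slots never ran."""
    ran = []
    for result in slot_results:
        ran.append(result)
        if result == "hang":          # a hang stops the rest of the day's schedule
            break
    return ran, len(slot_results) - len(ran)

# A hang in slot 2 of 8 leaves 6 tests unrun; the same hang in slot 7 leaves 1.
print(day_outcome(["pass", "hang"] + ["pass"] * 6))
print(day_outcome(["pass"] * 6 + ["hang", "pass"]))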
Posted on Reply
#120
FordGT90Concept
"I go fast!1!11!1!"
That's still a problem with the driver and with the driver communicating with Windows. Windows is always watching for the display driver to hang and, if it does, restarts the driver. In all of these cases of hanging, the driver failed to report to Windows that it was having issues, so Windows couldn't reset it. It not only had a problem that caused it to quit responding, it also failed to let Windows know. The yellow boxes are worse than the red boxes.
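That watchdog is Windows' Timeout Detection and Recovery (TDR) mechanism. As a small aside, a sketch like the following (standard winreg calls; the values are optional and usually absent, in which case Windows' built-in defaults apply) can show whether a test machine has had its TDR behaviour tuned:

import winreg  # Windows-only standard library module

TDR_KEY = r"SYSTEM\CurrentControlSet\Control\GraphicsDrivers"

def read_tdr_value(name):
    """Return a TDR tuning value from the registry, or None if it isn't set."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TDR_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, name)
            return value
    except OSError:
        return None  # value (or key) not present: Windows uses its defaults

print("TdrDelay:", read_tdr_value("TdrDelay"))   # seconds before a GPU hang is declared
print("TdrLevel:", read_tdr_value("TdrLevel"))   # whether/how Windows recovers the driver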
Posted on Reply
#121
londiste
Yellow boxes do not mean a damn thing. The red box right before these does. The paper does not differentiate between crashes and hangs at all.
The number of yellow boxes is what irks me about their methodology.

My experience so far (on Windows 7 and 10) has been that automatic driver restart is far from reliable. Also Nvidia drivers seem to be far more responsive to manual forced driver restart than AMD drivers.

Edit:
I mean, theoretically they could also have scheduled the entire 12-day run in one go and seen how far each card got. This would have eliminated the following cards long before the end. For curiosity's sake, I went over the results again; specifically, these cards would have gone out after the following number of successful test runs:
P5000 - 2
P600 - 6
WX 9100 - 9
GTX 1060 - 17
WX 3100 - 19
P4000 - 36
:laugh:
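For what it's worth, that elimination count is easy to reproduce once each card's full run is laid out in order; a minimal sketch (assumed data shape, not the report's tooling):

def runs_before_first_failure(all_results):
    """all_results: every scheduled outcome ('pass'/'fail'/'hang') for one card,
    in order, across the whole 12-day schedule. Returns how many runs succeed
    before the first failure ends the hypothetical single continuous run."""
    count = 0
    for result in all_results:
        if result != "pass":
            return count
        count += 1
    return count

print(runs_before_first_failure(["pass", "pass", "hang", "pass"]))  # -> 2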
Posted on Reply
#122
FordGT90Concept
"I go fast!1!11!1!"
HLK controller/client only has three results: pass, fail, or unresponsive.


You got it backwards: the days are top to bottom, not bottom to top. Hang -> hang -> hang -> fail. The first hang should have been a fail, and there should have been no subsequent fails after that, but no: the driver continued to report that all was well when it wasn't:
Posted on Reply
#123
londiste
Strange, I got a different impression from reading the paper.
Thanks for correcting me.
Posted on Reply
#124
RichF
bugI will give them the benefit of the doubt. But the engineer in me just doesn't take things at face value ;)
bugI will add to that that my 1060 has been rock solid. Lab tests meet real life ;)
And the statistician? One 1060, particularly one that hasn't followed the article's testing regime, is the definition of anecdotal evidence.
bugNot enough for what? If the goal is to smear the competition, a paid for test will do just fine (remember, nobody had a problem with GPP till AMD "brought" it to Kyle's attention). If the goal is to learn something from it, it does fall short, as you have noted.
The alleged conspiracy by AMD to "smear" the competition doesn't seem to have left them looking particularly great. Pro line units are supposed to be more, not less, stable.
Posted on Reply