
Intel PRO/1000 GT (e1000 driver) and "Detected Tx Unit Hang" error with 4GB RAM

Discussion in 'Linux / BSD / Mac OS X' started by SAlexson, Nov 7, 2007.

  1. SAlexson New Member

    Joined:
    Jun 12, 2007
    Messages:
    39 (0.01/day)
    Thanks Received:
    0
    Location:
    Connecticut
    My system configuration:
    ASUS M2A-VM motherboard
    AMD Athlon 64 X2 4200+ 2.2 GHz
    4x A-DATA 1GB DDR2 800 memory
    2x Intel 10/100/1000 Pro/1000 GT Desktop Network Adapter
    2x Seagate Barracuda 250GB HD (RAID 1, software RAID)
    CentOS5 x86_64; Kernel 2.6.23 (custom built); Version 7.6.9.2 e1000 driver

    The symptoms of this problem are outlined at:

    http://e1000.sourceforge.net/wiki/index.php/Issues
    http://e1000.sourceforge.net/wiki/index.php/Tx_Unit_Hang

    Last night I started experiencing the "Detected Tx Unit Hang" problem with the Intel e1000 NIC. This happened after I upgraded my system to 4GB RAM (previously 2GB). I have 2 of these cards in the system. I updated the Linux kernel to 2.6.23, then downloaded from Sourceforge and installed the most recent stable version of the e1000 driver for Linux, version 7.6.9.2. I am still getting the "Detected Tx Unit Hang" message. I had to recompile the kernel because upgrading to 4GB with the current kernel for CentOS 5 (2.6.18.8-1) causes an error, ata1: softreset failed (1st FIS failed), which results in a kernel panic. Upgrading the kernel to 2.6.23 fixed that problem, but now I have a problem with my network cards.

    Searching around, I found posts saying that disabling ACPI with the kernel options "acpi=off noacpi" would fix it, but it did not. I then tried adding explicit modprobe options for the driver in /etc/modprobe.conf (options e1000 XsumRX=0 Speed=1000 Duplex=2 InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096 TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0). Still no change; I am still experiencing the problem.

    I then tried another suggestion I found in a forum discussion: `ethtool -K eth0 tso=off`. It seems to have had no effect on the problem.
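
    A note on the syntax: `ethtool -K` normally takes the feature name and its state as separate words (`tso off`), so the `tso=off` form may have been rejected rather than applied. Below is a minimal sketch for confirming the state afterwards. It assumes the `ethtool -k` output contains a `tcp segmentation offload` (or, on newer versions, `tcp-segmentation-offload`) line; `tso_state` is a hypothetical helper name, not part of ethtool.

```shell
# Read `ethtool -k <iface>` output on stdin and print the TSO state
# ("on" or "off"). Accepts both the older space-separated label and the
# newer hyphenated one; prints nothing if no such line is found.
tso_state() {
    sed -n 's/^[[:space:]]*tcp[ -]segmentation[ -]offload:[[:space:]]*\([a-z]*\).*/\1/p' | head -n 1
}

# Usage (needs root and a real interface, so left commented out):
#   ethtool -K eth0 tso off
#   ethtool -k eth0 | tso_state
```

    If the second command still prints `on`, the setting was not applied, and the hang cannot be attributed to TSO either way.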

    This problem occurs immediately when the system is trying to bring the device up. I cannot even get to a point to try sending traffic over the network interface because it never negotiates an IP address from DHCP. If I specify a static IP address, the address is assigned, but I still experience the problem, and I cannot even ping another host.

    Now, if I reduce the amount of RAM to 3GB or less, everything works fine! So, this leads me to believe that my kernel and driver are configured, compiled, and functioning correctly. It also leads me to believe that there are no problems with the network cards. So, I thought perhaps it was a bad memory module, but no matter which 3 of the 4 modules I leave in, I get the same results. Everything works fine until I add the 4th module.

    Then I found an article on Intel's site saying that some older EEPROMs have the power management option turned on, and that this could cause the problem. So, I downloaded the script that fixes the bit in the EEPROM (turning off power management). The script says that it does not apply to my version of the EEPROM. When I run `ethtool -e (eth0|eth1)`, I do not have the bit on 0x0010 that is set to "de", so I must believe that the script is correct in assessing that it does not apply to my NICs.
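
    The manual EEPROM check described above can be sketched roughly as follows. It assumes `ethtool -e` prints rows labelled like `0x0010:` followed by hex bytes; `eeprom_row_has` is a hypothetical helper, not part of Intel's script, and it only reports whether a byte value appears in a row, not what that byte means.

```shell
# Read an `ethtool -e <iface>` dump on stdin. Succeed (exit 0) if the row
# labelled $1 (e.g. 0x0010) contains the byte $2 (e.g. de), else exit 1.
# The comparison is case-insensitive.
eeprom_row_has() {
    awk -v row="$1:" -v byte="$2" '
        tolower($1) == tolower(row) {
            for (i = 2; i <= NF; i++)
                if (tolower($i) == tolower(byte)) found = 1
        }
        END { exit !found }
    '
}

# Usage (needs root and a real interface, so left commented out):
#   for nic in eth0 eth1; do
#       ethtool -e "$nic" | eeprom_row_has 0x0010 de && echo "$nic: de found in row 0x0010"
#   done
```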

    Now, I am out of ideas, and I seem to have hit a brick wall. One of the things that disturbs me is that all of the articles I have found concerning this problem are dated 1-2 years ago.

    Can anyone offer me any assistance?
     
    Last edited: Nov 7, 2007
  2. DanTheBanjoman Señor Moderator

    Joined:
    May 20, 2004
    Messages:
    10,553 (2.77/day)
    Thanks Received:
    1,383
    "have created the following Python script which reliably reproduces the "Tx Unit Hang" bug in the e1000 driver on 82573(V/L/E) cards where power management is enabled (See Issues for more details.)"

    It says "where power management is enabled", so why not turn power management off?
     
  3. SAlexson New Member

    That doesn't apply to the EEPROM version on my network cards. My cards postdate that fix, and Intel started sending out cards with the updated EEPROM (power management set to off) once the issue was identified. But regardless, I did try to disable power management with a script written by Intel to do so, and the script states that it does not apply to my EEPROM when I run it.
     
    Last edited: Nov 7, 2007
  4. SAlexson New Member

    Could it be that the power supply isn't supplying enough power to all the peripherals when I add the 4th memory module? It is a generic PSU that came with the case. I think it can't be more than 350W. I have a new one on the way, so I am curious whether that may solve the problem.

    Any thoughts?
     
    Last edited: Nov 8, 2007
  5. DanTheBanjoman Señor Moderator

    Doubt it, but remove some other piece of hardware to test. It might be an addressing issue. Can you try a 64-bit OS and see if the problem still exists?
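
    One way to look into the addressing idea: with 4GB installed, part of the RAM is often remapped above the 4 GiB boundary, where physical addresses no longer fit in 32 bits. Below is a rough sketch, assuming the usual `/proc/iomem` layout of `start-end : System RAM`. `ram_above_4g` is a hypothetical helper, and a hit only shows where the memory sits; it proves nothing about the driver by itself.

```shell
# Read /proc/iomem-style text on stdin and print every "System RAM" range
# whose end address is at or above 4 GiB (0x100000000).
ram_above_4g() {
    limit=$((0x100000000))
    while IFS= read -r line; do
        case "$line" in
            *"System RAM"*) ;;      # keep only RAM entries
            *) continue ;;
        esac
        range=${line%% :*}                          # "start-end" part
        range=$(printf '%s' "$range" | tr -d ' ')   # drop any indentation
        start=${range%%-*}
        end=${range##*-}
        if [ $((0x$end)) -ge "$limit" ]; then
            printf '%s-%s\n' "$start" "$end"
        fi
    done
}

# Usage (on recent kernels real addresses are only shown to root):
#   ram_above_4g < /proc/iomem
```

    Running it once with 3GB and once with 4GB installed would show whether the fourth module is what pushes memory past the boundary.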
     
  6. SAlexson New Member

    Well, I am running the 64-bit version of CentOS 5, so it isn't a limitation of the OS.

    I did try removing one of the network cards, and disconnecting the DVD drive. No change. There isn't really anything else to remove. So, I guess it isn't the PSU.
     
  7. SAlexson New Member

    Well, some further testing...

    This morning I removed everything from the system except the processor and the memory. I used the onboard video and the onboard Gigabit LAN (Realtek chipset). Booted up with 4GB and everything worked fine. Network came online with no errors. So, I think I can definitely rule out the PSU being the root of the problem.

    So, I added 1 of the Intel network cards (still using onboard video). I disabled the onboard LAN. When I booted, back to the same "Tx Unit Hang" error. Well, I guess the problem is with the driver or the card (though I don't think it is a "bad" card since I have 4 of them exhibiting the same problem).

    So, I broke down today. I have 4 new network cards coming. They are Netgear cards; I looked them up, and no one seems to have had problems with them under Linux. The Netgear cards supposedly use the Realtek drivers (which the onboard LAN used). So, we'll see in a few days!

    If the new cards work, the only problem I will have is figuring out what to do with the 4 Intel cards since they will be useless to me.

    Thanks again for everyone's help!
     
