Mellanox ConnectX-2 Issues


Hello, this is my first time ever posting on a forum, so I'm sorry if I do anything incorrectly. I have a strange issue with Mellanox cards. I have two ConnectX-2 cards: one is in my Unraid server and the other is in my Windows 10 machine. I set everything up and made sure all drivers and firmware were up to date on both cards on both systems. The card in my Unraid box has an IP of 10.10.10.1 and my Windows machine is 10.10.10.2.

Every so often, on the Windows side, the card drops the connection, either by reporting "network cable unplugged" or by the connection disabling itself. This then causes a call trace on Unraid, forcing me to perform an unclean shutdown and then a parity check. I went into power management and made sure Windows cannot turn off the card to save power, and that didn't fix anything.

So last night I disconnected the SFP+ cable between the two machines, and a call trace still occurred. That makes me believe this could be a compatibility issue between the cards and Unraid. Mellanox support told me they do not support Unraid as an OS, which confuses me because other members on this forum seem to be using ConnectX-2 cards without issue. Any help would be greatly appreciated. Thank you.
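To pin down when the link actually drops on the Linux side, and whether that lines up with the call traces, one option is to log link-state changes with timestamps. This is only a rough sketch: it assumes Python 3 is available on the box, it reads the standard Linux `operstate` file under `/sys/class/net`, and the interface name `eth1` and the function names are placeholders, not anything Unraid itself provides.

```python
import time
from pathlib import Path


def read_operstate(iface, sysfs_root="/sys/class/net"):
    """Return the kernel's operstate for iface (e.g. 'up', 'down'), or 'missing'."""
    path = Path(sysfs_root) / iface / "operstate"
    try:
        return path.read_text().strip()
    except OSError:
        # Interface directory not present (card removed, wrong name, etc.)
        return "missing"


def watch_link(iface, interval=1.0, max_polls=None, sysfs_root="/sys/class/net"):
    """Print a timestamped line on every state change; return the list of changes.

    max_polls=None polls until interrupted; a number stops after that many polls.
    """
    changes = []
    last = None
    polls = 0
    while max_polls is None or polls < max_polls:
        state = read_operstate(iface, sysfs_root)
        if state != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {iface}: {last} -> {state}")
            changes.append((stamp, state))
            last = state
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)
    return changes
```

Calling `watch_link("eth1")` (with whatever name the ConnectX-2 port actually has) runs until interrupted. The timestamps can then be compared against the times of the call traces in the syslog.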

Link to comment

Hmm, could I possibly have a defective card then? During a parity check that started at around 2am last night, another call trace was logged, and the cause was my ConnectX-2 card. I just removed the card from the system so that my parity check would actually complete, because I'm getting tired of all the unclean shutdowns. This is the 4th or 5th one this week that Unraid says was caused by the Mellanox card.

Link to comment

Then maybe I'll put it into another Windows machine and see whether the issue still occurs between two Windows-based computers, to find out if the card is bad. It kind of sucks, because I wanted the 10Gb connection for large file dumps where an everyday gigabit connection is too slow.

Edited by Hannibal
wrong wording
Link to comment

Cisco twinax cables. I'm going to put the card I suspect is faulty into another Windows machine to see if the issue is still there. But I just can't deal with the call traces it's causing. I've had at least 4 unclean shutdowns this week because of it, forcing me into unnecessary parity checks.

Link to comment

I'd check the best practices for these cards and make sure your BIOS is configured correctly: https://community.mellanox.com/docs/DOC-2489

There are others out there; the above is a good example and starting point.

TL;DR: some BIOS features can interfere with the card. Also make sure your BIOS is up to date. I had to do some tricky things with my cards since they came pre-flashed with an HP BIOS.

Tim

Link to comment

Alright. I'm sorry if this is a stupid question, but this is my very first time working with 10Gb cards. How would I go about checking what BIOS is on the cards? I have removed the card from my Unraid box just so I can get through a parity check without another call trace occurring. I've also made sure the BIOS on both the Windows 10 machine and the Unraid box is up to date, but I haven't gone in and checked any BIOS settings on either system in depth.

Link to comment

There are CLI tools you can download. A good rubber-meets-the-road post is here:

https://community.mellanox.com/thread/1858

Tim

Link to comment
2 minutes ago, Hannibal said:

Ok, thank you for that. I have the mlxup tools, but I believe those are just for firmware-related issues.

Depending on how you got the cards, don't be shocked if the firmware versions don't line up with Mellanox's own firmware releases. Each VAR puts its own "sauce" firmware on them. I'd recommend just reflashing with proper Mellanox firmware.

Link to comment

According to this, I am currently on the latest firmware ever published on Mellanox's website, unless there is something I'm doing wrong here. Both cards produce the same output, posted below. Also, according to the PSID, they are not IBM- or HP-flashed cards.

Device #1:
----------

  Device Type:      ConnectX2
  Part Number:      MNPA19_A1-A3
  Description:      ConnectX-2 Lx EN network interface card; single-port SFP+; PCIe2.0 5.0GT/s; mem-free; RoHS R6
  PSID:             MT_0F60110010
  PCI Device Name:  mt26448_pci_cr0
  Port1 MAC:        6cb3114d0670
  Port2 MAC:        6cb3114d0671
  Versions:         Current        Available
     FW             2.9.1200       N/A
     PXE            3.3.0400       N/A

  Status:           No matching image found
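For anyone comparing several cards, a report like the one above can be checked with a few lines of Python. This is a rough sketch only: `parse_report` and `looks_like_stock_mellanox` are made-up helper names, not part of any Mellanox tool. The `MT_` prefix test rests on the assumption that Mellanox-branded firmware images carry PSIDs starting with `MT_`, while OEM-flashed cards carry a vendor prefix instead.

```python
SAMPLE = """\
Device #1:
----------

  Device Type:      ConnectX2
  Part Number:      MNPA19_A1-A3
  PSID:             MT_0F60110010
  PCI Device Name:  mt26448_pci_cr0
  Versions:         Current        Available
     FW             2.9.1200       N/A
     PXE            3.3.0400       N/A

  Status:           No matching image found
"""


def parse_report(text):
    """Collect 'Key: value' lines plus the rows under 'Versions:' into a dict."""
    info = {"versions": {}}
    in_versions = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed separator
        if line.startswith("Versions:"):
            in_versions = True  # following rows are: NAME CURRENT AVAILABLE
            continue
        if ":" in line:
            key, _, value = line.partition(":")
            info[key.strip()] = value.strip()
            in_versions = False
        elif in_versions:
            parts = line.split()
            if len(parts) >= 3:
                info["versions"][parts[0]] = {
                    "current": parts[1],
                    "available": parts[2],
                }
    return info


def looks_like_stock_mellanox(psid):
    # Assumption: Mellanox-branded images use PSIDs beginning with "MT_".
    return psid.startswith("MT_")


report = parse_report(SAMPLE)
print(report["PSID"], looks_like_stock_mellanox(report["PSID"]))
print(report["versions"]["FW"]["current"], report["Status"])
```

Running the same check against the report from each card makes it easy to confirm that both carry the same PSID and firmware version.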

Edited by Hannibal
Link to comment

That looks right. My only other recommendation is to look into BIOS features and power-saving settings that could be interfering.

Tim

Link to comment

Alright, thank you. I was planning on putting the other card into a Windows machine to see if the issue is still there, because yesterday I removed that Mellanox card from my Unraid system just so the parity check would complete. The results in my previous post were from my Windows-based system. Is there any possibility that the card I had in my Unraid machine is faulty?

Link to comment

Possible, yes. For an enterprise card, the odds of it shipping defective are pretty low but non-zero. With a Realtek NIC or some cheap POS, the likelihood of a defect goes up. Heck, I had a defective twinax cable, so anything is possible.

Link to comment

If I remember correctly, I only paid about $88 apiece for these Mellanox cards from newegg.com. I just want it to work without the call trace issues. Performing large multi-terabyte file dumps over everyday gigabit NICs is painfully slow compared to the 10Gb adapters.
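To put a rough number on that difference, here is a back-of-the-envelope sketch. It assumes the full line rate is usable (no protocol overhead, no disk bottleneck) and decimal units (1 TB = 10^12 bytes), so real transfers will be slower; `transfer_seconds` is just an illustrative helper name.

```python
def transfer_seconds(size_tb, link_gbps):
    """Ideal transfer time in seconds for size_tb terabytes over a link_gbps link."""
    bits = size_tb * 1e12 * 8          # terabytes -> bits (decimal units)
    return bits / (link_gbps * 1e9)    # Gbit/s -> bit/s


for gbps in (1, 10):
    secs = transfer_seconds(1, gbps)
    print(f"1 TB at {gbps} Gbit/s: {secs / 60:.1f} minutes")
```

Under those assumptions, 1 TB takes about 2.2 hours at 1 Gbit/s versus about 13 minutes at 10 Gbit/s.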

Edited by Hannibal
Link to comment
18 hours ago, Hannibal said:

Cisco twinax cables

I have two Mellanox ConnectX-2 cards, and with my Cisco 7m active DAC they refuse to connect at all. Brocade active DACs work very well. I finished my 10Gbit home network a few weeks ago, and this was the only incompatibility among all my equipment.

So I would try changing the cable as a test. What is the distance between your PCs?

Link to comment
15 minutes ago, Hannibal said:

If I remember correctly, I only paid about $88 apiece for these Mellanox cards from newegg.com. I just want it to work without the call trace issues. Performing large multi-terabyte file dumps over everyday gigabit NICs is painfully slow compared to the 10Gb adapters.

I hear ya. I bought mine used from Amazon for $20 a card and have 4 of them in my house, all working well apart from the twinax acting up on startup sometimes. I'm too lazy to replace it with fibre.

Tim

Link to comment

My FreeNAS server uses fibre and has zero issues. My Windows 10 and Unraid machines both sometimes have issues on boot where the link never comes up; disabling the NIC, or pulling the twinax out and plugging it back in, usually resets it. I've never had the link drop mid-flight like what you're describing.

Tim

Link to comment

It's strange. That's why I pulled the twinax cable to see if the call trace would still occur, and it did, during a parity check. I think I'm just going to throw the card I took from the Unraid machine into a Windows PC and mess around with it. I've been contemplating swapping my Windows 10 PC over to Linux to see if the issue is still there with both machines running Linux. I don't know, I'm kind of at my wits' end here, lol. Would there be any benefit to switching over to fibre versus using the twinax cable?

Edited by Hannibal
Link to comment
26 minutes ago, uldise said:

I have two Mellanox ConnectX-2 cards, and with my Cisco 7m active DAC they refuse to connect at all. Brocade active DACs work very well. I finished my 10Gbit home network a few weeks ago, and this was the only incompatibility among all my equipment.

So I would try changing the cable as a test. What is the distance between your PCs?

The distance between the PCs is less than, I'd say, 2 ft. The PCs connected by the Cisco twinax cable are both in the same rack.

Link to comment
