6.3.1 stack dump during interrupt handling.

laterdaze · February 16, 2017

This did not happen before the 6.3.1 upgrade, though that may be a coincidence. I've been running the Mellanox driver with a ConnectX-2 for quite a while. A BIOS update is available; I'll apply it when I get a chance.

Feb 14 11:35:21 unRAID kernel: ------------[ cut here ]------------

Feb 14 11:35:21 unRAID kernel: WARNING: CPU: 2 PID: 14091 at net/core/skbuff.c:4313 skb_try_coalesce+0x22f/0x31d

Feb 14 11:35:21 unRAID kernel: Modules linked in: xt_nat veth vhost_net tun vhost macvtap macvlan kvm_intel kvm md_mod xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat mlx4_en mlx4_core ptp pps_core r8169 mii x86_pkg_temp_thermal coretemp ahci i2c_i801 i2c_smbus i2c_core libahci wmi video backlight [last unloaded: md_mod]

Feb 14 11:35:21 unRAID kernel: CPU: 2 PID: 14091 Comm: Threadpool work Not tainted 4.9.8-unRAID #1

Feb 14 11:35:21 unRAID kernel: Hardware name: System manufacturer System Product Name/P8H77-I, BIOS 0904 10/15/2012

Feb 14 11:35:21 unRAID kernel: ffff88021fb03af8 ffffffff813a34fa 0000000000000000 ffffffff819a7fee

Feb 14 11:35:21 unRAID kernel: ffff88021fb03b38 ffffffff8104d04c 000010d9132d9000 ffff8801e5d34d00

Feb 14 11:35:21 unRAID kernel: ffff8801e5d34f00 00000000000004c0 ffff88021fb03b94 0000000000000575

Feb 14 11:35:21 unRAID kernel: Call Trace:

Feb 14 11:35:21 unRAID kernel: <IRQ>

Feb 14 11:35:21 unRAID kernel: [<ffffffff813a34fa>] dump_stack+0x61/0x7e

Feb 14 11:35:21 unRAID kernel: [<ffffffff8104d04c>] __warn+0xb8/0xd3

Feb 14 11:35:21 unRAID kernel: [<ffffffff8104d114>] warn_slowpath_null+0x18/0x1a

Feb 14 11:35:21 unRAID kernel: [<ffffffff8157a3be>] skb_try_coalesce+0x22f/0x31d

Feb 14 11:35:21 unRAID kernel: [<ffffffff815e58b8>] tcp_try_coalesce+0x38/0x97

Feb 14 11:35:21 unRAID kernel: [<ffffffff815e5db4>] tcp_queue_rcv+0x5c/0x101

Feb 14 11:35:21 unRAID kernel: [<ffffffff815ea7bb>] tcp_rcv_established+0x2b2/0x5ac

Feb 14 11:35:21 unRAID kernel: [<ffffffff815f20ae>] tcp_v4_do_rcv+0x98/0x1c8

Feb 14 11:35:21 unRAID kernel: [<ffffffff815f47e7>] tcp_v4_rcv+0x8aa/0xaec

Feb 14 11:35:21 unRAID kernel: [<ffffffffa0025215>] ? ipv4_confirm+0x7a/0xd0 [nf_conntrack_ipv4]

Feb 14 11:35:21 unRAID kernel: [<ffffffff815d4faa>] ip_local_deliver_finish+0xf4/0x1c3

Feb 14 11:35:21 unRAID kernel: [<ffffffff815d5540>] ip_local_deliver+0xcc/0xe1

Feb 14 11:35:21 unRAID kernel: [<ffffffff815d4eb6>] ? inet_del_offload+0x40/0x40

Feb 14 11:35:21 unRAID kernel: [<ffffffff815d536d>] ip_rcv_finish+0x2f4/0x2ff

Feb 14 11:35:21 unRAID kernel: [<ffffffff815d5896>] ip_rcv+0x341/0x358

Feb 14 11:35:21 unRAID kernel: [<ffffffff815d5079>] ? ip_local_deliver_finish+0x1c3/0x1c3

Feb 14 11:35:21 unRAID kernel: [<ffffffff81586b4c>] __netif_receive_skb_core+0x5e9/0x69f

Feb 14 11:35:21 unRAID kernel: [<ffffffffa0377cd5>] ? mlx4_en_process_rx_cq+0x83e/0xa43 [mlx4_en]

Feb 14 11:35:21 unRAID kernel: [<ffffffff815871c6>] __netif_receive_skb+0x13/0x55

Feb 14 11:35:21 unRAID kernel: [<ffffffff81588124>] process_backlog+0xa1/0x13f

Feb 14 11:35:21 unRAID kernel: [<ffffffff81587f1f>] net_rx_action+0xe2/0x246

Feb 14 11:35:21 unRAID kernel: [<ffffffff81050eca>] __do_softirq+0xbb/0x1af

Feb 14 11:35:21 unRAID kernel: [<ffffffff8105116e>] irq_exit+0x53/0x94

Feb 14 11:35:21 unRAID kernel: [<ffffffff8102009e>] do_IRQ+0xaa/0xc2

Feb 14 11:35:21 unRAID kernel: [<ffffffff8167db42>] common_interrupt+0x82/0x82

Feb 14 11:35:21 unRAID kernel: <EOI>

Feb 14 11:35:21 unRAID kernel: ---[ end trace 790ce744c3e754ca ]---

unraid-diagnostics-20170215-0833.zip

John_M · February 16, 2017

See this thread. Do the BIOS update by all means, but if that doesn't fix it, your best bet is to roll back to the earlier kernel and wait for unRAID 6.3.2 to see if that helps.

laterdaze · February 16, 2017

Thanks for that. Meanwhile I perused the kernel sources, and it seems that while trying to coalesce some socket buffers, some math didn't add up, so it pulled the chain and logged the warning. Better safe than sorry. That stack of kernel code seems like a well-worn path, so it will take someone with "enlightened foreprudence" to figure that out...
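To make "the math didn't add up" a bit more concrete, here is a rough Python model of the kind of consistency check I mean. It is only a sketch of my reading, not the kernel's code: the names `Buf`, `try_coalesce` and `HEADER_OVERHEAD` are made up for illustration, the overhead value is arbitrary, and the numbers in the demo are not taken from the trace above. The idea is that when one buffer is merged into another, the memory it hands over should be at least as large as the payload it carries; if it is smaller, the accounting is suspect and a warning is warranted.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the per-buffer bookkeeping overhead.
# The real value depends on the kernel build; 256 is an arbitrary choice.
HEADER_OVERHEAD = 256


@dataclass
class Buf:
    length: int    # payload bytes carried by the buffer
    truesize: int  # memory the buffer claims to occupy, overhead included


def try_coalesce(to: Buf, frm: Buf) -> bool:
    """Merge frm into to and report whether the accounting was consistent."""
    # Memory that moves over to `to` once frm's own overhead is dropped.
    delta = frm.truesize - HEADER_OVERHEAD
    # A payload cannot be larger than the memory that is supposed to hold it.
    consistent = delta >= frm.length
    # In this sketch the merge goes ahead either way; only the flag differs.
    to.length += frm.length
    to.truesize += delta
    return consistent


if __name__ == "__main__":
    to = Buf(length=1000, truesize=2048)
    # Claimed memory comfortably covers the payload.
    print(try_coalesce(to, Buf(length=500, truesize=1024)))
    # Claimed memory is smaller than the payload, so the check fails.
    print(try_coalesce(to, Buf(length=1216, truesize=1400)))
```

If that reading is right, a failed check points at whoever filled in the size fields of the incoming buffer rather than at the TCP code that tripped over it, but that is a guess on my part.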
