UNRAID randomly crashing. Need Help.


Starting yesterday, I've had a problem where UNRAID randomly crashes. I can still access the files over Samba, but I can't get to the main UI or to any of my plugin or Docker container UIs. The only change I made before this started was removing two plugins (Couchpotato and Headphones) and installing a Docker container (Couchpotato). The last time it happened, I captured the syslog and found a line that said "kernel BUG". Below is that section:

Mar 12 04:19:45 Tower kernel: ------------[ cut here ]------------
Mar 12 04:19:45 Tower kernel: kernel BUG at fs/reiserfs/journal.c:507!
Mar 12 04:19:45 Tower kernel: invalid opcode: 0000 [#1] PREEMPT SMP 
Mar 12 04:19:45 Tower kernel: Modules linked in: xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat md_mod mvsas libsas e1000 pata_atiixp i2c_piix4 scsi_transport_sas ahci libahci k10temp acpi_cpufreq [last unloaded: md_mod]
Mar 12 04:19:45 Tower kernel: CPU: 1 PID: 9572 Comm: shfs Not tainted 4.1.18-unRAID #1
Mar 12 04:19:45 Tower kernel: Hardware name: MSI MS-7623/740GM-P25 (MS-7623) , BIOS V2.1 04/04/2010
Mar 12 04:19:45 Tower kernel: task: ffff880097d98000 ti: ffff880048e48000 task.ti: ffff880048e48000
Mar 12 04:19:45 Tower kernel: RIP: 0010:[<ffffffff8116ce58>]  [<ffffffff8116ce58>] reiserfs_in_journal+0x134/0x13f
Mar 12 04:19:45 Tower kernel: RSP: 0018:ffff880048e4b8f8  EFLAGS: 00010246
Mar 12 04:19:45 Tower kernel: RAX: ffffc9000182cad0 RBX: 0000000000001f3f RCX: 000000000b651f3f
Mar 12 04:19:45 Tower kernel: RDX: 000000000b651f3f RSI: 00000005b28f9f80 RDI: 0000000000000000
Mar 12 04:19:45 Tower kernel: RBP: ffff880048e4b918 R08: ffff880048e4b964 R09: ffff8800c832a000
Mar 12 04:19:45 Tower kernel: R10: ffffc90001780000 R11: 000000000000b650 R12: ffff880102dead00
Mar 12 04:19:45 Tower kernel: R13: ffffc9000177cb28 R14: 0000000000001f40 R15: ffff880048e4ba14
Mar 12 04:19:45 Tower kernel: FS:  00002ab873a85700(0000) GS:ffff88011fc40000(0000) knlGS:0000000000000000
Mar 12 04:19:45 Tower kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 12 04:19:45 Tower kernel: CR2: 00002ab873e8d000 CR3: 00000000cbb5f000 CR4: 00000000000006e0
Mar 12 04:19:45 Tower kernel: Stack:
Mar 12 04:19:45 Tower kernel: ffff8800c832a000 0000000000000001 ffff8800c832a000 ffff880102dead00
Mar 12 04:19:45 Tower kernel: ffff880048e4b998 ffffffff81153b34 ffff880097d98000 0000000000008000
Mar 12 04:19:45 Tower kernel: 0000001148e4b958 ffff88000388fbc0 0000000100001d7b 000016ca00008000
Mar 12 04:19:45 Tower kernel: Call Trace:
Mar 12 04:19:45 Tower kernel: [<ffffffff81153b34>] scan_bitmap_block.constprop.9+0xeb/0x244
Mar 12 04:19:45 Tower kernel: [<ffffffff811544b6>] reiserfs_allocate_blocknrs+0x829/0xa1e
Mar 12 04:19:45 Tower kernel: [<ffffffff8115b546>] reiserfs_get_block+0x59d/0xecc
Mar 12 04:19:45 Tower kernel: [<ffffffff81124450>] __block_write_begin+0x159/0x321
Mar 12 04:19:45 Tower kernel: [<ffffffff8115afa9>] ? reiserfs_commit_write+0x169/0x169
Mar 12 04:19:45 Tower kernel: [<ffffffff810b9212>] ? wait_for_stable_page+0x15/0x34
Mar 12 04:19:45 Tower kernel: [<ffffffff81159d70>] reiserfs_write_begin+0x100/0x1bc
Mar 12 04:19:45 Tower kernel: [<ffffffff810b1990>] generic_perform_write+0xd1/0x185
Mar 12 04:19:45 Tower kernel: [<ffffffff810b2a1f>] __generic_file_write_iter+0x9f/0x147
Mar 12 04:19:45 Tower kernel: [<ffffffff810b2bd9>] generic_file_write_iter+0x112/0x17d
Mar 12 04:19:45 Tower kernel: [<ffffffff810fd986>] __vfs_write+0x8f/0xb8
Mar 12 04:19:45 Tower kernel: [<ffffffff810fdf05>] vfs_write+0xad/0x165
Mar 12 04:19:45 Tower kernel: [<ffffffff810fe76a>] SyS_pwrite64+0x50/0x81
Mar 12 04:19:45 Tower kernel: [<ffffffff815f7b2e>] system_call_fastpath+0x12/0x71
Mar 12 04:19:45 Tower kernel: Code: 1f 00 00 49 8b 84 c2 98 01 00 00 48 85 c0 74 1d 8b 48 10 48 39 d1 75 06 4c 39 48 08 74 0d 48 8b 40 40 eb e7 b8 01 00 00 00 eb 02 <0f> 0b 41 59 41 5a 5b 41 5c 5d c3 55 48 89 e5 53 48 89 fb 50 48 
Mar 12 04:19:45 Tower kernel: RIP  [<ffffffff8116ce58>] reiserfs_in_journal+0x134/0x13f
Mar 12 04:19:45 Tower kernel: RSP <ffff880048e4b8f8>
Mar 12 04:19:45 Tower kernel: ---[ end trace a3da855b164e4d04 ]---

I've also attached the syslog. Thanks.

syslog

All of the scans on my data and cache drives returned "No corruptions found". However, the scan on my parity drive returned the following:

root@Tower:~# reiserfsck --check /dev/sdi1
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/sdi1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sun Mar 12 17:29:28 2017
###########
Replaying journal: No transactions found
Checking internal tree..  block 460305198: The level of the node (52720) is not correct, (4) expected
 the problem in the internal node occured (460305198), whole subtree is skipped
finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Bad nodes were found, Semantic pass skipped
1 found corruptions can be fixed only when running with --rebuild-tree
###########
reiserfsck finished at Sun Mar 12 17:31:18 2017
###########

Should I run --rebuild-tree and then redo parity afterwards? Do I need to replace the drive?

johnnie.black said:

Parity has no file system.

You should use the md device when checking or repairing a disk in the array, not the sd device. Maybe you did it correctly when you did your data disks, but I thought I would mention it in case you didn't or someone else reads this.

And as johnnie said, it makes no sense to check or repair a file system on parity.

trurl said:

You should use the md device when checking or repairing a disk in the array, not the sd device. Maybe you did it correctly when you did your data disks, but I thought I would mention it in case you didn't or someone else reads this.

And as johnnie said, it makes no sense to check or repair a file system on parity.

It is worse than that - any attempt to do so will probably corrupt parity.

I checked all my data drives using /dev/mdX. Below are the results for one of them; they all looked like this one.

root@Tower:~# reiserfsck --check /dev/md1
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/md1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sun Mar 12 05:35:19 2017
###########
Replaying journal: Done.
Reiserfs journal '/dev/md1' in blocks [18..8211]: 0 transactions replayed

Message from syslogd@Tower at Mar 12 05:37:24 ...
 kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1
Checking internal tree..  finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
	Leaves 462405
	Internal nodes 2962
	Directories 3040
	Other files 35538
	Data block pointers 465168995 (0 of them are zero)
	Safe links 0
###########
reiserfsck finished at Sun Mar 12 05:59:25 2017
###########

So, if I don't need to run reiserfsck on parity and parity is fine, what should I try next?

Okay, so I just restarted my array and got a different error:

Mar 12 17:59:42 Tower kernel: ------------[ cut here ]------------
Mar 12 17:59:42 Tower kernel: WARNING: CPU: 1 PID: 8557 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x4d/0x10e()
Mar 12 17:59:42 Tower kernel: BTRFS: Transaction aborted (error -5)
Mar 12 17:59:42 Tower kernel: Modules linked in: md_mod xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat e1000 pata_atiixp i2c_piix4 ahci libahci mvsas libsas scsi_transport_sas k10temp acpi_cpufreq [last unloaded: md_mod]
Mar 12 17:59:42 Tower kernel: CPU: 1 PID: 8557 Comm: btrfs-transacti Not tainted 4.1.18-unRAID #1
Mar 12 17:59:42 Tower kernel: Hardware name: MSI MS-7623/740GM-P25 (MS-7623) , BIOS V2.1 04/04/2010
Mar 12 17:59:42 Tower kernel: 0000000000000009 ffff880016837c58 ffffffff815f2403 ffffffff8182e140
Mar 12 17:59:42 Tower kernel: 0000000000000292 ffff880016837c58 ffff880016837ca8 ffff880016837c98
Mar 12 17:59:42 Tower kernel: ffffffff8104778b ffff880016837ce8 ffffffff81286d44 00000000fffffffb
Mar 12 17:59:42 Tower kernel: Call Trace:
Mar 12 17:59:42 Tower kernel: [<ffffffff815f2403>] dump_stack+0x65/0x85
Mar 12 17:59:42 Tower kernel: [<ffffffff8104778b>] warn_slowpath_common+0x97/0xb1
Mar 12 17:59:42 Tower kernel: [<ffffffff81286d44>] ? __btrfs_abort_transaction+0x4d/0x10e
Mar 12 17:59:42 Tower kernel: [<ffffffff810477e6>] warn_slowpath_fmt+0x41/0x43
Mar 12 17:59:42 Tower kernel: [<ffffffff81286d44>] __btrfs_abort_transaction+0x4d/0x10e
Mar 12 17:59:42 Tower kernel: [<ffffffff812ac3ad>] cleanup_transaction+0x80/0x21d
Mar 12 17:59:42 Tower kernel: [<ffffffff810724fb>] ? wait_woken+0x7d/0x7d
Mar 12 17:59:42 Tower kernel: [<ffffffff812ad617>] btrfs_commit_transaction+0xa6c/0xa81
Mar 12 17:59:42 Tower kernel: [<ffffffff812a93ce>] transaction_kthread+0xfa/0x1cb
Mar 12 17:59:42 Tower kernel: [<ffffffff812a92d4>] ? btrfs_cleanup_transaction+0x461/0x461
Mar 12 17:59:42 Tower kernel: [<ffffffff8105c74a>] kthread+0xd6/0xde
Mar 12 17:59:42 Tower kernel: [<ffffffff8105c674>] ? kthread_create_on_node+0x172/0x172
Mar 12 17:59:42 Tower kernel: [<ffffffff815f7f12>] ret_from_fork+0x42/0x70
Mar 12 17:59:42 Tower kernel: [<ffffffff8105c674>] ? kthread_create_on_node+0x172/0x172
Mar 12 17:59:42 Tower kernel: ---[ end trace 201b7be815cddbe6 ]---

This was followed by a bunch of lines saying that my cache drive is a read-only file system. The syslog is attached.

syslog

Did you check your cache disk? It looks like it has reiserfs corruption, and when it went read-only, that caused the btrfs errors on the Docker image.

Mar 12 17:59:33 Tower kernel: REISERFS error (device sdc1): vs-4080 _reiserfs_free_block: block 61085904: bit already cleared
Mar 12 17:59:33 Tower kernel: REISERFS (device sdc1): Remounting filesystem read-only
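
Lines like these are easy to miss in a full syslog. Below is a minimal Python sketch (not an Unraid tool) that pulls out the "kernel BUG at ..." line from the first post plus the "REISERFS error" and "Remounting filesystem read-only" lines quoted above. The regular expressions are assumptions based only on the message formats shown in this thread, and scan_syslog is a hypothetical helper name.

```python
import re
from typing import Iterable, List, Tuple

# Message formats copied from the log excerpts in this thread (assumed, not exhaustive).
KERNEL_BUG = re.compile(r"kernel BUG at (?P<where>\S+)!")
REISERFS_ERROR = re.compile(r"REISERFS error \(device (?P<dev>[^)]+)\): (?P<msg>.*)")
REISERFS_READ_ONLY = re.compile(
    r"REISERFS \(device (?P<dev>[^)]+)\): Remounting filesystem read-only"
)


def scan_syslog(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: return (kind, detail) for each suspicious syslog line."""
    findings: List[Tuple[str, str]] = []
    for line in lines:
        if m := KERNEL_BUG.search(line):
            # Source file and line where the kernel hit BUG()
            findings.append(("kernel_bug", m.group("where")))
        elif m := REISERFS_ERROR.search(line):
            # Device name plus the reiserfs error text
            findings.append(("reiserfs_error", f"{m.group('dev')}: {m.group('msg').strip()}"))
        elif m := REISERFS_READ_ONLY.search(line):
            # Device that reiserfs remounted read-only
            findings.append(("read_only", m.group("dev")))
    return findings


if __name__ == "__main__":
    sample = [
        "Mar 12 04:19:45 Tower kernel: kernel BUG at fs/reiserfs/journal.c:507!",
        "Mar 12 17:59:33 Tower kernel: REISERFS error (device sdc1): vs-4080 "
        "_reiserfs_free_block: block 61085904: bit already cleared",
        "Mar 12 17:59:33 Tower kernel: REISERFS (device sdc1): Remounting filesystem read-only",
    ]
    for kind, detail in scan_syslog(sample):
        print(kind, detail)
```

To run it against a real log you could pass it open("/var/log/syslog"), assuming that is where your syslog lives. It only reports matching lines; it does not tell you why the file system was damaged.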

 

I checked it initially, and it passed with the output below:

root@Tower:~# reiserfsck --check /dev/sdc1
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/sdc1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sun Mar 12 16:28:49 2017
###########
Replaying journal: Done.
Reiserfs journal '/dev/sdc1' in blocks [18..8211]: 0 transactions replayed
Checking internal tree..  finished
Comparing bitmaps..finished
Checking Semantic tree:
finished
No corruptions found
There are on the filesystem:
	Leaves 88989
	Internal nodes 607
	Directories 476072
	Other files 307144
	Data block pointers 35614529 (14967 of them are zero)
	Safe links 0
###########
reiserfsck finished at Sun Mar 12 16:37:15 2017
###########

However, after it went read-only, I ran it again and got the output below:

root@Tower:~# reiserfsck --check /dev/sdc1
reiserfsck 3.6.24

Will read-only check consistency of the filesystem on /dev/sdc1
Will put log info to 'stdout'

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes
###########
reiserfsck --check started at Sun Mar 12 20:56:14 2017
###########
Filesystem seems mounted read-only. Skipping journal replay.
Checking internal tree..  finished
Comparing bitmaps..vpf-10640: The on-disk and the correct bitmaps differs.
Checking Semantic tree:
... t.2016.1080p.BluRay.DTS-HD.MA.5.1.x264-FuzerHD.cp(tt4975722).#159/6.out.tmpvpf-10680: The file [3478210 3478654] has the wrong block count in the StatData (0), should be (8)
finished
2 found corruptions can be fixed when running with --fix-fixable
###########
reiserfsck finished at Sun Mar 12 21:00:49 2017
###########

Will running reiserfsck with --fix-fixable fix everything?
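
reiserfsck --check already names the repair mode it wants in its final summary line, as in the parity and cache outputs above. Here is a minimal Python sketch, assuming only the three summary formats printed by reiserfsck 3.6.24 in this thread, that extracts the corruption count and the flag reiserfsck asked for. suggested_repair is a hypothetical helper. It cannot tell you whether a repair will recover everything, and it means nothing for the parity disk, which has no file system.

```python
import re
from typing import Optional, Tuple

# Summary-line format as printed by reiserfsck 3.6.24 in this thread (assumed).
SUMMARY = re.compile(
    r"(?P<count>\d+) found corruptions can be fixed (?:only )?"
    r"when running with (?P<flag>--[\w-]+)"
)


def suggested_repair(check_output: str) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: (corruption count, flag reiserfsck asked for) from --check output."""
    if "No corruptions found" in check_output:
        return 0, None
    m = SUMMARY.search(check_output)
    if m is None:
        raise ValueError("no reiserfsck summary line found")
    return int(m.group("count")), m.group("flag")


if __name__ == "__main__":
    cache_summary = "2 found corruptions can be fixed when running with --fix-fixable\n"
    print(suggested_repair(cache_summary))  # (2, '--fix-fixable')
```

Of the two modes seen here, --rebuild-tree is the more drastic one, so treat the extracted flag as what reiserfsck requested, not as a promise of a clean repair.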
