Trying to install any App causes unRAID to become unresponsive.



Initially, I got some weird behavior from Community Applications, where clicking on a container to install it would take me to the setup screen of a different container. While I was trying to fix that issue, it got worse: now clicking the install button for any container makes unRAID immediately unresponsive. I am still able to browse through the CA listings, though, as long as I don't click install.

Is there a way to fix this by reinstalling the CA plugin? I would really like to install a few other docker containers.
Thanks!

Link to comment

First off, you are under attack from Vietnam, Russia, and Great Britain (damn Brits). Take your server out of your router's DMZ; there is zero reason to have it in there. (Also, make sure you set up notifications on your server. Fix Common Problems will alert you to any future attacks.)
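For anyone wanting to gauge the scale of an attack like this, a rough sketch (it assumes the stock unRAID syslog location, and the exact message text varies by service, so treat the patterns as placeholders):

# Count logged break-in attempts; adjust the pattern to whatever your
# syslog actually records for failed logins.
grep -ci "authentication failure\|failed password" /var/log/syslog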

 

The crashing:  It started with this:

Apr 17 09:15:21 APURVA-SERVER kernel: kernel BUG at fs/btrfs/ctree.h:1763!
Apr 17 09:15:21 APURVA-SERVER kernel: invalid opcode: 0000 [#1] PREEMPT SMP
Apr 17 09:15:21 APURVA-SERVER kernel: Modules linked in: xt_CHECKSUM iptable_mangle ipt_REJECT nf_reject_ipv4 ebtable_filter ebtables vhost_net vhost macvtap macvlan veth xt_nat iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 nf_nat ip_tables md_mod tun vmw_vmci bonding e1000e ptp pps_core r8152 mii coretemp kvm_intel kvm i2c_i801 i2c_smbus i2c_core ata_piix sata_sil24 intel_agp intel_gtt agpgart video backlight acpi_cpufreq [last unloaded: md_mod]
Apr 17 09:15:21 APURVA-SERVER kernel: CPU: 2 PID: 27287 Comm: btrfs-transacti Not tainted 4.9.19-unRAID #1
Apr 17 09:15:21 APURVA-SERVER kernel: Hardware name: HCL Infosystems Limited HCL Desktop/DQ57TM, BIOS TMIBX10H.86A.0042.2011.0120.1116 01/20/2011
Apr 17 09:15:21 APURVA-SERVER kernel: task: ffff8803937dc500 task.stack: ffffc9000ee50000
Apr 17 09:15:21 APURVA-SERVER kernel: RIP: 0010:[<ffffffff812d406b>]  [<ffffffff812d406b>] lookup_inline_extent_backref+0x3a1/0x53b
Apr 17 09:15:21 APURVA-SERVER kernel: RSP: 0018:ffffc9000ee53b60  EFLAGS: 00010283
Apr 17 09:15:21 APURVA-SERVER kernel: RAX: 0000000000000032 RBX: ffff8803729f0af0 RCX: 0000000000003000
Apr 17 09:15:21 APURVA-SERVER kernel: RDX: ffffffffffffd000 RSI: ffff88026dcbf800 RDI: ffff88026dcbc000
Apr 17 09:15:21 APURVA-SERVER kernel: RBP: ffffc9000ee53c00 R08: 0000000000004000 R09: ffffc9000ee53b20
Apr 17 09:15:21 APURVA-SERVER kernel: R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000032
Apr 17 09:15:21 APURVA-SERVER kernel: R13: 0000000000003783 R14: 00000000000000b2 R15: 0000000000000000
Apr 17 09:15:21 APURVA-SERVER kernel: FS:  0000000000000000(0000) GS:ffff88041b280000(0000) knlGS:0000000000000000
Apr 17 09:15:21 APURVA-SERVER kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 17 09:15:21 APURVA-SERVER kernel: CR2: 00007ffe8284b020 CR3: 00000003b38e9000 CR4: 00000000000006e0
Apr 17 09:15:21 APURVA-SERVER kernel: Stack:

 

All that can be discerned is that the file system is btrfs.  This limits the cause to either the cache drive or the docker.img.

 

You should Check Disk Filesystem on your cache drive.  If that gets you nowhere, then delete the docker.img and set it back up.
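A minimal sketch of the console equivalents, assuming the cache partition is /dev/sdb1 and docker.img is in its stock location (both are placeholders, so check yours first):

# Read-only filesystem check of the cache device; run it with the array
# stopped, or started in maintenance mode, so the filesystem is unmounted.
btrfs check --readonly /dev/sdb1

# If the cache checks out, recreate the Docker image instead: disable
# Docker under Settings -> Docker, delete the image, then re-enable it.
rm /mnt/user/system/docker/docker.img   # stock path; yours may differ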

 

 

Edited by Squid
  • Upvote 2
Link to comment

Thanks a shoal, Squid! I was too lazy to forward ports individually, hence the DMZ, but I have changed it now :)
Fix Common Problems had alerted me to the attacks, with 15,000 or so failed login attempts, lolz. However, won't the attacks still continue on the open ports that I individually forward? I must be missing something here, because I fail to see the difference between the two situations.

To fix the CA/Docker crashes, I have started the filesystem check, and will move on to recreating docker.img if need be. I will update as soon as I have results.
Thanks for your help and the super fast response!

Link to comment

 

You only need to forward the ports the apps themselves might require, and there won't be many. Port forwarding is only needed for the outside world to initiate contact with an app, not for an app to communicate with the outside world. You should never forward port 22 or 80. If you require SSH access from outside your network, use a VPN.
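To verify what is actually reachable once the DMZ is closed, a quick sketch; run it from outside your network (a VPS, or a phone on mobile data), with YOUR.WAN.IP as a placeholder:

# Probe the common ports on the WAN address from an external host.
# -Pn skips the ping check, since many routers drop ICMP anyway.
nmap -Pn -p 22,80,443,8080 YOUR.WAN.IP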

 

Sent from my LG-D852 using Tapatalk

  • Upvote 2
Link to comment

Dear Squid,

Upon starting the array in maintenance mode and clicking the 'Check' button under Check Filesystem Status (btrfs check), my output is as follows:
 

Quote

checking extents
parent transid verify failed on 611209625600 wanted 84208 found 82358
parent transid verify failed on 611209625600 wanted 84208 found 82358
parent transid verify failed on 611220848640 wanted 84214 found 82363
parent transid verify failed on 611220848640 wanted 84214 found 82363
parent transid verify failed on 611224649728 wanted 84221 found 82110
parent transid verify failed on 611224649728 wanted 84221 found 82110
parent transid verify failed on 611224698880 wanted 84221 found 82110
parent transid verify failed on 611224698880 wanted 84221 found 82110
parent transid verify failed on 611229925376 wanted 83355 found 82368
parent transid verify failed on 611229925376 wanted 83355 found 82368
parent transid verify failed on 611416965120 wanted 83410 found 82110
parent transid verify failed on 611416965120 wanted 83410 found 82110
parent transid verify failed on 611455909888 wanted 84397 found 82110
parent transid verify failed on 611455909888 wanted 84397 found 82110
parent transid verify failed on 611455991808 wanted 84397 found 82110
parent transid verify failed on 611455991808 wanted 84397 found 82110
parent transid verify failed on 611456008192 wanted 84397 found 82110
parent transid verify failed on 611456008192 wanted 84397 found 82110
parent transid verify failed on 611456024576 wanted 84397 found 82110
parent transid verify failed on 611456024576 wanted 84397 found 82110
parent transid verify failed on 611456040960 wanted 84397 found 82110
parent transid verify failed on 611456040960 wanted 84397 found 82110
parent transid verify failed on 611456139264 wanted 84397 found 82110
parent transid verify failed on 611456139264 wanted 84397 found 82110
parent transid verify failed on 611456303104 wanted 84397 found 82398
parent transid verify failed on 611456303104 wanted 84397 found 82398
parent transid verify failed on 611456401408 wanted 84397 found 82110
parent transid verify failed on 611456401408 wanted 84397 found 82110
parent transid verify failed on 611456516096 wanted 84397 found 82398
parent transid verify failed on 611456516096 wanted 84397 found 82398
parent transid verify failed on 611456532480 wanted 84397 found 82398
parent transid verify failed on 611456532480 wanted 84397 found 82398
parent transid verify failed on 611456647168 wanted 84397 found 82398
parent transid verify failed on 611456647168 wanted 84397 found 82398
parent transid verify failed on 611456663552 wanted 84397 found 82398
parent transid verify failed on 611456663552 wanted 84397 found 82398
parent transid verify failed on 611456958464 wanted 84397 found 82110
parent transid verify failed on 611456958464 wanted 84397 found 82110
parent transid verify failed on 611457089536 wanted 84397 found 82110
parent transid verify failed on 611457089536 wanted 84397 found 82110
parent transid verify failed on 611457171456 wanted 84397 found 82110
parent transid verify failed on 611457171456 wanted 84397 found 82110
parent transid verify failed on 611457335296 wanted 84397 found 82110
parent transid verify failed on 611457335296 wanted 84397 found 82110
parent transid verify failed on 611457597440 wanted 84397 found 82110
parent transid verify failed on 611457597440 wanted 84397 found 82110
parent transid verify failed on 611457613824 wanted 84397 found 82110
parent transid verify failed on 611457613824 wanted 84397 found 82110
parent transid verify failed on 611457695744 wanted 84397 found 82110
parent transid verify failed on 611457695744 wanted 84397 found 82110
parent transid verify failed on 611457859584 wanted 84397 found 82110
parent transid verify failed on 611457859584 wanted 84397 found 82110
parent transid verify failed on 611458383872 wanted 84397 found 82398
parent transid verify failed on 611458383872 wanted 84397 found 82398
parent transid verify failed on 611641065472 wanted 83450 found 82447
parent transid verify failed on 611641065472 wanted 83450 found 82447
parent transid verify failed on 611680436224 wanted 84431 found 82178
parent transid verify failed on 611680436224 wanted 84431 found 82178
parent transid verify failed on 611680780288 wanted 84431 found 82458
parent transid verify failed on 611680780288 wanted 84431 found 82458
parent transid verify failed on 611741368320 wanted 84438 found 82477
parent transid verify failed on 611741368320 wanted 84438 found 82477
parent transid verify failed on 611798892544 wanted 83474 found 82492
parent transid verify failed on 611798892544 wanted 83474 found 82492
parent transid verify failed on 611899310080 wanted 83500 found 82536
parent transid verify failed on 611899310080 wanted 83500 found 82536
parent transid verify failed on 611899588608 wanted 83500 found 82537
parent transid verify failed on 611899588608 wanted 83500 found 82537
parent transid verify failed on 611899719680 wanted 83500 found 82539
parent transid verify failed on 611899719680 wanted 83500 found 82539
parent transid verify failed on 611899850752 wanted 83500 found 82538
parent transid verify failed on 611899850752 wanted 83500 found 82538
parent transid verify failed on 611899932672 wanted 83500 found 82540
parent transid verify failed on 611899932672 wanted 83500 found 82540
parent transid verify failed on 611902980096 wanted 83500 found 82541
parent transid verify failed on 611902980096 wanted 83500 found 82541
parent transid verify failed on 611903111168 wanted 83500 found 82541
parent transid verify failed on 611903111168 wanted 83500 found 82541
parent transid verify failed on 611903242240 wanted 83500 found 82541
parent transid verify failed on 611903242240 wanted 83500 found 82541
parent transid verify failed on 611903471616 wanted 83500 found 82542
parent transid verify failed on 611903471616 wanted 83500 found 82542
parent transid verify failed on 611903488000 wanted 83500 found 82542
parent transid verify failed on 611903488000 wanted 83500 found 82542
parent transid verify failed on 611903520768 wanted 83500 found 82542
parent transid verify failed on 611903520768 wanted 83500 found 82542
parent transid verify failed on 611903602688 wanted 83500 found 82542
parent transid verify failed on 611903602688 wanted 83500 found 82542
parent transid verify failed on 611903635456 wanted 83500 found 82542
parent transid verify failed on 611903635456 wanted 83500 found 82542
parent transid verify failed on 611903651840 wanted 83500 found 82542
parent transid verify failed on 611903651840 wanted 83500 found 82542
parent transid verify failed on 611903750144 wanted 83500 found 82542
parent transid verify failed on 611903750144 wanted 83500 found 82542
parent transid verify failed on 611903766528 wanted 83500 found 82542
parent transid verify failed on 611903766528 wanted 83500 found 82542
parent transid verify failed on 611904012288 wanted 83500 found 82542
parent transid verify failed on 611904012288 wanted 83500 found 82542
parent transid verify failed on 611905093632 wanted 83500 found 82542
parent transid verify failed on 611905093632 wanted 83500 found 82542
 

I see no repair suggestions, even though the guide says to start repairs by entering additional options in the textbox and clicking the Check button again.
What could be the cause of this failure, and how should I fix it? I have also attached the logs, as I performed the btrfs check after putting Fix Common Problems in troubleshooting mode.
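For reference, a sketch of the console equivalent of what the GUI runs, with /dev/sdb1 as a placeholder for the cache partition. The check is read-only by default; --repair is the kind of additional option the guide means, and on btrfs it can make a damaged filesystem worse, so back up first:

# Diagnose only; this is effectively what the Check button runs.
btrfs check --readonly /dev/sdb1

# Attempt repairs. Last resort: back up the cache contents before this.
btrfs check --repair /dev/sdb1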

FCPsyslog_tail.txt

apurva-server-diagnostics-20170419-1055.zip

unbalance.log

Link to comment
14 minutes ago, apurvasukant said:

…not forward port 80 as well. If I don't open port 80, then how can the web server operate?

It's not necessary to forward any port in your router to allow any computer on your LOCAL network to access any port. You shouldn't try to access your unRAID webUI from outside your local network without setting up a VPN.

 

  • Upvote 1
Link to comment
On 19/04/2017 at 6:39 AM, apurvasukant said:

parent transid verify failed on 611905093632 wanted 83500 found 82542

 

This means the metadata doesn't match the data, i.e., metadata was written, but due to device errors or an unclean shutdown not all of the corresponding data was. Since it's an SSD, if there were no unclean shutdowns it's usually a cable problem.

 

btrfs fsck is not yet very reliable; the best way to fix this kind of error is to back up your cache, reformat, and restore the data.
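A minimal sketch of that backup / reformat / restore cycle, assuming the cache mounts at /mnt/cache and disk1 has room for the backup (adjust both to your setup). Stop Docker and any VMs first so nothing writes to the cache mid-copy:

# Copy everything off the cache, preserving permissions and attributes.
rsync -avX /mnt/cache/ /mnt/disk1/cache_backup/

# ...stop the array, reformat the cache device from the webUI, start the
# array again, then copy everything back...
rsync -avX /mnt/disk1/cache_backup/ /mnt/cache/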

  • Upvote 1
Link to comment
14 hours ago, trurl said:

It's not necessary to forward any port in your router to allow any computer on your LOCAL network to access any port. You shouldn't try to access your unRAID webUI from outside your local network without setting up a VPN.

 

 

Thanks for your suggestions, trurl! I am hosting a public website on the Nginx webserver on port 80. If I don't open port 80, how can the public access the website, rather than just the local network? It would be impossible for the public to get on my VPN just to access my website.

Perhaps I should put port 80 behind some kind of 'web firewall'?
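If a reverse proxy or fail2ban is more than you want to set up, one rough sketch of a rate limit on port 80 with iptables; the thresholds are illustrative, not a hardened configuration:

# Track new connections to port 80 and drop any source that opens more
# than 40 of them within 10 seconds. Tune the numbers to your traffic.
iptables -A INPUT -p tcp --dport 80 -m state --state NEW -m recent --set
iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
         -m recent --update --seconds 10 --hitcount 40 -j DROP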

Link to comment
9 hours ago, johnnie.black said:

 

This means the metadata doesn't match the data, i.e., metadata was written, but due to device errors or an unclean shutdown not all of the corresponding data was. Since it's an SSD, if there were no unclean shutdowns it's usually a cable problem.

 

btrfs fsck is not yet very reliable; the best way to fix this kind of error is to back up your cache, reformat, and restore the data.

 

Thanks, Johnnie. I have had several unclean shutdowns, so that must be the cause. However, won't such errors be fixed in a parity sync, or does that only work for the data drives and not the cache?

I will search the forum for instructions on how to back up, format, and restore the cache data, but is there any other way to achieve this?
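For what it's worth, parity only covers the array drives; the cache pool sits outside it. The nearest btrfs equivalent is a scrub, which verifies checksums and, in a redundant pool, repairs from the good copy. A sketch, assuming the cache mounts at /mnt/cache:

# Verify (and where possible repair) checksummed data on the cache pool.
btrfs scrub start -B /mnt/cache   # -B stays in the foreground, prints a summary
btrfs scrub status /mnt/cache     # or check progress of a backgrounded scrub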

Link to comment
On 4/21/2017 at 0:08 PM, johnnie.black said:

You can use the replace cache procedure, but instead of replacing, just format and restore to the same device.

Johnnie, I followed your instructions for the cache replacement procedure and restored to the same SSD. This didn't solve the issue I had, though it started a couple of others.
A set of phaze's plugins that I was using, with their storage on the cache drive under a share called 'persistent storage', have all become uninstalled.
Now WinSCP is not able to connect to the server either, whereas it was working great all this time. The only good news is that my files are intact :)

I have deleted all my Dockers and VMs, but I would like to completely reset unRAID's settings while keeping the files. How can I do that? The idea in my head is some kind of factory reset or fresh installation that keeps the data; I would really like to start anew and sort this problem out at its roots. What should I do with my cache pool of 128 GB and 180 GB SSDs? How should I format(?)/preclear them?

I have just purchased a big UPS, so unclean shutdowns and the resulting data corruption should be solved.
I will also stop using the plugins for Plex, CouchPotato, and SickRage, and will set them up in Docker instead.

Thanks!

  • Upvote 1
Link to comment

The flash drive is plugged into USB 2, and I will get a new drive. Please give me some advice on:
 

Quote

I have deleted all my Dockers and VMs, but I would like to completely reset unRAID's settings while keeping the files. How can I do that? The idea in my head is some kind of factory reset or fresh installation that keeps the data; I would really like to start anew and sort this problem out at its roots. What should I do with my cache pool of 128 GB and 180 GB SSDs? How should I format(?)/preclear them?

 

Link to comment
16 minutes ago, apurvasukant said:

The flash drive is plugged into USB 2, and I will get a new drive. Please give me some advice on:

Take a screenshot of your drive assignments. Prepare the new flash as for a new install. Reassign your drives. You can just delete everything from cache if you want, since you won't have any Dockers/VMs set up yet that will try to use anything on the cache.
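A sketch of the cleanup steps, with /dev/sdX standing in for each cache SSD (a placeholder, so triple-check the device name before running blkdiscard):

# Keep a copy of the old flash config for reference. Note the licence is
# tied to the flash drive's GUID, so a brand-new flash drive needs a
# licence transfer rather than just a copy of the old .key file.
cp -r /boot/config /mnt/disk1/flash_backup/

# SSDs don't need a preclear; a whole-device TRIM is enough to wipe them.
blkdiscard /dev/sdX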

  • Upvote 1
Link to comment

Thanks, trurl! I will take a screenshot of the drive assignments and format the unRAID flash drive. I will also delete everything from the cache drive (is there any way to format/preclear?).
I don't understand what you mean by reassigning the drives. Do you mean that when I copy unRAID back onto the formatted flash drive and boot from it, I should reassign the disks as per the original screenshot? Thanks!

Edited by apurvasukant
Link to comment

If you must forward ports without a proper VPN, at least use non-standard ports on the WAN side. It won't help against a full port scan, but the first wave is typically a scan of the common ports. If you are being specifically targeted, the non-standard ports will still be found.

  • Upvote 1
Link to comment
