[Plugin] CA Fix Common Problems



Another item to check, if you are interested, is the ps report, looking for processes that are monopolizing the CPU.  I admit I don't have a lot of Linux experience, but it's hard for me to believe that any process should ever be using more than 90% of the CPU.  I cringe when I see users reporting that something is using 100% CPU; things don't work right when that's true.  Interrupts aren't handled in a timely way, and that causes spurious timeouts and freezes in devices and processes that are otherwise completely fine, except that they are starved for CPU attention.  Helpers can then be misled by these timeouts and the subsequent errors, when all that device or process needed was some CPU love.  The real problem is the runaway process hogging the CPU.
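Something as simple as sorting the ps output by CPU would surface the offenders, e.g. (a rough sketch of the kind of check I mean, not anything the plugin actually does):

```python
import subprocess

# List the heaviest CPU consumers first (GNU ps supports --sort).
out = subprocess.run(
    ["ps", "-eo", "pid,pcpu,comm", "--sort=-pcpu"],
    capture_output=True, text=True, check=True,
).stdout

# Print the header plus the top five processes by %CPU.
for line in out.splitlines()[:6]:
    print(line)
```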

 

A potential future fix would be to display the offending process, and request permission to kill it.

In my opinion, any process can use 100% of the CPU if it needs it (although realistically it will always wind up being somewhat less than that).  That's not necessarily an error or a warning: if the process legitimately wants the CPU, then it should have it.  Also, I don't want to get into pointing out that such-and-such a process is using high CPU, as it might only encourage users to kill that process (which may actually be needed), and for the same reasons I'm certainly never going to offer up the ability to kill a process.

 

That being said, I think that a better metric would be sysload.  Right off the hop, it's an average over 1 min, 5 min, and 15 min, and it lets the user know that they are running too much stuff concurrently for their CPU.

 

Maybe something like:

(sysload / # cores) > 2 generates a warning
(sysload / # cores) > 4 generates an error
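A minimal sketch of that check in Python (thresholds as above; assuming /proc/loadavg is readable, and using the 15 minute average to smooth out short spikes):

```python
import os

def check_sysload():
    # /proc/loadavg starts with the 1, 5 and 15 minute load averages
    with open("/proc/loadavg") as f:
        one_min, five_min, fifteen_min = map(float, f.read().split()[:3])

    cores = os.cpu_count() or 1
    load_per_core = fifteen_min / cores

    if load_per_core > 4:
        print(f"Error: sysload {fifteen_min:.2f} on {cores} cores ({load_per_core:.2f} per core)")
    elif load_per_core > 2:
        print(f"Warning: sysload {fifteen_min:.2f} on {cores} cores ({load_per_core:.2f} per core)")

check_sysload()
```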

Link to comment

Here's a big one *if* you are interested.  A number of users with problems appear to have a full /var/log, which causes additional problems for them.  You are already checking and reporting when it's filling up, but it would be really great if you could *truncate* the huge space wasters and instantly give them some more space and time.  That could enable diagnostics collection, powerdown, and any other safe procedures they might want to run, perhaps to diagnose further what's wrong.  This is of course an action, a fix, not just a detection and report.

 

We discussed this around here, and you questioned the loss of logging info after truncation, a valid quibble.  I don't know if you still feel the same, but I still feel that data past the first 500KB is totally useless, not even worth saving, unless someone can come up with a valid reason why it might be useful.  jonathanm's subsequent remark was apropos, although I'd rather not save the truncated garbage.  Burn it!

 

I do still think it could be a big help, allowing them to continue a bit longer, long enough to grab full diagnostics.  And the very fact that it worked (if it does!) lets helpers narrow the real problem down sooner.  I'd loop through the files there and truncate any that are larger than 5MB down to 2MB.  Those are my suggested numbers; change them as you like.
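For illustration, a rough sketch of that loop (the 5MB/2MB figures are just the suggestion above; truncating keeps the head of each file, which fits the point that everything past the start of a runaway log is mostly repetition):

```python
import os

LOG_DIR = "/var/log"
MAX_SIZE = 5 * 1024 * 1024   # anything bigger than 5MB...
KEEP_SIZE = 2 * 1024 * 1024  # ...gets cut back to its first 2MB

for name in os.listdir(LOG_DIR):
    path = os.path.join(LOG_DIR, name)
    if os.path.isfile(path) and os.path.getsize(path) > MAX_SIZE:
        # Keep the head: with a runaway error, the start of the log shows
        # how the problem began and the rest is repetition.
        os.truncate(path, KEEP_SIZE)
        print(f"Truncated {path} to 2MB")
```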

When I get around to auto fixes, then sure.  But since there is an underlying issue at play here, the action (along with any auto fix) would be to post your diagnostics ASAP.

 

But I'm thinking that since any fix is a temporary fix at best (unless your uptime dates from when you were running unRaid v2), a better way would be to automatically expand /var/log instead of truncating it.
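Since /var/log on unRaid is a tmpfs, expanding it is just a remount with a larger size limit, which takes effect immediately and preserves the existing contents.  A minimal sketch (the 256m figure is an arbitrary example, and this needs root):

```python
import subprocess

# /var/log is a tmpfs; remounting with a larger size limit grows it
# in place without losing anything already logged.  Requires root.
subprocess.run(["mount", "-o", "remount,size=256m", "/var/log"], check=True)
```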

 

And even there, the problem is that if background checks only run, say, once a day, the odds are good that you're screwed before it does a scan, as it's rather easy for a continually logged error to completely fill the log filesystem within an hour or so.

 

I'll think about it some more...

Link to comment

Under the criteria it is a valid error.  However, since it's from bonienl it can be considered a false positive.

 

This is why it's a warning and not an error.  It's the only way to do things like getting rid of the 6.0 plugins.

 

While bonienl may disagree, my opinion is that any plugin not published in CA is potentially an issue.  But this is a special circumstance.

 

(I actually tested this routine by installing that plugin, as I knew it wouldn't cause any issues)

 

Sent from my LG-D852 using Tapatalk

 

I never gave this particular plugin an official status, so your tool is right.  Better to warn people than to ignore it.

 

I foresee a temporary life for this plugin anyway.  The issue is corrected in unRAID 6.2, and backporting the fix into unRAID 6.1 can be considered as well (ultimately this is a LT decision).

If it wasn't for that plugin, this actually would have been an error.  If / when the fix is released in a GUI update, then I'll probably upgrade the status.
Link to comment

Another thing to consider as part of the fixes is a warning about ad blockers potentially interfering with the operation of the webGUI.

 

If you are interested: I developed some detection code at the time (though it was never implemented in the webGUI) to show a warning when a running ad blocker is detected.  You could do something similar in your plugin, which would be a more appropriate place to do it.

Can anyone give me a link to an ad blocker that messes up unRaid?  I don't run them, as I don't have any real issues with ads, and when I tried an extension for Chrome, unRaid still worked AFAIK.
Link to comment

Beer is not the answer...

Beer is the question.

"Yes" is the answer

 

Ability to downgrade the auto update errors of the dynamix GUI and this plugin to warnings (create a file called config/plugins/fix.common.problems/autoupdate-warning on the flash drive to do this; the file just has to exist)
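For example (assuming the usual unRaid convention that the flash drive is mounted at /boot):

```python
from pathlib import Path

# The flash drive is mounted at /boot, so the flag file lives here.
flag = Path("/boot/config/plugins/fix.common.problems/autoupdate-warning")
flag.parent.mkdir(parents=True, exist_ok=True)
flag.touch()  # the file just has to exist; its contents are ignored
```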

 

Today's update to CA fixed the false positive of the dynamix webGUI and CA auto update errors

Thanks to bonienl, added in ad blocker detection (doesn't run in the background checks, only from the UI)

Added in checks for illegal and/or should-be-illegal characters in share names *

 

* Under ideal circumstances, the entire contents of your drives would be checked for illegal characters, but since that's extremely time consuming, I stop at just the share names.  The rules are based upon the most restrictive major OS out there (Windows), and violations are treated as errors, not warnings, since they also have the ability to mess up SMB

 

They are:

 

\/:*?"<>| 

 

Additionally, folder names ending in a "." are also illegal, along with folder names containing only spaces.

 

Interestingly enough, I found out that control characters are LEGAL under Linux, Mac, and Windows, but since you're going to have such a bitch of a time using them, I made them errors also

 

There may be other unicode characters that can mess up Samba, but these checks at least handle the ones that are easy (or somewhat easy) to create under unRaid.
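To illustrate the rules (a sketch only, not the plugin's actual code; note that the control character test was later removed, as discussed further down the thread):

```python
import re

ILLEGAL_CHARS = set('\\/:*?"<>|')  # the most restrictive major OS: Windows

def share_name_error(name):
    """Return a reason to flag the share name, or None if it is fine."""
    if any(c in ILLEGAL_CHARS for c in name):
        return 'contains one of \\/:*?"<>|'
    if name.endswith("."):
        return 'ends in a "."'
    if name.strip() == "":
        return "contains only spaces"
    if re.search(r"[\x00-\x1f]", name):
        return "contains control characters"  # this test was later dropped
    return None

for share in ["Movies", "Movies.", "bad:name", "   "]:
    print(repr(share), "->", share_name_error(share))
```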

 

 

Also some bug fixes where some of the docker checks weren't running

Link to comment

Other checks that occur to me are:

  • Check for duplicate files on shares.  This often happens if users copy (not move) files between disk shares.  I have seen user scripts in the forum for detecting them, so I suggest seeing if the authors are happy for you to incorporate them into your plugin (a rough sketch of the idea follows this list)
  • Check for files/directories owned by root on user shares
  • Zero-length files on user shares.  Whilst there could be reasons to have zero-length files, I suspect in most cases they are indications of previous problems creating files, such as trying to copy files from a disk share to a user share.
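For the duplicate check, a minimal sketch of one approach (my own illustration, not one of the existing forum scripts: a file at the same relative path on two disks appears twice in the user share):

```python
import os
from collections import defaultdict
from glob import glob

# Collect each file's relative path per disk; any path seen on more
# than one disk is a duplicate from the user share's point of view.
seen = defaultdict(list)

for disk in sorted(glob("/mnt/disk[0-9]*")):
    for root, _dirs, files in os.walk(disk):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), disk)
            seen[rel].append(disk)

for rel, disks in seen.items():
    if len(disks) > 1:
        print(f"Duplicate: {rel} on {', '.join(disks)}")
```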

 

As these are checks that might take a while, you might want to add them under an 'Additional Checks' section.

Link to comment

Hi, I got this error today for a few of my dockers:

 

Docker Application DuckDNS, Container Port not found or changed on installed application

With this description:

When changing ports on a docker container, you should only ever modify the HOST port, as the application in question will expect the container port to remain the same as what the template author dictated. Fix this here

 

I don't think I ever changed the ports, and everything works fine.  Might be a red herring?

The DuckDNS container runs in host mode and does not have any ports mapped.

 

For Nginx-letsencrypt it complained that port 80 is missing.  But I don't really need that one, right?

Docker Application Nginx-letsencrypt, Container Port 80 not found or changed on installed application

 

EDIT: running unRAID 6.1.8

Link to comment

Hi, I got this error today for a few of my dockers:

 

Docker Application DuckDNS, Container Port not found or changed on installed application

With this description:

When changing ports on a docker container, you should only ever modify the HOST port, as the application in question will expect the container port to remain the same as what the template author dictated. Fix this here

 

I don't think I ever changed the ports, and everything works fine.  Might be a red herring?

The DuckDNS container runs in host mode and does not have any ports mapped.

 

For Nginx-letsencrypt it complained that port 80 is missing.  But I don't really need that one, right?

Docker Application Nginx-letsencrypt, Container Port 80 not found or changed on installed application

 

EDIT: running unRAID 6.1.8

DuckDNS - actually runs in bridge mode, but with no ports defined.  Fixed in the next rev.

 

Host mode was already skipped

 

Nginx - The criteria is that the port is defined in the template, and if you change the container port (not the host port) or delete it, then the error appears.  What you've probably done is use port 443 and delete 80.  Just add 80 back in and map it to something.  (The container description talks about nothing but port 443, so maybe it's a template error, but as far as the error goes, it stands as is.)
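To illustrate the criteria (a hypothetical sketch only: the template layout and field names here are assumptions, and the plugin's real parsing certainly differs), comparing template container ports against the installed container:

```python
import json
import subprocess
import xml.etree.ElementTree as ET

def template_ports(template_xml):
    # Hypothetical template layout: <Config Type="Port">80</Config>
    root = ET.parse(template_xml).getroot()
    return {c.text for c in root.iter("Config") if c.get("Type") == "Port"}

def container_ports(container):
    out = subprocess.run(["docker", "inspect", container],
                         capture_output=True, text=True, check=True).stdout
    exposed = json.loads(out)[0]["Config"].get("ExposedPorts") or {}
    return {p.split("/")[0] for p in exposed}  # keys look like "80/tcp"

def missing_ports(template_xml, container):
    # Any port in the template but not on the container triggers the error.
    return template_ports(template_xml) - container_ports(container)
```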

Link to comment


Other checks that occur to me are:
  • Check for duplicate files on shares.  This often happens if users copy (not move) files between disk shares.  I have seen user scripts in the forum for detecting them, so I suggest seeing if the authors are happy for you to incorporate them into your plugin

Good idea

  • Check for files/directories owned by root on user shares

good idea

  • Zero-length files on user shares.  Whilst there could be reasons to have zero-length files, I suspect in most cases they are indications of previous problems creating files, such as trying to copy files from a disk share to a user share.

Off the top of my head, I don't want to do that, as, for instance, appdata / CrashPlan destinations are user shares, and there *may* be a ton of zero-length files in there.  E.g., this plugin and CA both use them extensively as a quick "flag" to remember something in between iterations of the programs.  Decent idea, but I don't want to get into a situation where there's going to be a ton of legitimate false positives.  I'll think about it.

As these are checks that might take a while you might want to add them under an 'Additional Checks' section.

very good idea

Link to comment

My DuckDNS is in HOST mode and the error is still there.

Maybe the template is BRIDGE but it is running in HOST; could be another check....

 

Nginx-letsencrypt

I mapped port 80 to something and the error goes away, as you said.

But I don't want to map port 80.  I might raise this in aptalca's support thread.

 

Thanks for your quick answer and all the good work.

Link to comment

My DuckDNS is in HOST mode and the error is still there.

Will probably get fixed under the same fix.

Maybe the template is BRIDGE but it is running in HOST; could be another check....

OK, just not sure if there are any valid reasons to be doing this.
Link to comment

Maybe the template is BRIDGE but it is running in HOST; could be another check....

OK, just not sure if there are any valid reasons to be doing this.

Yes, not sure either why I have it like this.  Might be a leftover from testing or an old template?  I changed it now.

But a nice pick-up by your plugin.

Link to comment

Just to clarify, duckdns does not need any ports mapped, and the template has it set to bridge (although I'm not sure it even makes a difference if it's set to host, because the container does not open any ports, really)

 

Nginx-letsencrypt has port 80 in the template because it is optional.  Letsencrypt uses port 443 for validation, so that is required, but the basic webserver can still use port 80 for non-SSL connections.  If you don't use it, just set it to any random port and forget it.

Link to comment

Ability to downgrade the auto update errors of the dynamix GUI and this plugin to warnings (create a file called config/plugins/fix.common.problems/autoupdate-warning on the flash drive to do this; the file just has to exist)

 

That's great. Many thanks.

 

Interestingly enough, I found out that control characters are LEGAL under Linux, Mac, and Windows, but since you're going to have such a bitch of a time using them, I made them errors also

 

In Mac OS X, a folder with a custom icon contains a file that has a control character in its name.  It's system-generated, can't be changed, and can only be avoided by ensuring that all folders use the generic icon.  This caused problems with the Dynamix File Integrity plugin and required a specific exclusion.  See here: http://lime-technology.com/forum/index.php?topic=44989.msg436925#msg436925

 

I also have a problem with Mac OS X folder icons, which are stored in files with the name "Icon\r" - yes, that's a five-character file name ending in a carriage return (ASCII 0x0D)!  Volume icons are not a problem because they are stored in .VolumeIcon.icns files, which is a manageable name, and the contents are static.  Reference: http://superuser.com/questions/298785/icon-file-on-os-x-desktop

Link to comment

Ability to downgrade the auto update errors of the dynamix GUI and this plugin to warnings (create a file called config/plugins/fix.common.problems/autoupdate-warning on the flash drive to do this; the file just has to exist)

 

That's great. Many thanks.

 

Interestingly enough, I found out that control characters are LEGAL under Linux, Mac, and Windows, but since you're going to have such a bitch of a time using them, I made them errors also

 

In Mac OS X, a folder with a custom icon contains a file that has a control character in its name.  It's system-generated, can't be changed, and can only be avoided by ensuring that all folders use the generic icon.  This caused problems with the Dynamix File Integrity plugin and required a specific exclusion.  See here: http://lime-technology.com/forum/index.php?topic=44989.msg436925#msg436925

 

I also have a problem with Mac OS X folder icons, which are stored in files with the name "Icon\r" - yes, that's a five-character file name ending in a carriage return (ASCII 0x0D)!  Volume icons are not a problem because they are stored in .VolumeIcon.icns files, which is a manageable name, and the contents are static.  Reference: http://superuser.com/questions/298785/icon-file-on-os-x-desktop

Fair enough.  Since they're such a royal pain to even create, I'll remove that test

 

Sent from my LG-D852 using Tapatalk

 

 

Link to comment

One more thing, not sure if you want it to be included in your checks.

 

I have this in my system log

kernel: ata9.00: HPA detected: current 1465147055, native 1465149168

which I'm well aware of; it won't easily get disabled on the drive (I tried), and the BIOS setting which likely caused this doesn't apply to my current board (plus this drive is on the short list to be replaced).

 

However, for some users of old hardware (since this isn't really "a thing" anymore), it may be worth identifying and reporting, as the consequences of HPA and a drive changing in size are a bad thing for unRaid/parity protection.
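Detecting it would just be a syslog scan, something like this (a sketch keyed to the message format quoted above):

```python
import re

# Matches lines like:
#   kernel: ata9.00: HPA detected: current 1465147055, native 1465149168
HPA_RE = re.compile(r"HPA detected: current (\d+), native (\d+)")

with open("/var/log/syslog") as f:
    for line in f:
        m = HPA_RE.search(line)
        if m:
            current, native = map(int, m.groups())
            print(f"HPA clipping {native - current} sectors: {line.strip()}")
```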

Link to comment

I have a warning for a share which is array-only that states there are files on the cache drive; however, this is not the case.

Screenshot attached of the warning, and an SSH ls of the contents of my cache drive.

I'll check it out and fix it in tonight's update

 

Link to comment

I have a warning for a share which is array-only that states there are files on the cache drive; however, this is not the case.

Screenshot attached of the warning, and an SSH ls of the contents of my cache drive.

I'll check it out and fix it in tonight's update

Thanks!

Link to comment

One more thing, not sure if you want it to be included in your checks.

 

I have this in my system log

kernel: ata9.00: HPA detected: current 1465147055, native 1465149168

which I'm well aware of; it won't easily get disabled on the drive (I tried), and the BIOS setting which likely caused this doesn't apply to my current board (plus this drive is on the short list to be replaced).

 

However, for some users of old hardware (since this isn't really "a thing" anymore), it may be worth identifying and reporting, as the consequences of HPA and a drive changing in size are a bad thing for unRaid/parity protection.

Only if it's on the parity drive
Link to comment

One more thing, not sure if you want it to be included in your checks.

 

I have this in my system log

kernel: ata9.00: HPA detected: current 1465147055, native 1465149168

which I'm well aware of; it won't easily get disabled on the drive (I tried), and the BIOS setting which likely caused this doesn't apply to my current board (plus this drive is on the short list to be replaced).

 

However, for some users of old hardware (since this isn't really "a thing" anymore), it may be worth identifying and reporting, as the consequences of HPA and a drive changing in size are a bad thing for unRaid/parity protection.

Only if it's on the parity drive

 

I had thought that an array drive changing size was also bad for the calculated parity.  Well, either way, just thought I'd mention it.

Link to comment

One more thing, not sure if you want it to be included in your checks.

 

I have this in my system log

kernel: ata9.00: HPA detected: current 1465147055, native 1465149168

which I'm well aware of; it won't easily get disabled on the drive (I tried), and the BIOS setting which likely caused this doesn't apply to my current board (plus this drive is on the short list to be replaced).

 

However, for some users of old hardware (since this isn't really "a thing" anymore), it may be worth identifying and reporting, as the consequences of HPA and a drive changing in size are a bad thing for unRaid/parity protection.

Only if it's on the parity drive

 

I had thought that an array drive changing size was also bad for the calculated parity.  Well, either way, just thought I'd mention it.

At that point the damage is done, and you may wind up with a single corrupted file.  The major issue is if the parity drive winds up getting an HPA, because at that point parity is no longer the largest drive, and in theory you cannot rebuild any drive at all.
Link to comment
