Need help: unable to boot using UnRaid USB



I had UnRaid running on a monster PC with 16 cores, 32 threads and 128 GB of RAM.

While I appreciate everything this PC did for me, it was power hungry.

 

To save money without losing data or protection, I decided to move it to a low-power computer.

So I built a replacement PC and tested it by installing Windows 10 on a Mini PCIe SSD from bootable USB media. I confirmed that all the hardware worked, including booting from USB. After stress testing, once I was happy with the new PC's stability and performance, I decided to move UnRaid to it.

 

I gracefully powered down the monster PC, pulled out the UnRaid USB drive and put it in the newly built low-power PC.

I used the same USB slot that had held the bootable USB drive for the Windows install earlier. I was disappointed when UnRaid didn't boot.

I had verified that the UnRaid USB stick was the top priority in the boot order and that all other devices were disabled in the boot order list.

 

I thought it might be a bad port, so I tried all 4 available USB ports one by one, with no luck booting UnRaid from any of them.

The drive was detected each time. Now I was more concerned.

So I put the USB drive back into the old monster PC and it worked.

 

One thing I did notice is that the UnRaid USB drive shows up as UEFI : Samsung USB <Serial number of UnRaid USB Drive>

Would you please advise on what I can do to ensure the newly built PC can boot into UnRaid?

I would really appreciate any assistance on this issue.

Edited by curiouskid
Link to comment

I assume that you are trying to boot 6.4.0-rcX.  If this is the case, look on your flash drive and rename the EFI folder to something like EFI.NotUsed

 

You might also like to read this thread:

You can also look on the boot order menu of your BIOS for an option for your USB drive that does NOT include UEFI in the description. 

 

You might want to read this post and the next few posts about this issue:

This issue is being addressed in the latest release of the 6.4.0-rcX series.  See the first post here:

 

Link to comment

When I opened a ticket with the motherboard manufacturer, they asked me to take a photo of each BIOS setting and send it to them.

In the process, I learned that under Advanced Menu -> CSM Setting, the boot option filter was set to "UEFI only".

I noticed this while I was preparing my response to them.

So I quickly changed it to "UEFI and Legacy", which allowed the boot. (See BIOS photo attached.)

In short, my drive was using legacy MBR boot and the motherboard was looking for UEFI bootable media.

Hence it was not booting. After the above change, I could boot without any issues.

 

However, during this troubleshooting I power cycled the system multiple times, often not so gracefully.

As a result, one of the data disks in my array went bad. Luckily it was not the parity drive.

 

So I followed https://wiki.lime-technology.com/Troubleshooting#Re-enable_the_drive, confirmed that the drive passes the SMART test, and started the rebuild.

It's a 6 TB drive with about 850 GB of data on it, so the rebuild seems to take forever. It also seems that once I kicked off the rebuild and started the array, the GUI stopped responding.

Frank1940, could you please advise whether there is a way to monitor the rebuild process from the command line? How do I know if the rebuild is going well? Is there a log file for the drive rebuild that I can look at?

IMG_0917.JPG

Link to comment

I don't know why the GUI has stopped responding, but understand that when a disk is rebuilt, EVERY sector is rebuilt, not just the ones with data, so the total rebuild time will probably be about the same as the parity check time.

I don't know of a way to verify the progress of the rebuild process via the command line, but someone else may.

You might run   diagnostics   and that will write the Diagnostics file to the logs directory of the flash drive.  If you can get the file off the flash drive, you might attach it to a new post.

Edited by Frank1940
Fixed spelling of "way" in second paragraph.
Link to comment
18 minutes ago, Frank1940 said:

I don't know of a way to verify the progress of the rebuild process via the command line, but someone else may.

 

 

cat /var/local/emhttp/var.ini | grep "mdResync"

 

Two of the resulting lines are mdResyncPos and mdResyncSize. Comparing them gives you the current position versus how far the rebuild has to go.
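
For anyone who wants the percentage rather than the raw numbers, here is a rough Python sketch of that comparison. It assumes the lines in var.ini look like `mdResyncPos="123"` (key, equals sign, quoted number); the sample values are invented, and the function names `parse_ini_values` and `resync_percent` are made up for this example.

```python
def parse_ini_values(text):
    """Parse simple key="value" lines into a dict of strings."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            values[key.strip()] = value.strip().strip('"')
    return values


def resync_percent(text):
    """Return progress in percent, or None if the size is missing or zero."""
    values = parse_ini_values(text)
    size = int(values.get("mdResyncSize", "0") or 0)
    pos = int(values.get("mdResyncPos", "0") or 0)
    if size == 0:
        return None
    return 100.0 * pos / size


# Invented sample in the assumed format. On a server the text would be read
# from /var/local/emhttp/var.ini instead.
sample = 'mdResyncPos="2050000"\nmdResyncSize="6000000"\n'
print(resync_percent(sample))
```

Both counters are in the same unit, so the ratio is all that matters and the unit itself does not need to be known.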

Link to comment
5 hours ago, curiouskid said:

As a result, one of the data disks in my array went bad. Luckily it was not the parity drive.

Not sure why you would think this. ALL disks are required to rebuild a disk. So parity is not any more important than any of the others. In fact, you could argue that parity is less important, since it doesn't actually contain any of your data.

Link to comment

Thank you, trurl, for correcting me. I had an oversimplified understanding (really a misunderstanding) of the parity drive and its role in the reconstruction of data.

Because of your post, I carefully studied https://wiki.lime-technology.com/Parity and noticed the following there.

 

"At these times, all of the disks (including parity) are read to reconstruct the data to be written to the target disk. As the sum of the bits is always even, unRAID can reconstruct any ONE missing piece of data (the parity or a data disk), as long as the other pieces are correct."

 

Could you please confirm whether my extended reading of the long sentence above is correct?

 

1) Only one drive's failure can be tolerated for reconstruction of data.

2) All other drives in array should be present and readable. 

3) Pre-condition for recovery of that one drive is that other drives should not have any issues.

 

This means that as an UnRaid owner I must take error notifications very seriously, because the fault tolerance window is very small.

If I do not act quickly to replace a faulty drive and a second drive runs into any issue in the meantime, there is no chance of recovery.
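
The three points above can be tried out on a toy array. This is only a sketch of single (even/XOR) parity as the wiki sentence describes it, not unRAID's actual code; the three 3-byte "disks" and the helper name `xor_blocks` are invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy "data disks" plus one parity disk computed from them.
data_disks = [b"\x0f\xf0\xaa", b"\x33\x55\x00", b"\xff\x01\x10"]
parity = xor_blocks(data_disks)

# Simulate losing disk 1: it is rebuilt from parity plus ALL surviving data disks.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_disks[1])
```

If any one of the survivors were unreadable or wrong, the rebuilt bytes would be wrong too, which is why every other disk has to be healthy during a rebuild.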

Link to comment

RIGHT.  You have a firm grasp of the situation.  But not all apparent disk failures are true disk failures.  For example, in replacing a bad disk, a connector to another disk may be displaced, resulting in a condition that would most likely appear to the casual observer to be another defective disk.  And there are several other examples which some other folks could put forward.

You are also correct in that everyone should set up the 'Notifications Settings' and look at the resulting e-mail every day to see that things are OK.  You really want to address problems as they come up.  If you wait until you can't even reach some files before you check on the condition of your server, you have probably lost data.

Link to comment
11 minutes ago, curiouskid said:

1) Only one drive's failure can be tolerated for reconstruction of data.

2) All other drives in array should be present and readable. 

3) Pre-condition for recovery of that one drive is that other drives should not have any issues.

 

This means that as an UnRaid owner I must take error notifications very seriously, because the fault tolerance window is very small.

If I do not act quickly to replace a faulty drive and a second drive runs into any issue in the meantime, there is no chance of recovery.

YES!!! Exactly!

 

The only thing to add is that a 2nd parity drive extends this to 2 simultaneous failures, but with the same conditions. The parity drive does not use some magical data compression that can reconstruct any data on demand.

Link to comment
58 minutes ago, curiouskid said:

I had oversimplified understanding/misunderstanding of Parity drive

Frank1940 and jonathanm pretty much said everything I would have said.

 

I am curious though about what your previous (mis)understanding was. Did you think you would be able to recover all of the data from a failed drive using only the parity drive?

 

If you think about it for a moment, there is no way the parity drive could contain the data for any drive. Since there is no way to know which drive might fail, it would have to contain all the data for every drive if it were possible to recover a disk using only the parity drive. Obviously parity doesn't have the capacity for this.
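
A tiny example makes the same point. The sketch below assumes plain XOR parity over two one-byte "data disks"; the values and the helper name `xor_bytes` are invented for illustration. Two completely different pairs of disk contents produce the identical parity byte, so parity alone cannot tell you what was on either disk.

```python
def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


# Two different possible contents for the same pair of data disks.
pair_a = (bytes([0b1100]), bytes([0b0110]))
pair_b = (bytes([0b0000]), bytes([0b1010]))

parity_a = xor_bytes(*pair_a)
parity_b = xor_bytes(*pair_b)

# Same parity, different data: parity by itself is ambiguous.
print(parity_a == parity_b, pair_a != pair_b)

# With one data disk still readable, the other one is pinned down again.
print(xor_bytes(parity_a, pair_a[0]) == pair_a[1])
```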

 

Now that you understand this, I think you are in a much better place than quite a lot of unRAID users. Knowing how parity works makes a lot of things about using unRAID much more understandable, and you are less likely to make mistakes.

Link to comment

Frank1940 

After about 20 minutes of silence with fingers crossed, I got the GUI back. It now reads:


Current Status:
Total size: 6 TB
Elapsed time: 17 hours, 57 minutes
Current position: 2.05 TB (34.2 %)
Estimated speed: 32.6 MB/sec
Estimated finish: 1 day, 9 hours, 35 minutes
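
These figures can be sanity-checked with a little arithmetic: the remaining bytes divided by the current speed. The sketch below assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a constant speed, neither of which is confirmed for the GUI; the function names are made up for this example.

```python
TB = 10**12  # assumption: decimal terabytes
MB = 10**6   # assumption: decimal megabytes


def remaining_seconds(total_bytes, position_bytes, bytes_per_second):
    """Time left if the current speed were held until the end."""
    return (total_bytes - position_bytes) / bytes_per_second


def format_duration(seconds):
    """Render seconds as days, hours and minutes (seconds are dropped)."""
    minutes = int(seconds // 60)
    days, rest = divmod(minutes, 24 * 60)
    hours, mins = divmod(rest, 60)
    return f"{days} day(s), {hours} hour(s), {mins} minute(s)"


percent_done = 100 * 2.05 / 6
eta = remaining_seconds(6 * TB, 2.05 * TB, 32.6 * MB)
print(f"{percent_done:.1f} % done, about {format_duration(eta)} to go")
```

Under those assumptions the result comes out close to the GUI's estimate. Note that the 850 GB of actual data never enters the calculation, because the rebuild covers the whole 6 TB.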

I will write an update on the recovery in 1 day and 10 hours.

For notifications I am using the Boxcar notification agent.
It delivers free notifications from UnRaid to a phone app in almost real time.
The only problem is that I get too many of them, even for green status.
After I get my drive back, I will look into receiving only the important ones.


Lessons from one data disk failure:

1) Read notifications and act on them. The prerequisite is that I get only a few.
2) Do not do anything in a panic. It can hurt my data.
3) When in doubt, ask the community.

4) Accept the fact that, just as with humans, data loss can occur for many reasons and in many ways, and I can get protection from only a few of them.
5) Get an off-site backup, which I do not have at the moment.

Squid

 

Thank you for the response. mdResyncPos / mdResyncSize * 100 = % of completion.

Could I propose a command-line version of everything the Web-UI does or shows?
Maybe a shell script that takes a few command-line arguments and prints the corresponding part of the Web-UI as clear text on the console.


That way an unresponsive Web-UI would not provoke a novice user like me into pushing the reset button in a moment of panic.
This time I didn't do it because I knew a drive rebuild was in progress.

trurl
 

12 hours ago, trurl said:

I am curious though about what your previous (mis)understanding was. Did you think you would be able to recover all of the data from a failed drive using only the parity drive?

 

Yes. I raise the white flag on that. Forgive my ignorance.
 

12 hours ago, trurl said:

Now that you understand this, I think you are in a much better place than quite a lot of unRAID users. Knowing how parity works makes a lot of things about using unRAID much more understandable, and you are less likely to make mistakes.

 

Yes. Thank you.

 

Once again thank you for helping. I truly appreciate everyone's attention to detail in this community.

Link to comment

jonathanm

Thank you for the additional point.
 

15 hours ago, jonathanm said:

The only thing to add is that a 2nd parity drive extends this to 2 simultaneous failures, but with the same conditions.


The line above got me reading more about dual parity. I enjoyed the Further discussion #1 thread up to page 3. I could not follow everything that was discussed from there on, because I am an infrastructure guy whose programming skill is limited to small scripts, and I have no background in the advanced mathematics needed to understand the formulas for data recovery.

Anyway, as per the wiki:
 

Quote

Dual parity

For large arrays, ‘dual parity’ – or, the facility to have a second parity disc that is not simply a mirror of the first – would be useful. This would permit two simultaneous drive failures without losing data. unRAID does not have dual parity at present, but ‘P + Q redundancy’ is part of the future roadmap.(Dead Link)

In a P + Q redundancy system (as in a RAID-6 system), there would be two redundancy disks: ‘P’, which is the ordinary XOR parity, and ‘Q’, which is a Reed-Solomon code. This would allow unRAID to recover from any 2 disk errors, with minimal impact on performance.

Further discussion: [1], [2]


The part highlighted in red confused me, because I do see a slot for a 2nd parity drive in the Web-UI of the current stable release, UnRaid 6.3.5.


For a consumer with the simple use case described below, which is the better option?

A. A 2nd parity drive, assuming it is a Q drive with the Reed-Solomon code implemented correctly.

B. A hot spare, which can be used to replace a parity or data disk quickly.

C. Cloud backup, in the hope that the provider gives me my data when needed instead of pointing to the fine print of the user agreement about its limited liability.
Customer reviews of even some of the top cloud backup providers suggest that when people tried to restore, it either didn't work or was so slow that a complete recovery would take months.


Use case: very limited at this point.

1) I am the only user who writes.

2) A share named VMs - for ESX_VM_Backup.

3) A share named ISOs - as a repository of ISOs.

4) A share named TimeMachine - to back up a Mac.

5) Three shares named Data, Audio and Video - to store three types of data.
6) A Docker image running Plex.

Link to comment
4 hours ago, curiouskid said:


The part highlighted in red confused me, because I do see a slot for a 2nd parity drive in the Web-UI of the current stable release, UnRaid 6.3.5.

 

The wikis are seldom completely up to date.  Many of them are maintained by the user base and, in some cases, the originator has left the scene or is inactive for a variety of reasons.  Dual parity is now a fact, and if you want to read about the protection level that it provides, you can do so here:

     

There is considerable discussion on the topic in this thread and it can help you make an informed decision about dual parity.

 

Hot spares are another place where it is largely up to you.  I know that there are a few people with them.  I would suspect that there are a lot more with a precleared drive sitting on the shelf waiting for a future drive failure.  The reason for this is that sometimes they will have two (or more) servers, and having one drive ready to go is a more economical solution than having one in every server!   Plus, you are burning through the warranty period without even having the drive in use.

As the article above points out, data loss due to physical damage to the server itself is a real issue and must always be taken into account!   Cloud backup is a possibility, but the reality of moving TBs of data is the real issue here.  A better alternative is to back up to the cloud (or use some other offsite backup scheme) only the data that is completely irreplaceable!  (Family photographs and personal financial records are examples...)  Movies and TV shows are replaceable and maybe even expendable!

I, personally, use three 2.5" USB drives to make backups of my irreplaceable data.  Two of them are always in a safety deposit box at a bank a few miles away.  There is also a backup on one of my servers.  And my servers are now read only for all Windows computers to provide protection from malware.   So I have a minimum of three backups of most of my most valuable data.

Edited by Frank1940
Link to comment
