mkyb14


  1. Yes, the ECC RAM previously ran memtest for 48 hrs with no issues there or with the board. The Supermicro BIOS is current, along with the latest on all HBA cards. Burn-in in safe mode is fine. Now that this is running, I'll set up Plex again with the variables, then set up sab, sonarr, and radarr and see what happens over the following week or so.
  2. Hmm, weird. Rebooted again just to make sure, and now it shows....
  3. Same result. Removed the plugin, re-installed, rebooted. Plugins screenshot, tower-diagnostics-20240401-1459.zip
  4. I will remove the plugin and start over. I removed everything because the system was locking up randomly after updating to 6.12.8, and no one could tell me why in other forums. So I booted to safe mode and slowly turned on dockers, and after that plugins, and it crashed after a while once I had enabled the nvidia one. So I removed everything to start over. Give me about 5 mins and I'll redo the plugin and post diagnostics again.
  5. added. tower-diagnostics-20240401-1429.zip
  6. Looking for some direction here. I've had to remove all my plugins and docker.img. I started adding all the dockers back and got to the nvidia plugin for my P2000. After doing the update and rebooting, I only get this info on the plugin page... I'm not sure what directory or file this is referring to. I can't seem to find anything googling it.
  7. I was digging through the logs and saw this, even though I've been using the nvidia plugin and dockers have been running fine for 5 years on this system. From logs/nvidia-smi.txt: "NVIDIA-SMI couldn't find libnvidia-ml.so library in your system. Please make sure that the NVIDIA Display Driver is properly installed and present in your system. Please also try adding directory that contains libnvidia-ml.so to your system PATH." I can rip and replace the nvidia plugin and re-download the latest stable drivers.
  8. I'll give that a shot. It happened then too, in safe mode, after a day or two. I just scp'd all the log files off the USB drive and ran them through ChatGPT, and nothing came back as a hardware issue. At this point, if it does it in safe mode again, I'm not sure what to do other than blow away the USB drive and rebuild? This only started happening after the update to Unraid's latest version, which also coincided with an nvidia driver update for the system.
  9. thanks for taking a look and helping me figure out what's going on.
  10. Also, just to follow up: SMART checks found no errors on any of the drives, spinning rust or SSD. I had then started the new config to get rid of the old, bad 1 TB drive, which had no data on it, as part of a new rebuild. Based on the estimates, it should have completed around 7 or 8 pm on 3/25, before I lost GUI access.
  11. OK, so in the past few days I've enabled the syslog. I ended up doing a new config and resyncing the parity to get rid of an old drive (it was added but not used) that said it had errors. During the rebuild, I was updating some dockers and lost the GUI again. I have the syslog, but isn't there some sensitive material in there? RSA keys for SSH, etc.? Right now the dockers seem to be up and I cannot kill them. I was able to kill the VM manager and shut it down. There's no GUI response, and the way I understand Unraid is set up, I can't reset nginx etc. to try to force a restart of the webserver side? I see this in the logs:

      Mar 25 19:32:12 Tower kernel: general protection fault, probably for non-canonical address 0xefffffff81e42c70: 0000 [#1] PREEMPT SMP PTI
      Mar 25 19:32:12 Tower kernel: CPU: 1 PID: 27726 Comm: kworker/u16:3 Tainted: P O 6.1.74-Unraid #1
      Mar 25 19:32:12 Tower kernel: Hardware name: Supermicro Super Server/X11SSH-LN4F, BIOS 2.7 12/07/2021
      Mar 25 19:32:12 Tower kernel: Workqueue: writeback wb_workfn (flush-btrfs-5)
      Mar 25 19:32:12 Tower kernel: RIP: 0010:do_writepages+0xad/0x124
      Mar 25 19:32:12 Tower kernel: Code: 00 00 4c 89 b3 00 01 00 00 48 85 c0 48 0f 48 c2 48 89 83 10 01 00 00 e8 96 e0 6c 00 49 8b 84 24 90 00 00 00 48 89 ee 4c 89 e7 <48> 8b 40 10 48 85 c0 74 07 ff d0 0f 1f 00 eb 05 e8 db e5 ff ff 83
      Mar 25 19:32:12 Tower kernel: RSP: 0018:ffffc90025dbfc10 EFLAGS: 00010297
      Mar 25 19:32:12 Tower kernel: RAX: efffffff81e42c60 RBX: ffff88821637c860 RCX: 0000000000000000
      Mar 25 19:32:12 Tower kernel: RDX: 0000000107354800 RSI: ffffc90025dbfcb0 RDI: ffff88812be607e8
      Mar 25 19:32:12 Tower kernel: RBP: ffffc90025dbfcb0 R08: ffffffff82206510 R09: ffffffffffffffff
      Mar 25 19:32:12 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88812be607e8
      Mar 25 19:32:12 Tower kernel: R13: ffff88812be607e8 R14: 0000000107354800 R15: ffff88821637d000
      Mar 25 19:32:12 Tower kernel: FS: 0000000000000000(0000) GS:ffff888867a40000(0000) knlGS:0000000000000000
      Mar 25 19:32:12 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Mar 25 19:32:12 Tower kernel: CR2: 0000154f0619b000 CR3: 0000000175192006 CR4: 00000000003726e0
      Mar 25 19:32:12 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Mar 25 19:32:12 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Mar 25 19:32:12 Tower kernel: Call Trace:
      Mar 25 19:32:12 Tower kernel: <TASK>
      Mar 25 19:32:12 Tower kernel: ? __die_body+0x1a/0x5c
      Mar 25 19:32:12 Tower kernel: ? die_addr+0x38/0x51
      Mar 25 19:32:12 Tower kernel: ? exc_general_protection+0x30f/0x345
      Mar 25 19:32:12 Tower kernel: ? asm_exc_general_protection+0x22/0x30
      Mar 25 19:32:12 Tower kernel: ? do_writepages+0xad/0x124
      Mar 25 19:32:12 Tower kernel: __writeback_single_inode+0x7a/0x2cb
      Mar 25 19:32:12 Tower kernel: writeback_sb_inodes+0x24f/0x40f
      Mar 25 19:32:12 Tower kernel: __writeback_inodes_wb+0x82/0xc0
      Mar 25 19:32:12 Tower kernel: wb_writeback+0x135/0x24a
      Mar 25 19:32:12 Tower kernel: wb_workfn+0x21a/0x39e
      Mar 25 19:32:12 Tower kernel: ? sched_clock_cpu+0x12/0xa1
      Mar 25 19:32:12 Tower kernel: ? __smp_call_single_queue+0x23/0x35
      Mar 25 19:32:12 Tower kernel: ? paravirt_write_msr+0xb/0x11
      Mar 25 19:32:12 Tower kernel: ? ttwu_queue_wakelist+0x9a/0xcf
      Mar 25 19:32:12 Tower kernel: process_one_work+0x1a8/0x295
      Mar 25 19:32:12 Tower kernel: worker_thread+0x18b/0x244
      Mar 25 19:32:12 Tower kernel: ? rescuer_thread+0x281/0x281
      Mar 25 19:32:12 Tower kernel: kthread+0xe4/0xef
      Mar 25 19:32:12 Tower kernel: ? kthread_complete_and_exit+0x1b/0x1b
      Mar 25 19:32:12 Tower kernel: ret_from_fork+0x1f/0x30
      Mar 25 19:32:12 Tower kernel: </TASK>
      Mar 25 19:32:12 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_intel kvm md_mod cmac cifs asn1_decoder cifs_arc4 cifs_md4 dns_resolver veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls igb intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp ast drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c_intel ghash_clmulni_intel ipmi_ssif sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel drm crypto_simd cryptd rapl intel_cstate
      Mar 25 19:32:12 Tower kernel: intel_uncore mpt3sas i2c_i801 agpgart syscopyarea acpi_ipmi i2c_smbus mei_me i2c_algo_bit sysfillrect ahci sysimgblt input_leds raid_class fb_sys_fops i2c_core joydev led_class libahci intel_pch_thermal scsi_transport_sas mei thermal fan ipmi_si video wmi backlight intel_pmc_core acpi_power_meter acpi_pad button unix [last unloaded: md_mod]
      Mar 25 19:32:12 Tower kernel: ---[ end trace 0000000000000000 ]---
      Mar 25 19:32:12 Tower kernel: RIP: 0010:do_writepages+0xad/0x124
      Mar 25 19:32:12 Tower kernel: Code: 00 00 4c 89 b3 00 01 00 00 48 85 c0 48 0f 48 c2 48 89 83 10 01 00 00 e8 96 e0 6c 00 49 8b 84 24 90 00 00 00 48 89 ee 4c 89 e7 <48> 8b 40 10 48 85 c0 74 07 ff d0 0f 1f 00 eb 05 e8 db e5 ff ff 83
      Mar 25 19:32:12 Tower kernel: RSP: 0018:ffffc90025dbfc10 EFLAGS: 00010297
      Mar 25 19:32:12 Tower kernel: RAX: efffffff81e42c60 RBX: ffff88821637c860 RCX: 0000000000000000
      Mar 25 19:32:12 Tower kernel: RDX: 0000000107354800 RSI: ffffc90025dbfcb0 RDI: ffff88812be607e8
      Mar 25 19:32:12 Tower kernel: RBP: ffffc90025dbfcb0 R08: ffffffff82206510 R09: ffffffffffffffff
      Mar 25 19:32:12 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff88812be607e8
      Mar 25 19:32:12 Tower kernel: R13: ffff88812be607e8 R14: 0000000107354800 R15: ffff88821637d000
      Mar 25 19:32:12 Tower kernel: FS: 0000000000000000(0000) GS:ffff888867a40000(0000) knlGS:0000000000000000
      Mar 25 19:32:12 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Mar 25 19:32:12 Tower kernel: CR2: 0000154f0619b000 CR3: 0000000175192006 CR4: 00000000003726e0
      Mar 25 19:32:12 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Mar 25 19:32:12 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Mar 25 19:32:12 Tower kernel: ------------[ cut here ]------------
      Mar 25 19:32:12 Tower kernel: WARNING: CPU: 1 PID: 27726 at kernel/exit.c:814 do_exit+0x87/0x923
      Mar 25 19:32:12 Tower kernel: Modules linked in: vhost_net tun vhost tap kvm_intel kvm md_mod cmac cifs asn1_decoder cifs_arc4 cifs_md4 dns_resolver veth xt_CHECKSUM ipt_REJECT nf_reject_ipv4 ip6table_mangle ip6table_nat iptable_mangle vhost_iotlb xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_addrtype br_netfilter xfs nfsd auth_rpcgss oid_registry lockd grace sunrpc zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) tcp_diag inet_diag ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet 8021q garp mrp bridge stp llc bonding tls igb intel_rapl_msr intel_rapl_common iosf_mbi x86_pkg_temp_thermal intel_powerclamp coretemp ast drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c_intel ghash_clmulni_intel ipmi_ssif sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel drm crypto_simd cryptd rapl intel_cstate
      Mar 25 19:32:12 Tower kernel: intel_uncore mpt3sas i2c_i801 agpgart syscopyarea acpi_ipmi i2c_smbus mei_me i2c_algo_bit sysfillrect ahci sysimgblt input_leds raid_class fb_sys_fops i2c_core joydev led_class libahci intel_pch_thermal scsi_transport_sas mei thermal fan ipmi_si video wmi backlight intel_pmc_core acpi_power_meter acpi_pad button unix [last unloaded: md_mod]
      Mar 25 19:32:12 Tower kernel: CPU: 1 PID: 27726 Comm: kworker/u16:3 Tainted: P D O 6.1.74-Unraid #1
      Mar 25 19:32:12 Tower kernel: Hardware name: Supermicro Super Server/X11SSH-LN4F, BIOS 2.7 12/07/2021
      Mar 25 19:32:12 Tower kernel: Workqueue: writeback wb_workfn (flush-btrfs-5)
      Mar 25 19:32:12 Tower kernel: RIP: 0010:do_exit+0x87/0x923
      Mar 25 19:32:12 Tower kernel: Code: 24 74 04 75 13 b8 01 00 00 00 41 89 6c 24 60 48 c1 e0 22 49 89 44 24 70 4c 89 ef e8 31 ed 80 00 48 83 bb b0 07 00 00 00 74 02 <0f> 0b 48 8b bb d8 06 00 00 e8 33 ec 80 00 48 8b 83 d0 06 00 00 83
      Mar 25 19:32:12 Tower kernel: RSP: 0018:ffffc90025dbfee0 EFLAGS: 00010286
      Mar 25 19:32:12 Tower kernel: RAX: 0000000000000000 RBX: ffff8882f767e180 RCX: 0000000000000000
      Mar 25 19:32:12 Tower kernel: RDX: 0000000000000001 RSI: 0000000000002710 RDI: 00000000ffffffff
      Mar 25 19:32:12 Tower kernel: RBP: 000000000000000b R08: 0000000000000000 R09: ffffffff82245f30
      Mar 25 19:32:12 Tower kernel: R10: 00007fffffffffff R11: ffffffff8296f001 R12: ffff8881f9a0c400
      Mar 25 19:32:12 Tower kernel: R13: ffff888184d34200 R14: 0000000000000000 R15: 0000000000000000
      Mar 25 19:32:12 Tower kernel: FS: 0000000000000000(0000) GS:ffff888867a40000(0000) knlGS:0000000000000000
      Mar 25 19:32:12 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      Mar 25 19:32:12 Tower kernel: CR2: 0000154f0619b000 CR3: 0000000175192006 CR4: 00000000003726e0
      Mar 25 19:32:12 Tower kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      Mar 25 19:32:12 Tower kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Mar 25 19:32:12 Tower kernel: Call Trace:
      Mar 25 19:32:12 Tower kernel: <TASK>
      Mar 25 19:32:12 Tower kernel: ? __warn+0xab/0x122
      Mar 25 19:32:12 Tower kernel: ? report_bug+0x109/0x17e
      Mar 25 19:32:12 Tower kernel: ? do_exit+0x87/0x923
      Mar 25 19:32:12 Tower kernel: ? handle_bug+0x41/0x6f
      Mar 25 19:32:12 Tower kernel: ? exc_invalid_op+0x13/0x60
      Mar 25 19:32:12 Tower kernel: ? asm_exc_invalid_op+0x16/0x20
      Mar 25 19:32:12 Tower kernel: ? do_exit+0x87/0x923
      Mar 25 19:32:12 Tower kernel: ? worker_thread+0x18b/0x244
      Mar 25 19:32:12 Tower kernel: make_task_dead+0x11c/0x11c
      Mar 25 19:32:12 Tower kernel: rewind_stack_and_make_dead+0x17/0x17
      Mar 25 19:32:12 Tower kernel: RIP: 0000:0x0
      Mar 25 19:32:12 Tower kernel: Code: Unable to access opcode bytes at 0xffffffffffffffd6.
      Mar 25 19:32:12 Tower kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
      Mar 25 19:32:12 Tower kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      Mar 25 19:32:12 Tower kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
      Mar 25 19:32:12 Tower kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
      Mar 25 19:32:12 Tower kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      Mar 25 19:32:12 Tower kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
      Mar 25 19:32:12 Tower kernel: </TASK>
      Mar 25 19:32:12 Tower kernel: ---[ end trace 0000000000000000 ]---

      Then this in the morning:

      Mar 26 06:17:17 Tower nginx: 2024/03/26 06:17:17 [error] 7356#7356: *1479212 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 192.168.22.15, server: , request: "GET /Dashboard/Main/Settings/Device?name=disk6 HTTP/1.1", subrequest: "/auth-request.php", upstream: "fastcgi://unix:/var/run/php5-fpm.sock", host: "192.168.22.20"
      Mar 26 06:17:17 Tower nginx: 2024/03/26 06:17:17 [error] 7356#7356: *1479212 auth request unexpected status: 504 while sending to client, client: 192.168.22.15, server: , request: "GET /Dashboard/Main/Settings/Device?name=disk6 HTTP/1.1", host: "192.168.22.20"
  12. It dawned on me, looking at the logs page, that I had the board's 4 NICs set up in LACP, and having the 4 NICs tied to the single assigned Unraid IP might be an issue. So I removed the LACP bond on Unraid and on my switch and removed the br0 bond route. This fixed it, and I was able to nmap port 514 UDP and see it open now. I will log this to another physical box (Proxmox running LibreNMS), do the flash copy as well, and post both here the next time this happens.
  13. Tried to do a parity check last night and it just hangs. Everything locks up and there's no way to know what's happening, since I can't log in to it and have to use the drac to hard power cycle it. What are my options here to help diagnose this?
  14. I can't even reboot though; GUI or SSH, it hangs. I can hard reset it, but the last time that happened it borked the cache drives and I had to redo all the dockers etc. Guess that's my only option at this point. It's still functioning in the background, transcoding shows etc. Since the update, I just can't keep the GUI or SSH up long enough to run certain commands.
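
For the syslog in post 11, a minimal Python sketch of how the interesting lines could be pulled out of a long capture before posting it. It assumes the line layout shown in that post ("... Tower kernel: ..."). The function names `summarize_faults` and `redact_ips` are made up for illustration and are not part of Unraid or any plugin. The redaction only masks IPv4 addresses; it does nothing for keys or other secrets, so the output would still need a read-through.

```python
import re

# Start of a kernel fault or warning report (assumed layout: "... kernel: <text>").
FAULT_RE = re.compile(r"kernel: ((?:general protection fault|BUG:|Oops:|WARNING:).*)")
# Instruction pointer line, e.g. "kernel: RIP: 0010:do_writepages+0xad/0x124".
RIP_RE = re.compile(r"kernel: RIP: [0-9a-f]+:(\S+)")
# Workqueue line, e.g. "kernel: Workqueue: writeback wb_workfn (flush-btrfs-5)".
WORKQUEUE_RE = re.compile(r"kernel: Workqueue: (.+)")
# Anything shaped like an IPv4 address.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact_ips(line):
    """Replace anything that looks like an IPv4 address with a placeholder."""
    return IPV4_RE.sub("x.x.x.x", line)


def summarize_faults(lines):
    """Return one dict per fault header, with the first RIP and workqueue seen after it."""
    faults = []
    current = None
    for line in lines:
        header = FAULT_RE.search(line)
        if header:
            current = {"fault": header.group(1).strip(), "rip": None, "workqueue": None}
            faults.append(current)
            continue
        if current is None:
            continue  # nothing to attach RIP/workqueue lines to yet
        rip = RIP_RE.search(line)
        if rip and current["rip"] is None:
            current["rip"] = rip.group(1)
        wq = WORKQUEUE_RE.search(line)
        if wq and current["workqueue"] is None:
            current["workqueue"] = wq.group(1).strip()
    return faults
```

Usage would be something like `summarize_faults(open("syslog").read().splitlines())`, where the file name is whatever the syslog copy is called, with each line passed through `redact_ips` before pasting it into a post.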