An Unrecoverable System Error Nmi Has Occurred
NMI is received when the system is idle.
and a echo "A" | socat - UNIX-CONNECT:/var/rund/watchdog-mux
It is recommended that servers that require the ability to respond to an iLO-triggered NMI be updated to SLES 11 SP2 or later.
https://www.suse.com/company/press/2012/2/suse-linux-enterprise-11-service-pack-2-released.html
or
http://www.novell.com/support/kb/doc.php?id=7012368
Here is what I am seeing in the iLO logs.
iLO2 firmware is upgraded to 2.29 (07/16/2015). Maybe this helps someone to assist.
#2 mensinck, Oct 21, 2015
Unfortunately we do not have the trace due to HP's damned iLO :-( but I will give more info when I have caught it.
If the OS locks up hard, watchdog timers (if configured) would eventually trigger an NMI. https://access.redhat.com/solutions/1309033
An Unrecoverable System Error Nmi Has Occurred Hp
Rafael David Tinoco (inaddy) wrote on 2015-04-07: #11
Doing verification right now...
HP was advised by Canonical regarding Intel Errata # and that the recommended workaround is a fix in firmware.
And what about non-corosync configurations? We have a Ceph cluster with 3 hosts and 3 monitors up and running in this lab, and everything seems to be quite good.
but it's a bit different, you are right.
#14 pipomambo, Nov 11, 2015
Please test the kernel and update this bug with the results.
I found some hints googling around.
- blacklisting hpwdt was suggested, but it is not a solution for VE, since we need the watchdog interfaces.
- I also tried grub parameters:
-- noautogroup
An Unrecoverable System Error (NMI) has occurred (Service Information: 0x7fbce8f6, 0x00000000)
The server keeps crashing, always with the same error messages:
Critical PCI Bus 03/13/2013 17:12 03/13/2013 17:12 1 Uncorrectable PCI Express Error (Embedded device, Bus 0, Device 2, Function 2, Error status
06-02-2014, 03:17 AM #1 kaito.7
BL460c G8 host unexpectedly reset
Hi all, We have a HP BL460c G8
Doesn't sound quite like the same issue. https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1432837
But it could be a problem with the iLO configuration when the watchdog is enabled by hpwdt. In one lab we have HP ProLiant servers with massive kernel panics in the hpwdt.ko module.
System Firmware will log additional details in a separate IML entry if possible
Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1 POST Error: 1792-Slot X Drive Array - Valid Data Found in
But you can solve it by doing this: the module that produces this is hpwdt.
We are an HP shop, so I have plenty of brand new boxed 380 shells sitting in the warehouse I can test with.
An Unrecoverable System Error Nmi Has Occurred Dl585
The issue occurs most often when we use live migration. https://bugs.launchpad.net/bugs/1432837
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
Bad motherboard.
I have taken it out now, and up to now there has been no further issue. I have to monitor it for a while to see if that actually stopped it.
Per kernel team comments (on kernel-team mailing list): """ We have been seeing random crashes from various HP systems; this has been tracked to loading of the hpwdt watchdog modules.
Do you have all the HP management agents installed and running?
The only errors that I found were the above in Onboard Administrator --> IML Log:
System error ---> An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)
ASR
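For anyone comparing IML entries across several boxes, here is a minimal sketch that pulls the two hex values out of messages shaped like the ones quoted in this thread. The regular expression is only inferred from those quoted lines, not from any HP specification, and `NmiEntry` and `find_nmi_entries` are made-up names for illustration.

```python
import re
from typing import List, NamedTuple


class NmiEntry(NamedTuple):
    """One NMI message found in exported IML/iLO log text (hypothetical type)."""
    label: str   # e.g. "System error code" or "Service Information"
    code: int    # first hex value in the parentheses
    detail: int  # second hex value in the parentheses


# Assumption: the message layout is inferred from the lines quoted in this
# thread, e.g. "(System error code 0x0000002B, 0x00000000)". Other firmware
# versions may word it differently.
NMI_PATTERN = re.compile(
    r"An Unrecoverable System Error \(NMI\) has occurred\s*"
    r"\((?P<label>[^()]*?)[:\s]\s*"
    r"(?P<code>0x[0-9A-Fa-f]+),\s*(?P<detail>0x[0-9A-Fa-f]+)\)",
    re.IGNORECASE,
)


def find_nmi_entries(log_text: str) -> List[NmiEntry]:
    """Return every NMI message found in the given log text, in order."""
    entries = []
    for match in NMI_PATTERN.finditer(log_text):
        entries.append(
            NmiEntry(
                label=match.group("label").strip(),
                code=int(match.group("code"), 16),
                detail=int(match.group("detail"), 16),
            )
        )
    return entries


if __name__ == "__main__":
    sample = (
        "An Unrecoverable System Error (NMI) has occurred "
        "(System error code 0x0000002B, 0x00000000)"
    )
    for entry in find_nmi_entries(sample):
        print(f"{entry.label}: {entry.code:#010x}, {entry.detail:#010x}")
```

This only extracts the numbers so that entries can be grouped or counted. It does not say what a given code means; that still has to come from HP support.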
Did you find a workaround ?
#12 pipomambo, Nov 11, 2015
Hello, We
ILO: "76 Critical System Error 03/12/2015 12:42 03/12/2015 12:07 2 An Unrecoverable System Error (NMI) has occurred (System error code 0x0000002B, 0x00000000)"
Examples:
PID: 0 TASK: ffffffff81c1a480 CPU: 0 COMMAND: "swapper/0"
#0 [ffff88085fc05c88] machine_kexec at
t.lamprecht said: After a bit of investigating I found some bug reports regarding your machines, e.g.: https://bugzilla.redhat.com/show_bug.cgi?id=438741 (very old bug, but still) Because your firmware is up to date it
In either case (solved or not afterwards) I would advise you to contact the support so that the part(s) can be replaced afterwards. If you do so, make sure you provide an
Blacklisting the watchdog timer just hides underlying problems. With the module hpwdt loaded, a kernel panic happens randomly. It seems that if corosync wants to use them, which is why it would open /dev/watchdog, then there's either a corosync bug or something in the configuration that isn't right. This probably falls on HP first.
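Since the thread keeps going back and forth on whether hpwdt is loaded or blacklisted on a given host, here is a small sketch for checking both. It relies on the usual /proc/modules layout (module name as the first field) and the `blacklist <module>` directive from modprobe.d files. The function names are made up for this post, and the directory scanned in the demo part is an assumption about where your distribution keeps these files.

```python
from pathlib import Path
from typing import Set


def loaded_modules(proc_modules_text: str) -> Set[str]:
    """Return module names from text in /proc/modules format.

    Each non-empty line starts with the module name, followed by size,
    reference count and other fields.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def is_blacklisted(modprobe_conf_text: str, module: str) -> bool:
    """Return True if the text contains an active 'blacklist <module>' line."""
    for raw_line in modprobe_conf_text.splitlines():
        # Drop comments, then split the remainder into words.
        words = raw_line.split("#", 1)[0].split()
        if len(words) >= 2 and words[0] == "blacklist" and words[1] == module:
            return True
    return False


if __name__ == "__main__":
    module = "hpwdt"
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        is_loaded = module in loaded_modules(proc_modules.read_text())
        print(f"{module} loaded: {is_loaded}")
    # Assumption: blacklist files live in /etc/modprobe.d on this system.
    conf_dir = Path("/etc/modprobe.d")
    if conf_dir.is_dir():
        hits = [
            str(conf)
            for conf in sorted(conf_dir.glob("*.conf"))
            if is_blacklisted(conf.read_text(errors="replace"), module)
        ]
        print(f"{module} blacklisted in: {hits or 'no file found'}")
```

Keep in mind what was said above: on a Proxmox HA cluster the watchdog interface is needed, so knowing the module state is a diagnostic step, not a fix.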
For cluster configurations, you probably really do want a watchdog so that hung systems can crash, reboot and rejoin the cluster.
Data will automatically be written to drive array.
Caution POST Message 03/13/2013 16:43 03/13/2013 16:43 1 POST Error: 1719 - A controller failure event occurred prior to this power-up
Critical PCI Bus 03/13/2013
Thank you
Rafael Tinoco
They are both HP DL380 Gen9's.
This is my first post on forum.proxmox.
Without the module the server reboots. I tested this on HP ProLiant servers: iLO + watchdog on Linux produces a kernel panic when you use HA on Proxmox.
09-22-2014, 04:48 AM #7 kaito.7
will update the case soon.
Tom, can you dig a little deeper into that?
We still cannot resolve the issue, and it occurs with DL380p Gen8 8-core and 12-core models. The HBA on the riser card fails.
OA Syslog 3.
06-03-2014, 09:59 AM #6 Soadyheid
This will help the support colleagues figure out what went wrong.
Did you find a workaround ?
Edit: Have you seen this thread? This issue is not a Proxmox VE one.
#4 t.lamprecht, Oct 21, 2015
Hi t.lamprecht
We have a cluster on Proxmox V4.0-48 with two Dell R900 and one HP DL380 G9.
This can lead to an NMI being wrongly handled (as if the PMU register had overflowed when it had not) and a kernel panic.
Thought it was only related to 3.x kernels.