
Random server freeze

Posted: Sat Jul 24, 2010 1:29 pm
by stueyboy
Looking for some help.

I am having some issues with my system crashing randomly, and I thought this was due to a bad disk sector. I deleted the partitions and reformatted, then did a complete system rebuild, but I am still getting some sort of random crash.

Does anyone have any clues about where to start diagnosing what is going on? I suspect a hard disk hardware error, but if there is some new update that might be causing this, it would be good to rule that out first.

Ta

Re: Random server freeze

Posted: Sat Jul 24, 2010 3:27 pm
by gboudreau
When it freezes next time, note down the exact time, reboot, and then immediately look in /var/log/messages.
Check for any messages logged just before the freeze.
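
If you'd rather not scroll through /var/log/messages by hand, a few lines of Python can pull out everything logged shortly before the freeze. This is a minimal sketch, assuming the classic syslog timestamp format seen later in this thread ("Aug 11 13:23:54 host program: message"), English month names, and a caller-supplied year, since syslog lines don't carry one. The function names and the 30-minute default window are illustrative choices, not part of any standard tool.

```python
from datetime import datetime, timedelta

# Assumption: classic syslog lines ("Aug 11 13:23:54 host prog: msg") with no year,
# so the year is supplied by the caller.
def parse_syslog_time(line, year):
    """Return the timestamp at the start of a syslog line, or None if there isn't one."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # continuation lines or other non-syslog text

def lines_before_freeze(lines, freeze_time, window_minutes=30):
    """Yield lines logged in the window_minutes leading up to freeze_time."""
    start = freeze_time - timedelta(minutes=window_minutes)
    for line in lines:
        ts = parse_syslog_time(line, freeze_time.year)
        if ts is not None and start <= ts <= freeze_time:
            yield line.rstrip("\n")
```

As root, something like lines_before_freeze(open("/var/log/messages"), datetime(2010, 8, 11, 13, 40)) would list what was logged in the half hour before a freeze noticed at 13:40 (a made-up time for illustration).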

Re: Random server freeze

Posted: Sun Jul 25, 2010 1:25 am
by stueyboy
Thanks for that. I'll have a look next time.

Re: Random server freeze

Posted: Sun Jul 25, 2010 4:48 am
by moredruid
dmesg is a command that prints the kernel log, which is where hardware problems show up. You might see some interesting stuff there (memory errors, disk sector errors) even now.
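
To skim dmesg output for the kinds of errors mentioned here, a rough filter can help. This is a sketch only: the patterns below are hypothetical examples of common kernel wording (I/O errors, EDAC memory reports, machine checks, libata exceptions), not a complete list, and real messages vary between kernel versions and drivers. Reading the kernel log may require root on some systems.

```python
import re
import subprocess

# Hypothetical, non-exhaustive patterns for hardware trouble in kernel messages.
SUSPICIOUS = re.compile(
    r"i/o error"                                      # block-layer read/write failures
    r"|\bedac\b"                                      # memory controller error reports
    r"|machine check"                                 # CPU/bus machine-check events
    r"|ata\d+(\.\d+)?: .*(error|exception|failed)",   # libata link/command trouble
    re.IGNORECASE,
)

def read_dmesg():
    """Return the kernel log as text by running the dmesg command."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout

def suspicious_lines(text):
    """Return the lines of dmesg output that match any SUSPICIOUS pattern."""
    return [line for line in text.splitlines() if SUSPICIOUS.search(line)]
```

Calling suspicious_lines(read_dmesg()) prints nothing useful on a healthy box; anything it does return is worth reading in full context.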

smartctl -a /dev/<suspect hard disk> might also give you a clue if the disk has logged errors. You can also run self-tests with that command (smartctl -t short or smartctl -t long).
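
One quick thing to look for in the smartctl output is a nonzero raw count of bad or pending sectors. The sketch below assumes the usual ATA attribute table printed by smartctl -A (ID#, ATTRIBUTE_NAME, ..., with RAW_VALUE as the tenth column); SCSI/SAS and NVMe drives report differently. The three attribute names are standard SMART attributes often watched for sector problems, but which ones to watch is an illustrative choice, not something smartctl decides for you.

```python
import subprocess

# Illustrative choice of sector-related SMART attributes to watch.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}

def read_smart_attributes(device):
    """Run `smartctl -A` on device (usually needs root) and return its stdout."""
    # smartctl uses bit-coded exit statuses, so a nonzero code is not treated as fatal.
    return subprocess.run(["smartctl", "-A", device], capture_output=True, text=True).stdout

def sector_warnings(smart_output):
    """Return {attribute: raw_value} for watched attributes with a nonzero raw value."""
    warnings = {}
    for line in smart_output.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID and have RAW_VALUE as the 10th field.
        if len(parts) >= 10 and parts[0].isdigit() and parts[1] in WATCHED:
            raw = parts[9]
            if raw.isdigit() and int(raw) > 0:
                warnings[parts[1]] = int(raw)
    return warnings
```

For example, sector_warnings(read_smart_attributes("/dev/sda")) returns an empty dict when none of the watched counters are above zero.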

Usually it's bad memory.

Most distro install discs include memtest; you might want to run that if your HDD checks out OK.

Re: Random server freeze

Posted: Sun Jul 25, 2010 11:48 am
by stueyboy
Thanks for the help, guys. At the moment all is well with the new install. I have stopped applying any Fedora updates, but I think I might have had a corrupt partition. My first attempt at a clean reinstall over the top of the old one failed, so I booted from an Ubuntu live disk, wiped the partition tables, created new ones, and installed again. All going OK so far.

The old install was giving me all sorts of disk errors on the F2 setup screens. The electricity was switched off while I was at work a few weeks ago, and I had left the system running, so I suspect that might have contributed to the instability.

Re: Random server freeze

Posted: Mon Aug 09, 2010 1:01 am
by stueyboy
Bit of an update...

Instabilities in the system continued after my rebuild, to the point that yesterday, after a freeze and reboot, the BIOS didn't recognise the disks at all. I swapped out the HD cable and it all seems much better now. It also gave me a chance to set up greyhole, which seems to be working nicely.

Re: Random server freeze

Posted: Wed Aug 11, 2010 11:23 am
by oldcyberdude
I have also been getting random freezes. The first sign is usually DNS failures on the systems on my network.

I recently replaced all my hardware (except for the RAID disks containing hda), thinking the problem was in the old hardware, but the new system is showing the same symptoms. Below are the last entries from /var/log/messages before the freeze. All the earlier messages (all morning) are quite similar, i.e. DHCP. At 13:43 I start seeing my reboot messages.

Aug 11 13:23:54 tigger nmbd[2146]: [2010/08/11 13:23:54, 0] nmbd/nmbd_browsesync.c:350(find_domain_master_name_query_fail)
Aug 11 13:23:54 tigger nmbd[2146]: find_domain_master_name_query_fail:
Aug 11 13:23:54 tigger nmbd[2146]: Unable to find the Domain Master Browser name HOME<1b> for the workgroup HOME.
Aug 11 13:23:54 tigger nmbd[2146]: Unable to sync browse lists in this workgroup.
Aug 11 13:29:50 tigger dhcpd: Wrote 0 deleted host decls to leases file.
Aug 11 13:29:50 tigger dhcpd: Wrote 0 new dynamic host decls to leases file.
Aug 11 13:29:50 tigger dhcpd: Wrote 8 leases to leases file.
Aug 11 13:29:50 tigger dhcpd: DHCPREQUEST for 192.168.1.103 from 00:1f:bc:03:62:05 (kanga) via eth0
Aug 11 13:29:50 tigger dhcpd: DHCPACK on 192.168.1.103 to 00:1f:bc:03:62:05 (kanga) via eth0

dmesg doesn't seem to report anything unusual.

Nothing from smartctl either.

I really don't think memtest would show anything, since the memory was replaced along with the rest of the system.

It's a tough one to troubleshoot, since the system runs for hours and sometimes days before freezing.
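
One way to pin down when an unattended freeze happened is to look for the longest silence in the log, like the gap between the last entry at 13:29:50 and the reboot messages at 13:43 above. This is a minimal sketch, assuming classic syslog timestamps, English month names, a caller-supplied year, and lines in chronological order. A quiet period isn't proof of a freeze, since an idle server can legitimately log nothing for a while, so treat the result as a hint of where to look.

```python
from datetime import datetime

def parse_syslog_time(line, year):
    """Return the timestamp at the start of a syslog line, or None if there isn't one."""
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # not a timestamped syslog line

def largest_gap(lines, year):
    """Return (gap, last_line_before, first_line_after) for the longest silence, or None."""
    best = None
    prev_ts = prev_line = None
    for line in lines:
        ts = parse_syslog_time(line, year)
        if ts is None:
            continue
        if prev_ts is not None:
            gap = ts - prev_ts
            if best is None or gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

Run over a whole day's log, the line just before the longest gap is a good candidate for the last thing the system did before it hung.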

Re: Random server freeze

Posted: Thu Aug 12, 2010 9:36 pm
by gjc1000
I had a problem like that once. I had just built a system with all new parts and could not figure out why it was freezing randomly. I figured it had to be a software issue, because in the back of my mind I kept telling myself, "It's new hardware, that can't be what's wrong." Alas, when I got around to troubleshooting the hardware side, I ran memtest, and sure enough, I had a bad memory stick. I swapped the bad stick for a new one, and voilà, everything was as it should be, and still is.
Try memtest, just for giggles :)