Server becomes unresponsive and requires hard reboot
Posted: Mon Jun 02, 2014 9:48 am
Hi,
My server has been acting up for the last few months. Sometimes it stops responding two or three times in a day; other times it runs fine for about a week. Once it is unresponsive, I cannot reach it via ssh or any of the web apps, and I have to hold the power button down to turn it off and restart it.
When I look at /var/log/messages, I frequently see dumps like this one:
Jun 2 05:27:15 localhost kernel: [59653.018265] general protection fault: 0000 [#4791] SMP
Jun 2 05:27:15 localhost kernel: [59653.018273] Modules linked in: arc4 md4 nls_utf8 tun cifs dns_resolver fscache nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm coretemp kvm_intel kvm r8169 iTCO_wdt iTCO_vendor_support mii microcode ppdev i2c_i801 i2c_core gpio_ich serio_raw snd_page_alloc snd_timer snd soundcore parport_pc parport shpchp lpc_ich mfd_core acpi_cpufreq ata_generic pata_acpi pata_jmicron
Jun 2 05:27:15 localhost kernel: [59653.018307] CPU: 1 PID: 12185 Comm: smbd Tainted: G B D 3.13.11-100.fc19.x86_64 #1
Jun 2 05:27:15 localhost kernel: [59653.018311] Hardware name: Gigabyte Technology Co., Ltd. EP45T-UD3LR/EP45T-UD3LR, BIOS F12e 10/14/2011
Jun 2 05:27:15 localhost kernel: [59653.018315] task: ffff88011f34c500 ti: ffff88000fbb6000 task.ti: ffff88000fbb6000
Jun 2 05:27:15 localhost kernel: [59653.018318] RIP: 0010:[<ffffffff8114a542>] [<ffffffff8114a542>] find_get_page+0x42/0xc0
Jun 2 05:27:15 localhost kernel: [59653.018326] RSP: 0018:ffff88000fbb7b78 EFLAGS: 00010246
Jun 2 05:27:15 localhost kernel: [59653.018328] RAX: 0000000080000000 RBX: ffff88000faa2bf8 RCX: 00000000fffffffa
Jun 2 05:27:15 localhost kernel: [59653.018331] RDX: 0400000000000000 RSI: ffff8800a71eb518 RDI: 0000000000000000
Jun 2 05:27:15 localhost kernel: [59653.018334] RBP: ffff88000fbb7b88 R08: 0400000000000000 R09: ffff8800a71eb308
Jun 2 05:27:15 localhost kernel: [59653.018336] R10: 0000000000000041 R11: ffffea0007eef3c0 R12: 000000000006e67f
Jun 2 05:27:15 localhost kernel: [59653.018339] R13: ffff88000faa2bf0 R14: 000000000006e67f R15: 00000000000000d0
Jun 2 05:27:15 localhost kernel: [59653.018342] FS: 00007f99d51fd840(0000) GS:ffff880207c80000(0000) knlGS:0000000000000000
Jun 2 05:27:15 localhost kernel: [59653.018345] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 2 05:27:15 localhost kernel: [59653.018348] CR2: 00007f0506d5eb20 CR3: 00000000abd71000 CR4: 00000000000007e0
Jun 2 05:27:15 localhost kernel: [59653.018351] Stack:
Jun 2 05:27:15 localhost kernel: [59653.018352] ffff88011e3b7900 000000000006e67f ffff88000fbb7bb0 ffffffff8114a77f
Jun 2 05:27:15 localhost kernel: [59653.018357] ffff88011e3b7900 ffff88000faa2bf0 00000000010200da ffff88000fbb7bf0
Jun 2 05:27:15 localhost kernel: [59653.018362] ffffffff8114b12f 000000006e67f000 ffff88011e3b7900 ffff88000faa2aa0
Jun 2 05:27:15 localhost kernel: [59653.018366] Call Trace:
Jun 2 05:27:15 localhost kernel: [59653.018370] [<ffffffff8114a77f>] find_lock_page+0x1f/0x70
Jun 2 05:27:15 localhost kernel: [59653.018374] [<ffffffff8114b12f>] grab_cache_page_write_begin+0x5f/0xd0
Jun 2 05:27:15 localhost kernel: [59653.018378] [<ffffffff812433e4>] ext4_da_write_begin+0x94/0x2e0
Jun 2 05:27:15 localhost kernel: [59653.018382] [<ffffffff81243eca>] ? ext4_da_write_end+0xba/0x250
Jun 2 05:27:15 localhost kernel: [59653.018385] [<ffffffff8114a3a8>] generic_file_buffered_write+0xf8/0x250
Jun 2 05:27:15 localhost kernel: [59653.018389] [<ffffffff8114bb51>] __generic_file_aio_write+0x1c1/0x3d0
Jun 2 05:27:15 localhost kernel: [59653.018392] [<ffffffff8114bdb8>] generic_file_aio_write+0x58/0xa0
Jun 2 05:27:15 localhost kernel: [59653.018397] [<ffffffff81239519>] ext4_file_write+0x99/0x400
Jun 2 05:27:15 localhost kernel: [59653.018401] [<ffffffff812061a4>] ? posix_test_lock+0x24/0xf0
Jun 2 05:27:15 localhost kernel: [59653.018404] [<ffffffff8120629d>] ? vfs_test_lock+0x2d/0x40
Jun 2 05:27:15 localhost kernel: [59653.018408] [<ffffffff81207d81>] ? fcntl_getlk+0xf1/0x110
Jun 2 05:27:15 localhost kernel: [59653.018412] [<ffffffff811b7c9a>] do_sync_write+0x5a/0x90
Jun 2 05:27:15 localhost kernel: [59653.018416] [<ffffffff811b83f4>] vfs_write+0xb4/0x1f0
Jun 2 05:27:15 localhost kernel: [59653.018419] [<ffffffff811b8fa2>] SyS_pwrite64+0x72/0xb0
Jun 2 05:27:15 localhost kernel: [59653.018423] [<ffffffff81690729>] system_call_fastpath+0x16/0x1b
Jun 2 05:27:15 localhost kernel: [59653.018426] Code: 89 df e8 52 dd 1c 00 48 85 c0 48 89 c6 74 52 48 8b 10 48 85 d2 74 3d f6 c2 03 75 6b 65 8b 04 25 a0 c7 00 00 a9 00 ff 1f 00 75 57 <8b> 4a 1c 85 c9 74 ca 8d 79 01 4c 8d 4a 1c 89 c8 f0 0f b1 7a 1c
Jun 2 05:27:15 localhost kernel: [59653.018455] RIP [<ffffffff8114a542>] find_get_page+0x42/0xc0
Jun 2 05:27:15 localhost kernel: [59653.018459] RSP <ffff88000fbb7b78>
Jun 2 05:27:15 localhost kernel: [59653.018463] ---[ end trace 9bd77a6a70cdff38 ]---
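Since these dumps show up so often, a rough Python sketch like the one below could tally them by process and by faulting function, to check whether it is always smbd in find_get_page. It assumes every dump follows the same line format as the one above. The name summarize_faults and the three regexes are made up for illustration and are not part of any existing tool. Dumps from different CPUs that interleave in the log could be miscounted.

```python
import re
from collections import Counter

# Oops header, e.g. "general protection fault: 0000 [#4791] SMP"
FAULT_RE = re.compile(r"general protection fault: \d+ \[#\d+\]")
# Process line, e.g. "CPU: 1 PID: 12185 Comm: smbd Tainted: ..."
COMM_RE = re.compile(r"PID: \d+ Comm: (\S+)")
# Summary RIP line near the end of a dump, e.g.
# "RIP [<ffffffff8114a542>] find_get_page+0x42/0xc0"
# (the earlier "RIP: 0010:[<...>]" register line is deliberately not matched)
RIP_RE = re.compile(r"RIP\s+\[<[0-9a-f]+>\]\s+([\w.]+)\+0x")
# End marker, e.g. "---[ end trace 9bd77a6a70cdff38 ]---"
END_RE = re.compile(r"---\[ end trace [0-9a-f]+ \]---")


def summarize_faults(lines):
    """Count general protection faults and group them by process
    name (Comm) and by the function named on the summary RIP line."""
    faults = 0
    by_comm = Counter()
    by_func = Counter()
    in_fault = False  # True between a fault header and its end marker

    for line in lines:
        if FAULT_RE.search(line):
            faults += 1
            in_fault = True
            continue
        if not in_fault:
            continue
        comm = COMM_RE.search(line)
        if comm:
            by_comm[comm.group(1)] += 1
            continue
        rip = RIP_RE.search(line)
        if rip:
            by_func[rip.group(1)] += 1
            continue
        if END_RE.search(line):
            in_fault = False

    return faults, by_comm, by_func


# Example use (reading the log usually needs root):
#     with open("/var/log/messages", errors="replace") as log:
#         faults, by_comm, by_func = summarize_faults(log)
#     print(faults, by_comm.most_common(5), by_func.most_common(5))
```

If nearly all faults land in the same function regardless of process, that would point away from smbd itself. If they are spread over unrelated functions, that would be worth mentioning too.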
Output of uname -a:
Linux localhost.localdomain 3.14.4-100.fc19.x86_64 #1 SMP Tue May 13 15:00:26 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
I have installed the latest updates with yum update and am using Greyhole for my storage pool.
It seems to be an issue with smbd (or maybe something file system related?). I am no Linux expert. Where should I go from here?
Thanks,
Matthew