
Frustrating networking problem

Posted: Fri Nov 04, 2011 12:54 am
by doogie
Recently I've been getting intermittent serious networking problems with my HDA.
I usually notice because DNS stops working, so browsing from my phone in bed fails. I'm pretty sure it has always been first thing in the morning. It happens once every week or two, and I don't remember it happening before I installed my IBM M1015 RAID card a couple of months back.
Normally when this happens my HDA is unpingable and the only way to restore net access is to do one of the following:-
  • Set my laptop to use the router for DNS (gives me internet but can't access HDA)
  • Reset the router (this worked once, but it's a Billion BiPAC 7800N that I've not had many problems with, and it's known for its stability)
  • Power cycle the HDA (always works but I'm not keen to do this)
My HDA is normally headless but I have connected a monitor to it and run the network troubleshooting which is posted at pastebin - it fails at step 4.
I have restarted named, but that didn't make any difference.

I'm still running the F12 version of Amahi - I'm not planning on upgrading to F14 until this is either pretty stable or someone says it's an obvious F12 problem.

Any help much appreciated!

Re: Frustrating networking problem

Posted: Fri Nov 04, 2011 1:15 am
by bgrablin
What is the output of the following:

Code:

cat /var/log/secure

Code:

cat /var/log/messages
(Only need the last few lines prior to the point of failure)
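
One way to grab just those lines is sketched below. It assumes GNU grep and sed (as shipped with Fedora). `log_context` is a made-up helper name, not an existing tool, and the pattern you pass is whatever timestamp fragment sits closest to the failure.

```shell
# Print the N lines of a log leading up to (and including) the first line
# that matches PATTERN, e.g. a timestamp fragment near the failure.
# Usage: log_context FILE PATTERN [N]      (N defaults to 20)
log_context() {
    file=$1
    pattern=$2
    count=${3:-20}

    # Line number of the first match; empty if the pattern never appears.
    hit=$(grep -n -m 1 -e "$pattern" "$file" | cut -d: -f1)
    if [ -z "$hit" ]; then
        echo "no match for: $pattern" >&2
        return 1
    fi

    # Clamp the start of the window to the first line of the file.
    start=$((hit - count + 1))
    if [ "$start" -lt 1 ]; then
        start=1
    fi

    sed -n "${start},${hit}p" "$file"
}
```

For example, `log_context /var/log/messages '03:3' 30` would print the 30 lines ending at the first entry whose text contains `03:3`. A more specific pattern that includes the date avoids matching an earlier day.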

Re: Frustrating networking problem

Posted: Fri Nov 04, 2011 1:18 am
by moredruid
what happens if you run "service network restart" as root?

BTW: can you post the output of "dmesg" as well?

Re: Frustrating networking problem

Posted: Fri Nov 04, 2011 6:52 am
by doogie
Strangely enough, the ssh I had open from the office is still open, although I can't open a new one :?

/var/log/secure - nothing interesting, I think (my username has been changed)

Code:

Nov 3 20:37:18 localhost sshd[12150]: Accepted publickey for myusername from 192.168.0.101 port 50185 ssh2
Nov 3 20:37:18 localhost sshd[12148]: Accepted publickey for myusername from 192.168.0.101 port 54499 ssh2
Nov 3 20:37:18 localhost sshd[12148]: pam_unix(sshd:session): session opened for user myusername by (uid=0)
Nov 3 20:37:18 localhost sshd[12150]: pam_unix(sshd:session): session opened for user myusername by (uid=0)
Nov 3 20:37:42 localhost sshd[12176]: Received disconnect from 192.168.0.101: 11: Closed due to user request.
Nov 3 20:37:42 localhost sshd[12150]: pam_unix(sshd:session): session closed for user myusername
Nov 3 22:36:57 localhost sshd[12148]: pam_unix(sshd:session): session closed for user myusername
Nov 4 07:09:56 localhost pam: gdm-password[4376]: pam_unix(gdm-password:auth): authentication failure; logname= uid=0 euid=0 tty=:0 ruser= rhost= user=myusername
Nov 4 07:10:12 localhost pam: gdm-password[6620]: pam_unix(gdm-password:session): session opened for user myusername by (uid=0)
Nov 4 07:21:19 localhost su: pam_unix(su-l:session): session opened for user root by myusername(uid=500)
Nov 4 09:51:43 localhost sudo: myusername : TTY=pts/0 ; PWD=/home/myusername ; USER=root ; COMMAND=/bin/su -

end of /var/log/messages (I've used the fact that transmission stopped reporting some things around 3:30 to assume it happened then) - some transmission things fudged for my protection and mac addresses and pc names too, :)
http://pastebin.com/HM8PJ4Rq

dmesg output
http://pastebin.com/PS2adk7Q

service network restart

Code:

[root@server ~]# service network restart
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
Disabling IPv4 packet forwarding: net.ipv4.ip_forward = 0 [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]

This was run from the existing ssh connection and changed nothing. I expected it to drop the connection just after I hit enter.

Thanks guys

Re: Frustrating networking problem

Posted: Tue Nov 08, 2011 3:49 am
by moredruid
hmm it seems there are hung kernel tasks. These may have to do with flushing buffers to disk.
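
To pull just those warnings out of a long dmesg dump, something like the sketch below may help. It assumes the usual wording of the kernel's hung-task message ("blocked for more than N seconds"); `hung_tasks` is only an illustrative name.

```shell
# Read kernel log text on stdin and print only the hung-task warnings.
# Exits non-zero (grep's status) when no such warning is present.
hung_tasks() {
    grep -E 'task .+ blocked for more than [0-9]+ seconds'
}

# Typical use (as root):
#   dmesg | hung_tasks
```

The task name in each matching line hints at what was stuck. A flush or journal thread would be consistent with a disk or controller stall, though that alone doesn't prove it.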

you might want to run (as root): smartctl -t short /dev/sd<?>
where <?> is the letter of each disk you have (ls /dev/sd* will give you the complete list).

This will run a short SMART test (replacing "short" with "long" does what you would expect :geek:)
smartctl -a /dev/sda will display the current statistics for the first disk.
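
Running that by hand for every disk gets tedious, so here is a rough loop. It is only a sketch: `smart_short_all` and the `DRY_RUN` switch are names invented for this example, and it assumes smartctl is installed and on the PATH.

```shell
# Start a short SMART self-test on each disk passed as an argument.
# Set DRY_RUN=1 to print the commands instead of running them.
smart_short_all() {
    for disk in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "smartctl -t short $disk"
        else
            smartctl -t short "$disk"
        fi
    done
}

# Typical use (as root). The glob /dev/sd? matches whole disks such as
# /dev/sda but not partitions such as /dev/sda1:
#   smart_short_all /dev/sd?
```

The test runs in the background on the drive itself. Once it has had a few minutes, `smartctl -l selftest /dev/sda` shows the self-test log for that disk.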

Re: Frustrating networking problem

Posted: Tue Nov 08, 2011 5:57 am
by doogie
Thanks.

Nothing obvious showing up (to my eye at least) - the first 6 drives all report passes. The 7th & 8th are on the IBM RAID card, which unfortunately doesn't pass SMART status through. I might have to see if there's a fix for that or a way of enabling it.
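
One thing that might be worth trying for the drives behind the card: smartctl has a `-d megaraid,N` device type for disks attached to LSI MegaRAID-style controllers. Whether the M1015 exposes its drives that way depends on its firmware and driver, so treat the sketch below as an experiment under that assumption. `megaraid_probe`, `DRY_RUN`, and the `/dev/sdg` device name are placeholders.

```shell
# Query drive identity for slot IDs 0..MAX_ID behind a MegaRAID-style
# controller, addressed through one of the block devices it exposes.
# Set DRY_RUN=1 to print the commands instead of running them.
# Usage: megaraid_probe DEVICE MAX_ID
megaraid_probe() {
    dev=$1
    max=$2
    id=0
    while [ "$id" -le "$max" ]; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "smartctl -i -d megaraid,$id $dev"
        else
            smartctl -i -d "megaraid,$id" "$dev"
        fi
        id=$((id + 1))
    done
}

# Typical use (as root), with /dev/sdg standing in for a disk on the card:
#   megaraid_probe /dev/sdg 7
```

If one of the IDs answers with a model and serial number, the same `-d megaraid,N` option should work with `-a` and `-t short` for that drive. `smartctl --scan` may also list how the controller's devices can be addressed.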