Out of inodes on system partition.. greyhole spooled files..

jbmia
Posts: 67
Joined: Sun Nov 07, 2010 11:59 am

Out of inodes on system partition.. greyhole spooled files..

Postby jbmia » Wed Jul 20, 2011 7:20 am

This follows up a thread I posted a week or so back about greyhole.log growing so fast that I would run out of space on my 20GB system partition in just hours... the greyhole.log file would get up to 8GB inside an hour... I'd delete it and it would build right back up...

This started after I moved a large number of files from a backed-up system drive from another machine into my pool, and also installed a new 2TB drive into the system. The system is composed of 9 total drives and about 8TB.

I solved the greyhole.log problem by creating a cron job to rotate the log, initially every 15 minutes, later tuned down to every half hour. This seemed to solve the problem. I now have plenty of free GB on the partition.
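A rotation setup along these lines would do it (a sketch, not my exact config; the drop-in file name and retention count are illustrative):

```
# /etc/logrotate.d/greyhole  -- illustrative logrotate drop-in
/var/log/greyhole.log {
    # keep 4 old logs, gzipped; skip rotation if missing or empty
    rotate 4
    compress
    missingok
    notifempty
}

# root crontab entry forcing a rotation every 30 minutes:
# */30 * * * * /usr/sbin/logrotate -f /etc/logrotate.d/greyhole
```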

I'm now experiencing other issues. The first symptom was the inability to access shares... the next was the inability to bring up the hda page...

Code: Select all

The application has found an error. You will be forwarded shortly.

I checked my /var/log/greyhole.log and there was no log... service was running.. I restarted it.. not solved.. I rebooted and the greyhole.log log started generating... but the issues persisted...

I tried:

Code: Select all

greyhole --stats
Can't describe tasks with query: DESCRIBE tasks - Error: Can't create/write to file '/tmp/#sql_7e7_0.MYI' (Errcode: 28)
[root@9745 greyh
...but that error is all I get...

Also.. tail /var/log/messages:

Code: Select all

Jul 20 11:32:13 9745 openvpn[1725]: 76.110.5.180:59271 TLS Error: TLS handshake failed
Jul 20 11:32:13 9745 openvpn[1725]: 76.110.5.180:59271 SIGUSR1[soft,tls-error] received, client-instance restarting
Jul 20 11:32:30 9745 nmbd[2358]: [2011/07/20 11:32:30.480368, 0] nmbd/nmbd_serverlistdb.c:343(write_browse_list)
Jul 20 11:32:30 9745 nmbd[2358]: write_browse_list: Can't open file /var/lib/samba/browse.dat.. Error was No space left on device
Jul 20 11:32:40 9745 nmbd[2358]: [2011/07/20 11:32:40.506167, 0] nmbd/nmbd_serverlistdb.c:343(write_browse_list)
Jul 20 11:32:40 9745 nmbd[2358]: write_browse_list: Can't open file /var/lib/samba/browse.dat.. Error was No space left on device
Jul 20 11:32:40 9745 nmbd[4763]: [2011/07/20 11:32:40.556270, 0] nmbd/nmbd_winsserver.c:2380(wins_write_database)
Jul 20 11:32:40 9745 nmbd[4763]: wins_write_database: Can't open /var/lib/samba/wins.dat.4763. Error was No space left on device
Jul 20 11:32:40 9745 nmbd[2358]: [2011/07/20 11:32:40.639944, 0] nmbd/nmbd_serverlistdb.c:343(write_browse_list)
Jul 20 11:32:40 9745 nmbd[2358]: write_browse_list: Can't open file /var/lib/samba/browse.dat.. Error was No space left on device
I did some googling and found the issue might be related to running out of inodes:

Code: Select all

df -i
Filesystem                   Inodes    IUsed     IFree     IUse% Mounted on
/dev/mapper/vg_9745-lv_root  1267280   1267280   0         100%  /
tmpfs                        474863    1         474862    1%    /dev/shm
/dev/sdi1                    128016    41        127975    1%    /boot
/dev/mapper/vg_9745-lv_home  642112    2895      639217    1%    /home
/dev/sdi3                    21667840  415700    21252140  2%    /var/hda/files
/dev/sdb1                    61054976  380063    60674913  1%    /var/hda/files/drives/drive1
/dev/sdc1                    61054976  416381    60638595  1%    /var/hda/files/drives/drive2
/dev/sdd1                    91578368  508920    91069448  1%    /var/hda/files/drives/drive3
/dev/sde1                    122101760 643108    121458652 1%    /var/hda/files/drives/drive4
/dev/sdg1                    18317312  130885    18186427  1%    /var/hda/files/drives/drive5
/dev/sdh1                    61054976  322227    60732749  1%    /var/hda/files/drives/drive6
/dev/sdi4                    6406144   34        6406110   1%    /var/media
/dev/sdf1                    122101760 111822    121989938 1%    /var/hda/files/drives/drive9
//127.0.0.1/Books            0         0         0         -     /mnt/samba/Books
//127.0.0.1/Home_Movies      0         0         0         -     /mnt/samba/Home_Movies
//127.0.0.1/Movies           0         0         0         -     /mnt/samba/Movies
//127.0.0.1/Music            0         0         0         -     /mnt/samba/Music
//127.0.0.1/Pictures         0         0         0         -     /mnt/samba/Pictures
//127.0.0.1/Public           0         0         0         -     /mnt/samba/Public
//127.0.0.1/Recipes          0         0         0         -     /mnt/samba/Recipes
//127.0.0.1/Software         0         0         0         -     /mnt/samba/Software
//127.0.0.1/TV               0         0         0         -     /mnt/samba/TV
//127.0.0.1/Users            0         0         0         -     /mnt/samba/Users
So... as you can see, I'm out of inodes on the system partition... I strongly suspect it's the greyhole spool files, but I'm not 100% sure...

Code: Select all

ls -1 /var/spool/greyhole | wc -l
1013816
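As a cross-check that it really is the spool (and not some other directory) eating the inodes, counting files per directory points straight at the culprit. A sketch, demonstrated on a throwaway tree (on the real system you'd run `find / -xdev -type f` so the scan stays on the root filesystem):

```shell
# Count files per directory to find the inode hog.
# Demonstrated on a throwaway tree with made-up subdirectories;
# substitute "find / -xdev -type f" on a real box.
top=$(mktemp -d)
mkdir "$top/spool" "$top/log"
for i in $(seq 1 200); do : > "$top/spool/task_$i"; done
: > "$top/log/greyhole.log"
find "$top" -type f \
  | awk -F/ '{ c[$(NF-1)]++ } END { for (d in c) print c[d], d }' \
  | sort -rn
# prints: 200 spool
#         1 log
rm -r "$top"
```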
Question: Can I kill those spooled files, delete the unnecessary files in my shares that I know I don't need and that were copied over recently (e.g., the gobs and gobs of crap under /documents and settings/application data/ that got copied over from the decommissioned hard drive), and then re-run an fsck and let the spool rebuild??

Or will I blow things up really good if I do that??

Knowledgeable input would be immensely appreciated!

jbmia

lrevxl
Posts: 82
Joined: Fri Mar 04, 2011 7:23 pm
Location: Chicago, IL, USA

Re: Out of inodes on system partition.. greyhole spooled fil

Postby lrevxl » Wed Jul 20, 2011 7:58 pm

You could kill the spooled files, but there are a few caveats. Greyhole will lose any knowledge of the file operations that were performed and are represented in those spool files (i.e., any renames, moves, deletes, writes, etc.). Most of those don't carry much risk: if you moved or renamed something, those changes will just be undone on the next fsck, since Greyhole will not have transferred those operations to the pool drives. If, however, you've got any shares with 2x (or more) copies and there are write operations in the spool, your files could get into an inconsistent state (i.e., one copy is updated and the others are not).
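If you do kill them, one practical wrinkle: with a million files, a plain `rm /var/spool/greyhole/*` can fail with "argument list too long", because the shell expands every name onto one command line; `find -delete` sidesteps that. A sketch on a throwaway directory (on a real system, stop the greyhole service first and kick off Greyhole's fsck afterwards):

```shell
# Demonstrated on a mock spool directory; substitute /var/spool/greyhole
# for real use, with the greyhole service stopped.
spool=$(mktemp -d)
for i in $(seq 1 500); do : > "$spool/task_$i"; done
echo "before: $(ls -1 "$spool" | wc -l)"   # before: 500
# A glob would pass every name to rm at once; find -delete handles any count.
find "$spool" -type f -delete
echo "after: $(ls -1 "$spool" | wc -l)"    # after: 0
rmdir "$spool"
```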

I am curious, though, how it managed to create such a prolific number of spool files without consuming them. Do you have a large number of pending tasks? Are the tasks actually being processed?

jbmia
Posts: 67
Joined: Sun Nov 07, 2010 11:59 am

Re: Out of inodes on system partition.. greyhole spooled fil

Postby jbmia » Thu Jul 21, 2011 1:19 am

Thanks for the reply! I'm going to go ahead and delete the files...

You know, I thought the spool files were the pending tasks... so I'm not sure of the answer there. Am I misunderstanding the nature of the spooled files??

But, to try to answer your question as to whether greyhole is consuming the spooled tasks (which, again, is what I assumed they were), I'm not 100% sure either way... On the one hand, per below, the spooled count is not decreasing... it's actually increased by a few files since my initial post:

Code: Select all

ls -1 /var/spool/greyhole | wc -l
1013820
The box is headless and running in text mode (no X) to save memory and so on. With that, I can see the system just sitting there flying through messages in the terminal, many of which start with a greyhole-related message (e.g., "...found a tombstone...", etc.), but I'm not sure what it's actually doing because they're scrolling by so quickly.

Again, /var/log/messages:

Code: Select all

Jul 21 04:04:25 9745 nmbd[31295]: wins_write_database: Can't open /var/lib/samba/wins.dat.31295. Error was No space left on device
Jul 21 04:04:25 9745 nmbd[2358]: [2011/07/21 04:04:25.732977, 0] nmbd/nmbd_serverlistdb.c:343(write_browse_list)
Jul 21 04:04:25 9745 nmbd[2358]: write_browse_list: Can't open file /var/lib/samba/browse.dat.. Error was No space left on device
Jul 21 04:04:45 9745 nmbd[2358]: [2011/07/21 04:04:45.779897, 0] nmbd/nmbd_serverlistdb.c:343(write_browse_list)
Jul 21 04:04:45 9745 nmbd[2358]: write_browse_list: Can't open file /var/lib/samba/browse.dat.. Error was No space left on device
Jul 21 04:04:45 9745 nmbd[31724]: [2011/07/21 04:04:45.856397, 0] nmbd/nmbd_winsserver.c:2380(wins_write_database)
Jul 21 04:04:45 9745 nmbd[31724]: wins_write_database: Can't open /var/lib/samba/wins.dat.31724. Error was No space left on device
Jul 21 04:04:45 9745 nmbd[2358]: [2011/07/21 04:04:45.914902, 0] nmbd/nmbd_serverlistdb.c:343(write_browse_list)
Jul 21 04:04:45 9745 nmbd[2358]: write_browse_list: Can't open file /var/lib/samba/browse.dat.. Error was No space left on device
So, if it needed some tmp file space on the system partition to process those spooled tasks, I would imagine those operations would be erroring out with "no space left on device" too...

Any other way to tell if it's consuming the spool?
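One crude way to tell is to sample the spool file count twice and compare: a falling count means tasks are being consumed. A sketch, demonstrated on a mock directory that loses a file between the two samples (standing in for the daemon consuming a task; on the real box you'd point at /var/spool/greyhole and just sleep a few minutes between samples):

```shell
# Sample the spool twice and report whether it is draining.
spool=$(mktemp -d)
for i in 1 2 3 4 5; do : > "$spool/task_$i"; done
c1=$(ls -1 "$spool" | wc -l)
rm "$spool/task_1"        # real use: sleep 300 instead, and let the daemon work
c2=$(ls -1 "$spool" | wc -l)
if [ "$c2" -lt "$c1" ]; then echo "spool draining"; else echo "spool NOT draining"; fi
rm -r "$spool"
```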

The greyhole.log logrotate-and-compress process seems to be barfing on the lack of space as well. If you let it run, it'll create a greyhole.log file, but once the half-hourly logrotate is invoked, it throws up on the compress and save portions, I think because of the disk space:

Code: Select all

# du -sh grey*
2.6G    greyhole.log.1
120M    greyhole.log.2.gz
24M     greyhole.log.6.gz
112M    greyhole.log.8.gz
[root@9745 log]# \rm greyhole.log.1
[root@9745 log]# du -sh grey*
9.6M    greyhole.log
120M    greyhole.log.2.gz
24M     greyhole.log.6.gz
112M    greyhole.log.8.gz
So, once that file is deleted, greyhole.log is restarted since we have some space again, and:

Code: Select all

tail /var/log/greyhole.log
Jul 21 04:08:27 7 rename: Found a broken symlink to update: /var/hda/files/Users/John/My Documents from IBS laptop/Everything Else/IBS BI 6.0 Image (inkl. Planning)/cognos login.txt. Old (broken) target: /var/hda/files/drives/drive6/gh/Users/John/Preload (E)/Documents and Settings/usjohbek/My Documents/Everything Else/IBS BI 6.0 Image (inkl. Planning)/cognos login.txt; new (fixed) target: /var/hda/files/drives/drive6/gh/Users/John/Preload (E)/Documents and Settings/usjohbek/My Documents/Everything Else/IBS BI 6.0 Image (inkl. Planning)/cognos login.txt
Jul 21 04:08:27 7 rename: Found a broken symlink to update: /var/hda/files/Users/John/My Documents from IBS laptop/Everything Else/IBS BI 6.0 Image (inkl. Planning)/sesmvs260.vmxf. Old (broken) target: /var/hda/files/drives/drive6/gh/Users/John/Preload (E)/Documents and Settings/usjohbek/My Documents/Everything Else/IBS BI 6.0 Image (inkl. Planning)/sesmvs260.vmxf; new (fixed) target: /var/hda/files/drives/drive6/gh/Users/John/Preload (E)/Documents and Settings/usjohbek/My Documents/Everything Else/IBS BI 6.0 Image (inkl. Planning)/sesmvs260.vmxf
So, if greyhole couldn't write to its log, what would it do? Continue processing spooled files and other tasks? Or would it stop consuming tasks altogether? Does greyhole need tmp file space on the system partition, such that the partition's maxed-out inode state would trip it up? Interesting questions I don't have the answers to... Let me know what you think...

Thanks again for your assistance! Much appreciation!

jbmia

lrevxl
Posts: 82
Joined: Fri Mar 04, 2011 7:23 pm
Location: Chicago, IL, USA

Re: Out of inodes on system partition.. greyhole spooled fil

Postby lrevxl » Thu Jul 21, 2011 7:53 am

You know, I thought that the spool files were the pending tasks... so, I'm not sure the answer there. Am I misunderstanding the nature of the spooled files??
There's a difference between pending tasks and the files in the spool directory. Once Greyhole reads a spool file and loads that task into the database it deletes the spool file. So what you're seeing in that directory are tasks Greyhole hasn't even gotten to yet. In general Greyhole keeps about a thousand tasks loaded in the database at any given time, so if your spool files aren't getting consumed, it sounds like tasks aren't being processed.
Again, /var/log/messages:

Code: Select all

Jul 21 04:04:25 9745 nmbd[31295]: wins_write_database: Can't open /var/lib/samba/wins.dat.31295. Error was No space left on device

I'm assuming the above is the VFS plugin for Samba (the piece that creates the spool files for Greyhole) crapping out since you're out of space (or inodes).

So, if greyhole couldn't write to its log, what would it do? Continue processing spooled files? or other tasks? Or, would it stop actually consuming tasks? Does greyhole need tmp file space on the system partition that would make it throw up that partition's maxed-out inode state? Interesting questions I don't have the answer for.... Let me know what you think...

I think the better question is whether Greyhole would crash if it can't write to its logs. I'm not sure if it will; I haven't run that scenario before. You can verify when/if you hit this again, though: as root, run `service greyhole status`. If it's not running, then obviously it won't be spooling / processing tasks. If it continues running when logs cannot be written to, and your landing zone is on the full partition, I see further complications arising.

More to the point, you may just want to remove all the tombstones and files manually for the files you want to get rid of, since it seems like you're putting a heavy load on Greyhole. You'd have to remove the folder from both the graveyards and the share paths on the pool drives. You could also remove them from the landing zone, but an fsck will remove the symlinks as well once it sees there are no corresponding tombstones. So, for each pool path defined in your /etc/greyhole.conf:

Code: Select all

rm -r /path/to/pool/drive/gh/share_name/path_to_remove/*
rm -r /path/to/pool/drive/gh/.gh_graveyard/share_name/path_to_remove/*
rm -r /path/to/pool/drive/gh/.gh_graveyard_backup/share_name/path_to_remove/*

The -r above will recursively delete *everything* under the path you specify.
Some explanation of the paths above:

/path/to/pool/drive would be something like /var/hda/files/drives/drive1
share_name would be something like Users
path_to_remove would be something like John/Preload (E)/Documents and Settings

Note: if you choose to run the above, be careful that you only remove the data you want to delete; obviously, going up a directory level could result in deleting a lot more data than you intended!
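Before running any of those rm -r lines, it's also worth previewing the target with find/du so you see exactly what would go. A sketch on a throwaway tree (the share and path names here are made up):

```shell
# Preview what "rm -r <target>" would remove before actually running it.
pool=$(mktemp -d)
mkdir -p "$pool/gh/Users/John/OldStuff"
: > "$pool/gh/Users/John/OldStuff/junk1"
: > "$pool/gh/Users/John/OldStuff/junk2"
target="$pool/gh/Users/John/OldStuff"
echo "files that would be deleted: $(find "$target" -type f | wc -l)"
du -sh "$target"              # space that would be freed
# only after reviewing the above:  rm -r "$target"
rm -r "$pool"
```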

jbmia
Posts: 67
Joined: Sun Nov 07, 2010 11:59 am

Re: Out of inodes on system partition.. greyhole spooled fil

Postby jbmia » Thu Jul 21, 2011 10:49 am

I hear you on the finer points of rm and -r... and, oh by the way, how about \rm -r... that one'll bite you if you're not judicious in its usage... Thanks for all your insight!

In reply to a couple of your points:
I think the better question is whether Greyhole would crash if it can't write to its logs. I'm not sure if it will, I haven't run that scenario before. You can verify when/if you hit this again, though. As root you can run `service greyhole status` if it's not running, then obviously it won't be spooling / processing tasks. If it continues running when logs cannot be written to and your landing zone is on the full partition I see further complications arising.
I had run `service greyhole status` and it showed as running throughout the time this issue occurred...
Also... landing zone is on another partition... already learned that lesson.

I've already gone in and deleted some of the /var/hda/files/drives/drivex/gh/Users/UserX/OldStuff files I don't want... Based on your suggestion, I need to go back and hit the .gh_graveyard stuff as well. Once I've completed that, I'm not going to do anything until I see greyhole has no more tasks... then I'll go about deleting more stuff. I think part of the problem was that I went into the share and started cleaning stuff up while greyhole was in the middle of doing its thing for the initial transfer of the data... so I was moving things around and deleting things before it even had a chance to distribute the data across the pool...

By the way, my suggestion to the powers that be would be to insert a blurb on drive partitioning wherever there are FAQs/instructions/wiki pages, advising readers to consider system partition size with regard to situations like these. I'm not sure what the suggested size should be, but I'll never partition an Amahi system partition this small again; I'd go with 100GB just to be safe (disk is cheap). I've already got /home, /var/hda/files/, and /media/TV (SageTV) on separate partitions. I know I'm a power user pushing greyhole with the volume of data, but I just didn't see this one coming...

Regards,

jbmia
