Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Sun Jul 24, 2011 10:43 am
by AndyNJ
I've been stuck with the below state for days. I've restarted the system and Greyhole and canceled and restarted the fsck task multiple times, but there are still files that don't get out of the queue. I'm also missing a lot of tombstones that aren't being regenerated.
Code:
Greyhole Work Queue Statistics
==============================
This table gives you the number of pending operations queued for the Greyhole daemon, per share.
                 Write  Delete  Rename  Repair (fsck)
Archive              0       0       0              0
Backup               0       0       0              0
Books                0       0       0              0
Docs                 0       0       0              0
Music               10       0       0              0
Photos               5       1       2              0
Software             0       0       0              0
TimeMachine        882       0       0              0
TimeMachine_DH    7448    3261      51              0
Videos             228       0       0              0
VirtualMachines    426      13      29              0
=====================================================
Total             8999    3275      82              0
The following is the number of pending operations that the Greyhole daemon still needs to parse.
Until it does, the nature of those operations is unknown.
Spooled operations that have been parsed will be listed above and disappear from the count below.
Spooled 65949
Is there an easy way to either completely clear this queue and have it start over or just remove ALL tombstones in one shot and have Greyhole rebuild them all from scratch? I guess I actually need to do both.
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Mon Jul 25, 2011 6:20 pm
by lrevxl
It's possible to remove all tombstones and regenerate them, but I don't precisely see how that's going to fix your issues. What's going on in your Greyhole logs? Do you see file operations going through? What do you mean tombstones are 'missing'? Are you seeing errors in the logs? Did you delete the tombstones?
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Mon Jul 25, 2011 6:31 pm
by AndyNJ
I haven't done anything yet.
I've got files (about 100 of them) that appear in the graveyard on my pool disks, but I can't actually access the files from the shares (they don't show up there).
My queue currently looks pretty similar to the one I posted above (over 24 hours ago) and the current status for greyhole is that it's optimizing MySQL tables.
My logs have all kinds of errors in them. There were some hardware issues that should be fixed, but I need to get my greyhole database and pointers in order now. I literally have no idea what to do and I've lost close to a week to this and I really need to get my machine working properly again.
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Mon Jul 25, 2011 7:11 pm
by lrevxl
Are the corresponding files also on the pool drives?
Are you certain Greyhole is actually running? I've never seen the table optimization take more than a few seconds. What do you see if you check the status of the service? As root, run `service greyhole status`.
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Mon Jul 25, 2011 7:42 pm
by AndyNJ
It says greyhole is running, but now the queue is empty (wasn't when I wrote the last reply) and the status is still optimizing the MySQL tables.
A quick spot check seems to show the files in the gh folder on pool drives.
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Tue Jul 26, 2011 4:57 am
by lrevxl
Ah, you're running greyhole --status, right? That simply tells you what the last logged action was. If you were to look at /var/log/greyhole.log you'd see a lot of 'Nothing to do... Sleeping.' But your files are there and your queue is empty, so you're good now, right?
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Tue Jul 26, 2011 5:31 am
by AndyNJ
The queue is mostly empty and my files are in the pool drives, but there are no pointers to many of the files in the landing zone. I can't access them via the shares.
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Tue Jul 26, 2011 5:34 am
by AndyNJ
Also, I'm noticing a lot of input/output errors in the greyhole log.
Code:
PHP Warning (2): mkdir(): Input/output error in /usr/bin/greyhole on line 2600
PHP Warning (2): mkdir(): Input/output error in /usr/bin/greyhole on line 2604
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Tue Jul 26, 2011 6:30 am
by lrevxl
That looks like you're having hardware issues. Take a look at your /var/log/messages; I'm guessing you'll have an unpleasant surprise waiting for you there.
Where is your landing zone located? Putting two and two together (you're getting PHP I/O errors and no symlinks are being created in the LZ), I'm guessing whatever drive the LZ is on is having issues.
Re: Clear stuck queue?/Remove all tombstones and re-fsck
Posted: Tue Jul 26, 2011 7:31 am
by AndyNJ
You may be right, here's a section of the log:
Code:
Jul 26 09:52:41 donbot kernel: [164377.821100] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:52:41 donbot kernel: [164377.821111] ata4.00: irq_stat 0x40000001
Jul 26 09:52:41 donbot kernel: [164377.821120] ata4.00: failed command: READ DMA EXT
Jul 26 09:52:41 donbot kernel: [164377.821136] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:52:41 donbot kernel: [164377.821140] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:52:41 donbot kernel: [164377.821209] ata4.00: status: { DRDY ERR }
Jul 26 09:52:41 donbot kernel: [164377.821217] ata4.00: error: { UNC }
Jul 26 09:52:42 donbot kernel: [164378.805382] ata4.00: configured for UDMA/33
Jul 26 09:52:42 donbot kernel: [164378.805412] ata4: EH complete
Jul 26 09:52:46 donbot kernel: [164382.020191] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:52:46 donbot kernel: [164382.020202] ata4.00: irq_stat 0x40000001
Jul 26 09:52:46 donbot kernel: [164382.020211] ata4.00: failed command: READ DMA EXT
Jul 26 09:52:46 donbot kernel: [164382.020227] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:52:46 donbot kernel: [164382.020231] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:52:46 donbot kernel: [164382.020240] ata4.00: status: { DRDY ERR }
Jul 26 09:52:46 donbot kernel: [164382.020246] ata4.00: error: { UNC }
Jul 26 09:52:47 donbot kernel: [164383.788504] ata4.00: configured for UDMA/33
Jul 26 09:52:47 donbot kernel: [164383.788533] ata4: EH complete
Jul 26 09:52:51 donbot kernel: [164387.012060] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:52:51 donbot kernel: [164387.012071] ata4.00: irq_stat 0x40000001
Jul 26 09:52:51 donbot kernel: [164387.012079] ata4.00: failed command: READ DMA EXT
Jul 26 09:52:51 donbot kernel: [164387.012096] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:52:51 donbot kernel: [164387.012099] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:52:51 donbot kernel: [164387.012108] ata4.00: status: { DRDY ERR }
Jul 26 09:52:51 donbot kernel: [164387.012113] ata4.00: error: { UNC }
Jul 26 09:52:52 donbot kernel: [164388.501109] ata4.00: configured for UDMA/33
Jul 26 09:52:52 donbot kernel: [164388.501179] ata4: EH complete
Jul 26 09:52:55 donbot kernel: [164391.715647] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:52:55 donbot kernel: [164391.715658] ata4.00: irq_stat 0x40000001
Jul 26 09:52:55 donbot kernel: [164391.715667] ata4.00: failed command: READ DMA EXT
Jul 26 09:52:55 donbot kernel: [164391.715683] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:52:55 donbot kernel: [164391.715687] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:52:55 donbot kernel: [164391.715696] ata4.00: status: { DRDY ERR }
Jul 26 09:52:55 donbot kernel: [164391.715701] ata4.00: error: { UNC }
Jul 26 09:52:57 donbot kernel: [164393.483906] ata4.00: configured for UDMA/33
Jul 26 09:52:57 donbot kernel: [164393.483934] ata4: EH complete
Jul 26 09:53:00 donbot kernel: [164396.707521] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:00 donbot kernel: [164396.707532] ata4.00: irq_stat 0x40000001
Jul 26 09:53:00 donbot kernel: [164396.707540] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:00 donbot kernel: [164396.707557] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:00 donbot kernel: [164396.707561] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:00 donbot kernel: [164396.707569] ata4.00: status: { DRDY ERR }
Jul 26 09:53:00 donbot kernel: [164396.707575] ata4.00: error: { UNC }
Jul 26 09:53:02 donbot kernel: [164398.196929] ata4.00: configured for UDMA/33
Jul 26 09:53:02 donbot kernel: [164398.196958] ata4: EH complete
Jul 26 09:53:07 donbot kernel: [164403.175830] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:07 donbot kernel: [164403.175841] ata4.00: irq_stat 0x40000001
Jul 26 09:53:07 donbot kernel: [164403.175849] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:07 donbot kernel: [164403.175865] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:07 donbot kernel: [164403.175869] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:07 donbot kernel: [164403.175878] ata4.00: status: { DRDY ERR }
Jul 26 09:53:07 donbot kernel: [164403.175884] ata4.00: error: { UNC }
Jul 26 09:53:08 donbot kernel: [164404.664905] ata4.00: configured for UDMA/33
Jul 26 09:53:08 donbot kernel: [164404.664936] sd 3:0:0:0: [sdb] Unhandled sense code
Jul 26 09:53:08 donbot kernel: [164404.664942] sd 3:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 26 09:53:08 donbot kernel: [164404.664952] sd 3:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Jul 26 09:53:08 donbot kernel: [164404.664963] Descriptor sense data with sense descriptors (in hex):
Jul 26 09:53:08 donbot kernel: [164404.664969] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jul 26 09:53:08 donbot kernel: [164404.664989] 4f 86 ea 48
Jul 26 09:53:08 donbot kernel: [164404.664998] sd 3:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Jul 26 09:53:08 donbot kernel: [164404.665010] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 4f 86 ea 48 00 00 08 00
Jul 26 09:53:08 donbot kernel: [164404.665029] end_request: I/O error, dev sdb, sector 1334241864
Jul 26 09:53:08 donbot kernel: [164404.665077] ata4: EH complete
Jul 26 09:53:15 donbot kernel: [164411.635082] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:15 donbot kernel: [164411.635093] ata4.00: irq_stat 0x40000001
Jul 26 09:53:15 donbot kernel: [164411.635101] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:15 donbot kernel: [164411.635118] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:15 donbot kernel: [164411.635122] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:15 donbot kernel: [164411.635180] ata4.00: status: { DRDY ERR }
Jul 26 09:53:15 donbot kernel: [164411.635189] ata4.00: error: { UNC }
Jul 26 09:53:17 donbot kernel: [164413.403374] ata4.00: configured for UDMA/33
Jul 26 09:53:17 donbot kernel: [164413.403403] ata4: EH complete
Jul 26 09:53:20 donbot kernel: [164416.626957] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:20 donbot kernel: [164416.626968] ata4.00: irq_stat 0x40000001
Jul 26 09:53:20 donbot kernel: [164416.626976] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:20 donbot kernel: [164416.626993] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:20 donbot kernel: [164416.626997] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:20 donbot kernel: [164416.627005] ata4.00: status: { DRDY ERR }
Jul 26 09:53:20 donbot kernel: [164416.627011] ata4.00: error: { UNC }
Jul 26 09:53:22 donbot kernel: [164418.116573] ata4.00: configured for UDMA/33
Jul 26 09:53:22 donbot kernel: [164418.116602] ata4: EH complete
Jul 26 09:53:25 donbot kernel: [164421.331545] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:25 donbot kernel: [164421.331556] ata4.00: irq_stat 0x40000001
Jul 26 09:53:25 donbot kernel: [164421.331564] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:25 donbot kernel: [164421.331581] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:25 donbot kernel: [164421.331585] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:25 donbot kernel: [164421.331593] ata4.00: status: { DRDY ERR }
Jul 26 09:53:25 donbot kernel: [164421.331599] ata4.00: error: { UNC }
Jul 26 09:53:27 donbot kernel: [164423.099826] ata4.00: configured for UDMA/33
Jul 26 09:53:27 donbot kernel: [164423.099855] ata4: EH complete
Jul 26 09:53:30 donbot kernel: [164426.323397] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:30 donbot kernel: [164426.323408] ata4.00: irq_stat 0x40000001
Jul 26 09:53:30 donbot kernel: [164426.323416] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:30 donbot kernel: [164426.323433] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:30 donbot kernel: [164426.323437] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:30 donbot kernel: [164426.323445] ata4.00: status: { DRDY ERR }
Jul 26 09:53:30 donbot kernel: [164426.323451] ata4.00: error: { UNC }
Jul 26 09:53:33 donbot kernel: [164429.572948] ata4.00: configured for UDMA/33
Jul 26 09:53:33 donbot kernel: [164429.572977] ata4: EH complete
Jul 26 09:53:36 donbot kernel: [164432.790722] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:36 donbot kernel: [164432.790733] ata4.00: irq_stat 0x40000001
Jul 26 09:53:36 donbot kernel: [164432.790741] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:36 donbot kernel: [164432.790758] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:36 donbot kernel: [164432.790762] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:36 donbot kernel: [164432.790771] ata4.00: status: { DRDY ERR }
Jul 26 09:53:36 donbot kernel: [164432.790777] ata4.00: error: { UNC }
Jul 26 09:53:40 donbot kernel: [164436.041375] ata4.00: configured for UDMA/33
Jul 26 09:53:40 donbot kernel: [164436.041404] ata4: EH complete
Jul 26 09:53:43 donbot kernel: [164439.259033] ata4.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jul 26 09:53:43 donbot kernel: [164439.259044] ata4.00: irq_stat 0x40000001
Jul 26 09:53:43 donbot kernel: [164439.259053] ata4.00: failed command: READ DMA EXT
Jul 26 09:53:43 donbot kernel: [164439.259070] ata4.00: cmd 25/00:08:48:ea:86/00:00:4f:00:00/e0 tag 0 dma 4096 in
Jul 26 09:53:43 donbot kernel: [164439.259074] res 51/40:08:48:ea:86/00:00:4f:00:00/e0 Emask 0x9 (media error)
Jul 26 09:53:43 donbot kernel: [164439.259082] ata4.00: status: { DRDY ERR }
Jul 26 09:53:43 donbot kernel: [164439.259088] ata4.00: error: { UNC }
Jul 26 09:53:44 donbot kernel: [164440.244180] ata4.00: configured for UDMA/33
Jul 26 09:53:44 donbot kernel: [164440.244210] sd 3:0:0:0: [sdb] Unhandled sense code
Jul 26 09:53:44 donbot kernel: [164440.244216] sd 3:0:0:0: [sdb] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jul 26 09:53:44 donbot kernel: [164440.244225] sd 3:0:0:0: [sdb] Sense Key : Medium Error [current] [descriptor]
Jul 26 09:53:44 donbot kernel: [164440.244236] Descriptor sense data with sense descriptors (in hex):
Jul 26 09:53:44 donbot kernel: [164440.244242] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
Jul 26 09:53:44 donbot kernel: [164440.244263] 4f 86 ea 48
Jul 26 09:53:44 donbot kernel: [164440.244272] sd 3:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
Jul 26 09:53:44 donbot kernel: [164440.244284] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 4f 86 ea 48 00 00 08 00
Jul 26 09:53:44 donbot kernel: [164440.244303] end_request: I/O error, dev sdb, sector 1334241864
Jul 26 09:53:44 donbot kernel: [164440.244356] ata4: EH complete
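The log above repeats the same failure many times, so it can help to boil it down. The following is only a rough Python sketch, not anything Greyhole or the kernel provides: it counts `end_request: I/O error` lines per device and sector, and decodes the block address from a `Read(10)` CDB line to confirm it is the same sector. The function names and regular expressions are made up for illustration, and they assume the message wording shown in this log.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   end_request: I/O error, dev sdb, sector 1334241864
IO_ERROR = re.compile(r"I/O error, dev (\w+), sector (\d+)")
# Matches the SCSI command block, e.g.:
#   [sdb] CDB: Read(10): 28 00 4f 86 ea 48 00 00 08 00
READ10_CDB = re.compile(r"CDB: Read\(10\): ((?:[0-9a-f]{2} ?){10})")


def summarize_io_errors(lines):
    """Count I/O errors per (device, sector) pair."""
    counts = Counter()
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            counts[(m.group(1), int(m.group(2)))] += 1
    return counts


def read10_lba(line):
    """Return the logical block address from a Read(10) CDB line, or None."""
    m = READ10_CDB.search(line)
    if not m:
        return None
    cdb = bytes.fromhex(m.group(1).replace(" ", ""))
    # In a READ(10) command, bytes 2..5 hold the big-endian LBA.
    return int.from_bytes(cdb[2:6], "big")


# Three lines copied from the log above.
sample = [
    "Jul 26 09:53:08 donbot kernel: [164404.665010] sd 3:0:0:0: [sdb] CDB: Read(10): 28 00 4f 86 ea 48 00 00 08 00",
    "Jul 26 09:53:08 donbot kernel: [164404.665029] end_request: I/O error, dev sdb, sector 1334241864",
    "Jul 26 09:53:44 donbot kernel: [164440.244303] end_request: I/O error, dev sdb, sector 1334241864",
]

print(summarize_io_errors(sample))  # Counter({('sdb', 1334241864): 2})
print(read10_lba(sample[0]))        # 1334241864
```

Both numbers agree: the hex bytes `4f 86 ea 48` in the CDB are sector 1334241864, so every retry in this excerpt is failing on the same spot on sdb.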
It keeps mentioning 'sdb', but I'm not sure off the top of my head which drive that is. How can I check that via the terminal (I only have SSH access at the moment)? The LZ is a separate partition on my system drive, which hasn't given any other indication of problems.
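One possible way to work out which physical drive sdb is over SSH: on udev-based Linux systems, /dev/disk/by-id normally contains symlinks whose names include the drive's model and serial number. Here is a minimal Python sketch that lists the by-id names pointing at a given device. The function name `ids_for_device` is hypothetical, and it assumes the usual /dev/disk/by-id layout exists on the machine.

```python
import os


def ids_for_device(dev_name, by_id_dir="/dev/disk/by-id"):
    """Return the by-id link names that resolve to /dev/<dev_name>."""
    matches = []
    for entry in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, entry)
        # Each entry is a symlink like ../../sdb; resolve it and compare
        # the final path component with the device name we want.
        if os.path.basename(os.path.realpath(link)) == dev_name:
            matches.append(entry)
    return matches


if __name__ == "__main__":
    # Only attempt the lookup where the directory actually exists.
    if os.path.isdir("/dev/disk/by-id"):
        for name in ids_for_device("sdb"):
            print(name)
```

Partition links (sdb1, sdb2, and so on) resolve to their own device nodes, so they are not returned for "sdb". The serial number in the printed name can then be matched against the label on the physical drive.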