Not being able to close the descriptor is a pain in the butt. It's possible that this is a crufty little corner of Linux with no clean solution. You may have to make some device-specific calls.
Are you sure the driver for this device is meant to work in a hot-swap mode? Maybe the driver itself is getting confused when you pull the disk.
One last bit of information. I was able to drop into the kernel via kdb during the hang, and this is the stack trace:
RSP RIP Function (args)
0x100b2ed9b08 0xffffffff80318138 schedule+0xb6e (0x1, 0x100bf95d240)
0x100b2ed9bd8 0xffffffff80318aab io_schedule+0x26 (0x100bf95d240, 0x0, 0x1,
0x100b2ed9bf8 0xffffffff8015a54c __lock_page+0xbf (0x46, 0x0, 0x40000000, 0x10, 0x1)
0x100b2ed9c98 0xffffffff8015aaa7 do_generic_mapping_read+0x1f4 (0x0, 0x100bbb3fd68, 0x100a341b8c0, 0x100b2ed9f50, 0x0)
0x100b2ed9d98 0xffffffff8015c907 __generic_file_aio_read+0x181 (0x1, 0x100a341b8c0, 0x0, 0xffffffff00000001, 0x100a341b8c0)
0x100b2ed9e18 0xffffffff8015caa2 generic_file_read+0xbb (0x100a341b8c0, 0x1000, 0xf7d47000, 0xfffffffffffffff7, 0x0)
0x100b2ed9f18 0xffffffff80179a97 vfs_read+0xcf
0x100b2ed9f48 0xffffffff80179cee sys_read+0x45
Here's the important bit of that trace:
An inquiry on the disk shows (when it's not pulled out):
Vendor Identification : DGC
Product Identification : RAID 0
Revision Number : 0219
Let me know if that is the info you meant. Otherwise I can see what else I can find to be more specific. Thanks!
Well, I've been perusing the source code of this driver and it's a fairly large driver. There appear to be a zillion ioctl() calls, and I can't find any clear documentation, either as comments in the source code or in the downloaded archive itself. Honestly I think this driver kind of sucks, but oh well. Try asking around on a qla2xxx-related mailing list to see if there is some ioctl() that either tells you the disconnect status of the drive, or at least lets you configure the timeout so your program doesn't hang forever.
One other thing to try. While your program is hung, kill it with a SIGUSR1 signal. See if it pops out of the hang. If it does, then the alarm() technique I mentioned earlier should at least allow you to get out of the hang.
Last edited by brewbuck; 04-29-2007 at 03:19 PM.