fread() on a bad disk?

This is a discussion on fread() on a bad disk? within the Linux Programming forums, part of the Platform Specific Boards category.

  1. #31
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by annied View Post
    I tried with fcntl(). I got past the read(), but then it hangs on the close(fd).

    I'm sure I missed something here. Can someone have a look?
    I'm not that familiar with fcntl().

    Code:
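            int             fd_flags = 0;
    
            /* On Linux, O_NDELAY is a synonym for O_NONBLOCK, so one flag is enough. */
            fd = open(fullpath, O_RDONLY|O_NONBLOCK);
            if (fd < 0) {   /* open() returns -1 on failure; 0 is a valid descriptor */
                    fprintf(stderr, "ERROR: Can't open device %s\n",
                            fullpath);
                    return 0;
            }
            /* Redundant after the open() above; kept to show the fcntl() form. */
            fd_flags = fcntl(fd, F_GETFL, 0);
            if (fd_flags != -1)
                    fcntl(fd, F_SETFL, fd_flags | O_NONBLOCK);
    
            ret = read(fd, buffer, sizeof(buffer));  /* ret < 0 means error, see errno */
            close(fd);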

    My mistake: it is still hanging on the read(). I've been staring at this too long.

  2. #32
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Quote Originally Posted by annied View Post
    I tried with fcntl(). I got past the read(), but then it hangs on the close(fd).
    The only way around this that I can think of is to set up a timer with alarm() to send yourself a signal after a short period, say 10 seconds. Do this right before trying to close() the fd. If close() hangs, the signal should interrupt it, and the call will fail with errno set to EINTR. I think. I don't have a way to test it, but you could give it a shot.

    Not being able to close the descriptor is a pain in the butt. It's possible that this is a crufty little corner of Linux with no clean solution possible. You may have to make some device-specific calls.

  3. #33
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Quote Originally Posted by annied View Post
    My mistake: it is still hanging on the read(). I've been staring at this too long.
    Then it's not actually blocking (since you specified O_NONBLOCK); it's a true hang. See my other comment about using a signal to force the read()/close() to terminate early. Cross your fingers and hope it works.

    Are you sure the driver for this device is meant to work in a hot-swap mode? Maybe the driver itself is getting confused when you pull the disk.
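
    A side point worth checking: the Linux open(2) manual page notes that O_NONBLOCK has no effect on regular files and block devices, so a read() from a disk can still wait on the device even with the flag set. The sketch below shows the fcntl() pattern with error checks and how a would-block result looks on descriptors that do honour the flag (pipes, sockets, FIFOs). The names set_nonblocking() and try_read() are hypothetical helpers, not library calls.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to an open descriptor.
 * Returns 0 on success, -1 on error (errno set by fcntl()). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* Hypothetical helper: one read() attempt.
 * Returns bytes read, 0 at EOF, -1 on error,
 * or -2 if the read would have blocked. */
ssize_t try_read(int fd, void *buf, size_t len)
{
    ssize_t n = read(fd, buf, len);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return n;
}
```

    On an empty pipe, try_read() returns -2 straight away once set_nonblocking() has been applied. On a disk device the flag is accepted, but the read can still wait for the I/O to complete.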

  4. #34
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by brewbuck View Post
    Then it's not actually blocking (since you specified O_NONBLOCK), it's a true hang. See my other comment about using a signal to force the read()/close() to terminate early. Cross your fingers and hope it works.

    Are you sure the driver for this device is meant to work in a hot-swap mode? Maybe the driver itself is getting confused when you pull the disk.

    Thanks for all your help, guys. I'll see what I find and post the result here. Thanks again.

  5. #35
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    One last bit of information: I was able to drop into the kernel via kdb during the hang, and this is the stack trace:

    RSP RIP Function (args)
    0x100b2ed9b08 0xffffffff80318138 schedule+0xb6e (0x1, 0x100bf95d240)
    0x100b2ed9bd8 0xffffffff80318aab io_schedule+0x26 (0x100bf95d240, 0x0, 0x1, 0x100bb5bb030, 0xffffffff8015a026)
    0x100b2ed9bf8 0xffffffff8015a54c __lock_page+0xbf (0x46, 0x0, 0x40000000, 0x10, 0x1)
    0x100b2ed9c98 0xffffffff8015aaa7 do_generic_mapping_read+0x1f4 (0x0, 0x100bbb3fd68, 0x100a341b8c0, 0x100b2ed9f50, 0x0)
    0x100b2ed9d98 0xffffffff8015c907 __generic_file_aio_read+0x181 (0x1, 0x100a341b8c0, 0x0, 0xffffffff00000001, 0x100a341b8c0)
    0x100b2ed9e18 0xffffffff8015caa2 generic_file_read+0xbb (0x100a341b8c0, 0x1000, 0xf7d47000, 0xfffffffffffffff7, 0x0)
    0x100b2ed9f18 0xffffffff80179a97 vfs_read+0xcf
    0x100b2ed9f48 0xffffffff80179cee sys_read+0x45

  6. #36
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Here's the important bit of that trace:

    Quote Originally Posted by annied View Post
    RSP RIP Function (args)
    0x100b2ed9b08 0xffffffff80318138 schedule+0xb6e (0x1, 0x100bf95d240)
    0x100b2ed9bd8 0xffffffff80318aab io_schedule+0x26 (0x100bf95d240, 0x0, 0x1, 0x100bb5bb030, 0xffffffff8015a026)
    0x100b2ed9bf8 0xffffffff8015a54c __lock_page+0xbf (0x46, 0x0, 0x40000000, 0x10, 0x1)
    An attempt to lock a page of memory failed (the lock was already held elsewhere), so the scheduler is invoked to make the process wait. You'd have to figure out what other process is actually holding that lock. It may be the driver itself which is holding it. I'm beginning to suspect that the driver is not capable of properly handling the sudden removal of the device.
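
    One way to confirm this from user space, sketched under assumptions: a task parked in io_schedule() under __lock_page() is normally in uninterruptible sleep, which Linux reports as state D in /proc/<pid>/stat. The helper name proc_state() is hypothetical. It assumes /proc is mounted and reads the state character that follows the closing parenthesis of the command name.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), for the usage shown below */

/*
 * Hypothetical helper: return the scheduler state character of a process
 * ('R' running, 'S' interruptible sleep, 'D' uninterruptible sleep, ...),
 * or '\0' if /proc/<pid>/stat cannot be read or parsed.
 */
char proc_state(pid_t pid)
{
    char path[64];
    char line[512];
    char *ok;
    char *p;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (f == NULL)
        return '\0';
    ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return '\0';

    /* Format is "pid (comm) state ..."; comm may itself contain spaces
     * or parentheses, so search for the last ')' rather than the first. */
    p = strrchr(line, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '\0';
    return p[2];
}
```

    Calling proc_state() on the hung program's pid from a second terminal and getting 'D' would suggest the read is stuck in an uninterruptible wait, in which case signals will not be delivered until the I/O completes or fails.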

  7. #37
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by brewbuck View Post
    Here's the important bit of that trace:



    An attempt to lock a page of memory failed (the lock was already held elsewhere), so the scheduler is invoked to make the process wait. You'd have to figure out what other process is actually holding that lock. It may be the driver itself which is holding it. I'm beginning to suspect that the driver is not capable of properly handling the sudden removal of the device.
    thanks for that very important explanation. I really appreciate it.

  8. #38
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Quote Originally Posted by annied View Post
    thanks for that very important explanation. I really appreciate it.
    Anybody who can use a kernel debugger is going to get my attention. Out of curiosity, what kind of disk is this? Can you determine what driver is used? Look in /proc/ide or /proc/scsi or whatever is appropriate to the type of disk. If you can tell me the actual driver, I can poke around and see if there is some obscure ioctl() call that will help.

  9. #39
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by brewbuck View Post
    Anybody who can use a kernel debugger is going to get my attention. Out of curiosity, what kind of disk is this? Can you determine what driver is used? Look in /proc/ide or /proc/scsi or whatever is appropriate to the type of disk. If you can tell me the actual driver, I can poke around and see if there is some obscure ioctl() call that will help.
    This is a disk which is part of an EMC CLARiiON array.

    An inquiry on the disk shows (when it's not pulled out):

    Vendor Identification : DGC
    Product Identification : RAID 0
    Revision Number : 0219

    Let me know if that is the info you meant. Otherwise I can see what else I can find to be more specific. thanks!

  10. #40
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by annied View Post
    This is a disk which is part of an EMC CLARiiON array.

    An inquiry on the disk shows (when it's not pulled out):

    Vendor Identification : DGC
    Product Identification : RAID 0
    Revision Number : 0219

    Let me know if that is the info you meant. Otherwise I can see what else I can find to be more specific. thanks!
    /proc/scsi/scsi shows:

    Host: scsi1 Channel: 00 Id: 02 Lun: 00
    Vendor: DGC Model: LUNZ Rev: 0219
    Type: Direct-Access ANSI SCSI revision: 04

  11. #41
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Quote Originally Posted by annied View Post
    Let me know if that is the info you meant. Otherwise I can see what else I can find to be more specific. thanks!
    Better than nothing, but the actual Linux driver being used would be the most helpful. Is this the boot device? If not, the driver may be loaded as a module, so it would appear in the listing of 'lsmod'.

  12. #42
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by brewbuck View Post
    Better than nothing, but the actual Linux driver being used would be the most helpful. Is this the boot device? If not, the driver may be loaded as a module, so it would appear in the listing of 'lsmod'.
    do you mean the actual volume manager such as LVM or Veritas Volume Manager?
    sorry if I'm missing your question.

    I checked in lsmod but didn't see anything that stuck out.

  13. #43
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Quote Originally Posted by annied View Post
    do you mean the actual volume manager such as LVM or Veritas Volume Manager?
    sorry if I'm missing your question.

    I checked in lsmod but didn't see anything that stuck out.
    Since it's part of a RAID array, I mean the particular RAID driver you're using. Not the manager, the actual driver the kernel uses to talk to the controller.

  14. #44
    Registered User
    Join Date
    Apr 2007
    Posts
    26
    Quote Originally Posted by brewbuck View Post
    Since it's part of a RAID array, I mean the particular RAID driver you're using. Not the manager, the actual driver the kernel uses to talk to the controller.
    From lsmod, I'm guessing the QLogic driver?

    qla2xxx 182945 11 qla2400
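
    For reading that line: lsmod prints the columns Module, Size and Used by, where "Used by" is a use count followed by the dependent modules. So qla2xxx has a use count of 11 and qla2400 depends on it. A small sketch that pulls those fields apart; struct lsmod_entry, parse_lsmod_line() and depends_on() are hypothetical names, and the buffer sizes are arbitrary.

```c
#include <stdio.h>
#include <string.h>

/* One row of lsmod output: "name size use_count [dependents]". */
struct lsmod_entry {
    char name[64];
    unsigned long size;
    int use_count;
    char used_by[128];   /* comma-separated dependents, may be empty */
};

/* Hypothetical helper: parse one lsmod row.
 * Returns 0 on success, -1 if the line is not a module row
 * (for example the "Module Size Used by" header). */
int parse_lsmod_line(const char *line, struct lsmod_entry *out)
{
    int n;

    out->used_by[0] = '\0';
    n = sscanf(line, "%63s %lu %d %127[^\n]",
               out->name, &out->size, &out->use_count, out->used_by);
    return n >= 3 ? 0 : -1;
}

/* Nonzero if 'mod' appears in the dependents column (substring match). */
int depends_on(const struct lsmod_entry *e, const char *mod)
{
    return strstr(e->used_by, mod) != NULL;
}
```

    Feeding it the line above gives name "qla2xxx", size 182945, use count 11 and dependents "qla2400".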

  15. #45
    Captain Crash brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,273
    Quote Originally Posted by annied View Post
    From lsmod, I'm guessing the QLogic driver?

    qla2xxx 182945 11 qla2400
    Sounds like the right piece of info. I'll see what I can find.

    ...

    Well, I've been perusing the source code of this driver and it's a fairly large driver. There appear to be a zillion ioctl() calls, and I can't find any clear documentation, either as comments in the source code or in the downloaded archive itself. Honestly I think this driver kind of sucks, but oh well. Try asking around on a qla2xxx-related mailing list to see if there is some ioctl() that either tells you the disconnect status of the drive, or at least lets you configure the timeout so your program doesn't hang forever.

    One other thing to try. While your program is hung, kill it with a SIGUSR1 signal. See if it pops out of the hang. If it does, then the alarm() technique I mentioned earlier should at least allow you to get out of the hang.
    Last edited by brewbuck; 04-29-2007 at 04:19 PM.
