Thread: Freak data loss

  1. #1
    Registered User
    Join Date
    Mar 2007
    Posts
    142

    Freak data loss

    Last Friday my clients experienced a strange loss of data from my application.

    The app works with almost 200 files, but at any given moment only 20 are actually open. So, they were working in it; at some point there was a power surge, the lights blinked once, the screen turned off, but the computer continued to work. When they tried to save, the application reported an error, and on relaunch all the documents were missing.

    They called me, I connected over remote access, and this is what I found: 36 files (which should have held the content of the documents they had entered since March) were 4K in size (i.e. empty, only a few bytes in use, all zeroes). These files all had a creation date of March 1st and a modification date of April 15, 17:20. The rest of the files, seemingly never used, had both creation and modification dates of March 1st. Those never-used files were 4K in size too, but their content was not zeroed out; they had proper header content at the beginning.

    There is a file that keeps a record of changes/transactions, but that file is empty too.

    !?!#?%!@!

    What could have happened? I have the same application in several hundred other places and nothing even remotely similar has ever occurred. On power loss, maybe one or two records would sometimes disappear; most of the time everything would be just fine. This makes no sense to me. Or maybe it should? I have this app on Mac OS X and Windows and I have never experienced anything like this.

    Is it something related to hardware? Maybe some strange drive configuration, something in the OS, some strange settings (XP SP2)? They don't have a fresh backup, of course. Bad luck?
    Last edited by idelovski; 04-18-2011 at 04:46 AM.

  2. #2
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    It would help if you described the file handling steps you use, as well as the opening method(s) and the language.

    From your description, however, I'm guessing you are using buffers. You were probably caught in a freak accident in which a buffer was zeroed out by the power surge and then flushed to disk, either when the users tried to save or as part of normal operation (the buffer became full, your code uses an explicit flush, etc...).
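
    In stdio terms, the scenario would look like this: data written with fwrite() sits in a user-space buffer until fflush()/fclose(), so if that memory gets trashed before the flush, the trashed bytes are exactly what reach the disk. A minimal sketch of the idea (the file name and contents are hypothetical):

    Code:
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("data.bin", "wb");   /* hypothetical file name */

        if (f == NULL)
            return 1;

        /* This write normally lands in a user-space buffer, not on disk. */
        fwrite("record", 1, 6, f);

        /* Only here does the buffered data reach the OS. If the buffer
           was trashed in memory before this point, the trashed bytes
           are what get written. */
        fflush(f);
        fclose(f);
        return 0;
    }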
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  3. #3
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    Quote Originally Posted by Mario F. View Post
    It would help if you described the file handling steps you use, as well as the opening method(s) and the language.
    Straight C; files are opened with a regular CreateFile() call. After every batch, the transaction files involved in it are flushed, but that usually involves two to maybe four or five files in any scenario I can think of. There is no need to flush 30 files at the same time.

    Other calls that are used: WriteFile(), ReadFile() and CloseHandle().
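
    For context, that pattern would look roughly like the sketch below; the path, the record layout, and the use of FlushFileBuffers() for the flush are my assumptions, not the OP's actual code:

    Code:
    #include <windows.h>

    /* Open, write one record, flush - roughly the sequence described
       above. The path and record contents are hypothetical. */
    BOOL write_record(const char *path, const void *rec, DWORD len)
    {
        HANDLE h;
        DWORD  written = 0;
        BOOL   ok;

        h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                        0,                     /* no sharing */
                        NULL, OPEN_ALWAYS,
                        FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return FALSE;

        if (!WriteFile(h, rec, len, &written, NULL) || written != len) {
            CloseHandle(h);
            return FALSE;
        }

        /* FlushFileBuffers() pushes the OS write cache for this handle
           to the device; presumably this is what "flushed" means above. */
        ok = FlushFileBuffers(h);
        CloseHandle(h);
        return ok;
    }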

  4. #4
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by idelovski View Post
    Straight C; files are opened with a regular CreateFile() call. After every batch, the transaction files involved in it are flushed, but that usually involves two to maybe four or five files in any scenario I can think of. There is no need to flush 30 files at the same time.

    Other calls that are used: WriteFile(), ReadFile() and CloseHandle().
    I take it these are Windows calls... Check the Options in your CreateFile() calls... if you do not have either FILE_FLAG_WRITE_THROUGH or FILE_FLAG_NO_BUFFERING set, the data will buffer in memory. If the power surge scrambles memory... guess what?
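
    If I read the suggestion right, the change is a single flag at open time; a sketch (the wrapper name and file name are hypothetical):

    Code:
    #include <windows.h>

    /* Sketch of the suggestion above: FILE_FLAG_WRITE_THROUGH makes
       each WriteFile() go through the OS cache to the device instead
       of lingering in memory. */
    HANDLE open_write_through(const char *path)
    {
        return CreateFileA(path, GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_ALWAYS,
                           FILE_ATTRIBUTE_NORMAL | FILE_FLAG_WRITE_THROUGH,
                           NULL);
    }

    Note that FILE_FLAG_NO_BUFFERING is the more invasive of the two flags: it requires sector-aligned buffers and offsets, so FILE_FLAG_WRITE_THROUGH alone is usually the easier retrofit.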

    There's also a possibility that although the computer kept running, it may not have been running in an organized manner and simply truncated the files.

    Frankly, if your software tends to do this, it might be a good idea to create backup files (they can be hidden), just in case...
    Or you could have a sit-down with your customer and talk about uninterruptible power supplies, which should give them enough time to shut down properly.

  5. #5
    and the hat of sweating
    Join Date
    Aug 2007
    Location
    Toronto, ON
    Posts
    3,545
    Quote Originally Posted by CommonTater View Post
    Or you could have a sit-down with your customer and talk about uninterruptible power supplies, which should give them enough time to shut down properly.
    And also about doing regular backups.
    "I am probably the laziest programmer on the planet, a fact with which anyone who has ever seen my code will agree." - esbo, 11/15/2008

    "the internet is a scary place to be thats why i dont use it much." - billet, 03/17/2010

  6. #6
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    Everybody knows they should make backups; nothing new in that. My real question was whether this is technically possible, or better yet, whether anyone here has experienced something similar.

  7. #7
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    I went to their place and now I have more information.

    Their ISP cut them off for 24 hours because they were sending spam, and now they are running a big clean-up campaign.

    The file trashing that occurred last Friday truncated files in alphabetical order, as the 8 files with names from the letter U through the letter Z have preserved content. My application never deals with files in alphabetical order. Even at startup they are opened in a logical order, by their function.

    Anyway, there is one file that starts with the letter Z, and its purpose is to be some sort of log file. It just keeps track of events, so if someone does an export or import, there is one record that says export, on that computer, by that person, at that time, and so on. If someone adds a document, changes something, or tampers with the preferences, there is one line with a summary of the event. Anyway, that file is OK.

    The thing is, with all the other users I have had experience with so far, this is the file that usually gets corrupted if something happens, because it is updated all the time and is updated last; if someone has a problem after a power loss, it is usually this file that needs some sort of fixing. Well, in this case, this file is intact.

    And finally, a third weird detail: all the files have a creation date of March 1st, but they started using my application in February, so the creation dates should have been earlier than March.

    So, all in all, nothing makes any sense.

  8. #8
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by idelovski View Post
    The file trashing that occurred last Friday truncated files in alphabetical order, as the 8 files with names from the letter U through the letter Z have preserved content.
    By any chance, does the computer/server where these files are stored (I presume they are in a single location, right?) do periodic defragmentation, or has it done a defragmentation in the past, when these files already existed, with a setting controlling how files are sorted in their directories?

    I'm not sure how to interpret that curious observation you made. But the thought did occur to me that more than just memory got trashed.

    Quote Originally Posted by idelovski View Post
    The thing is, in general, with all the other users that I have and experience so far, this file usually gets corrupted if something happens because it is updated all the time and is updated last and if someone has any problem because of power loss it is usually this file that needs some sort of fixing. Well, in this case, this file is intact.
    Log files are very resilient if you happen to just open them for appending and immediately close them again. I'm not surprised this didn't affect them, if that is how you use them.
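
    In plain C, the resilient pattern being described is simply this (the file name is hypothetical):

    Code:
    #include <stdio.h>

    /* Open for append, write one line, close immediately - the window
       in which a crash or power loss can corrupt the file is tiny. */
    void log_line(const char *msg)
    {
        FILE *f = fopen("events.log", "a");   /* hypothetical name */

        if (f != NULL) {
            fprintf(f, "%s\n", msg);
            fclose(f);   /* flushes and releases the handle right away */
        }
    }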
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

  9. #9
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    Quote Originally Posted by Mario F. View Post
    Log files are very resilient if you happen to just open them for appending and immediately close them again. I'm not surprised this didn't affect them, if that is how you use them.
    This isn't a real log file. It is a database file that serves as a log of events at the business-logic level. I have another log file that is just a text file and serves as a log for system and other low-level events (like a handle is NULL when it shouldn't be, so we're returning early from GetAllOpenWindowsBlah(), and similar stuff). That log is a flat text file where I append a line and then close the file.

    The Z logging file is one of the files that make up this database and is opened and closed like the others. As I said above, of the 200 files that make up this database, at any given moment only 20 are really open, and one gets closed when another is needed. Standard LRU eviction, I suppose: the one not used for the longest time is closed when I need to open a file that is currently closed. In practice, that means the Z file is probably open most of the time, as it is in constant use. Almost every change in the database produces a new record in the Z file. Because of that, this file is the most vulnerable to power loss or application crashes.
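
    For readers following along, the least-recently-used handle cache described here might look roughly like the sketch below. Every name in it is invented for illustration; this is not the OP's code:

    Code:
    #include <windows.h>

    #define MAX_OPEN 20   /* at most this many database files open */

    typedef struct {
        char   path[MAX_PATH];
        HANDLE h;             /* NULL when the slot is unused */
        DWORD  last_used;     /* tick count of last access */
    } Slot;

    static Slot slots[MAX_OPEN];  /* zero-initialized: all unused */

    /* Return an open handle for path, evicting the least recently
       used file if all slots are taken. */
    HANDLE get_handle(const char *path)
    {
        HANDLE h;
        int    i, victim = 0;

        for (i = 0; i < MAX_OPEN; i++) {           /* already open? */
            if (slots[i].h != NULL &&
                lstrcmpiA(slots[i].path, path) == 0) {
                slots[i].last_used = GetTickCount();
                return slots[i].h;
            }
        }
        for (i = 1; i < MAX_OPEN; i++)             /* pick LRU slot */
            if (slots[i].last_used < slots[victim].last_used)
                victim = i;

        if (slots[victim].h != NULL)
            CloseHandle(slots[victim].h);          /* evict it */
        slots[victim].h = NULL;

        h = CreateFileA(path, GENERIC_READ | GENERIC_WRITE, 0, NULL,
                        OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return INVALID_HANDLE_VALUE;

        slots[victim].h = h;
        lstrcpynA(slots[victim].path, path, MAX_PATH);
        slots[victim].last_used = GetTickCount();
        return h;
    }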

    Oh, now I remember why I created this file. At some point I wanted to know how often my application crashes, so I decided to have a file where I could put a record for each login and logout. So if there are 300 logins and 297 logouts in a year, that means 2 crashes plus the current login. Over time I decided to add every other event to this file as well, since it gives me a lot of useful information about how my clients use my app.
    Last edited by idelovski; 04-19-2011 at 11:07 AM. Reason: style, spelling, few clarifications

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by idelovski View Post
    I went to their place and now I have more information.

    Their ISP cut them off for 24 hours because they were sending spam and now they have a big clean up campaign.

    So, all in all, nothing makes any sense.
    So what does the ISP cut-off have to do with a power outage? Did they spontaneously tell you this in order to muddy the water, and/or is the story becoming more... mysteriously and incongruously detailed?

    If you think this is impossible AND it hasn't happened before despite being used regularly by hundreds of people, maybe it is impossible.

    Meaning someone is lying to cover their own backside -- maybe even some malicious, disgruntled person -- in which case, as long as that person is not in charge, heh-heh, mebbe you should drop some hints about that "impossibility".
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  11. #11
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    Quote Originally Posted by MK27 View Post
    So what does the ISP cut off have to do with a power outage? Did they spontaneously tell you this in order to muddy the water, and/or is the story becoming more...mysteriously and incongruously detailed?
    I was there and I wanted to zip the folder with the database and mail it to myself, and then they said I couldn't because they don't have internet. It seems that a few of their computers had viruses and they were cleaning them that day. Just as a remark, I think spam bots use rootkits, and those are much harder to detect, so I think they'll need to reformat everything, but it's not like I know much about these things. They told me that the ISP will reconnect them automatically after 24 hours, but if they're not clean then the next outage will be 7 days or so.

    As far as I know, rootkits are driver impostors, so maybe some badly written rootkit that presents itself as a disk driver did something to the real driver, or something like that. It's not that I've studied rootkits much. Maybe this could explain the whole ordeal!?

    Quote Originally Posted by MK27 View Post
    If you think this is impossible AND it hasn't happened before despite being used regularly by hundreds of people, maybe it is impossible.
    Yeah, that is why it all seems so odd. Six years ago, when I ported my application to Windows, I tested it in a Windows simulator on my Mac precisely for crashes. I would press save in it on the Windows side, then kill the simulator on the Mac side, restart everything, and most of the time everything was fine. A few times I managed to kill it at just the right moment so that the indexes were corrupted, and after rebuilding the indexes the last document would disappear, but I can't remember ever managing to evaporate the whole database.
    Last edited by idelovski; 04-19-2011 at 11:45 AM.

  12. #12
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    Quote Originally Posted by Mario F. View Post
    By any chance, does the computer/server where these files are stored ... do periodic defragmentation...?
    I somehow missed this question when I saw your message earlier today. I'll ask them about it tomorrow. Maybe they had it set up as a scheduled task or something, as described here: How to Automate Disk Defragmenter Using Task Scheduler Tool in Windows XP.

  13. #13
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    Well, a few days ago I added one little extra piece of information here, but it got lost as cboard had a database problem of its own. Ironic or not, here it is again: my clients mailed me that they were not using defragmentation on their computers. It was something else. I just hope they'll never lose data again, and I sincerely hope cboard never loses posts again.

  14. #14
    Banned
    Join Date
    Aug 2010
    Location
    Ontario Canada
    Posts
    9,547
    Quote Originally Posted by idelovski View Post
    Well, a few days ago I added one little extra piece of information here, but it got lost as cboard had a database problem of its own. Ironic or not, here it is again: my clients mailed me that they were not using defragmentation on their computers. It was something else. I just hope they'll never lose data again, and I sincerely hope cboard never loses posts again.
    Well, then the only real suggestion I can offer is a software update...
    1) keep hidden backups of your files that can be used to restore to the last file close...
    2) if you are writing files "whole cloth" to disk, write to a temp file and test for success. If that worked, delete the original and rename the replacement (see the sketch below).
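
    A sketch of strategy 2; MoveFileEx() with MOVEFILE_REPLACE_EXISTING does the delete-and-rename in one step on NT-family Windows. The paths, buffer, and function name are illustrative:

    Code:
    #include <windows.h>

    /* Write the whole new contents to a temp file, flush it, and only
       then replace the original - so a crash mid-write leaves the
       original intact. Assumes path leaves room for the .tmp suffix. */
    BOOL safe_rewrite(const char *path, const void *data, DWORD len)
    {
        char   tmp[MAX_PATH];
        DWORD  written = 0;
        HANDLE h;

        wsprintfA(tmp, "%s.tmp", path);

        h = CreateFileA(tmp, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                        FILE_ATTRIBUTE_NORMAL, NULL);
        if (h == INVALID_HANDLE_VALUE)
            return FALSE;

        if (!WriteFile(h, data, len, &written, NULL) || written != len ||
            !FlushFileBuffers(h)) {
            CloseHandle(h);
            DeleteFileA(tmp);
            return FALSE;
        }
        CloseHandle(h);

        /* Swap in the replacement only after the temp file is known
           to be complete and on disk. */
        return MoveFileExA(tmp, path, MOVEFILE_REPLACE_EXISTING |
                                      MOVEFILE_WRITE_THROUGH);
    }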

    There are many strategies that can prevent this kind of thing.
    The big pain is identifying the cause... which you may never do.

  15. #15
    Registered User
    Join Date
    Mar 2007
    Posts
    142
    As long as this is a single case I'm ok. If it becomes epidemic, then I'm in trouble.
