Thread: Efficiency problems

  1. #1
    Registered User
    Join Date
    Oct 2005
    Posts
    27

    Efficiency problems

    Ive been working on an artificial life simulator in C# recently. One major problem I am having though is efficiency. In the simulator, I generate a small text file for each turn that passes, which I can use through another program i made in order to create a visual representation of whats going on. The problem is that my simulator needs many turns to produce any interesting results. The Alife program shown in the AI programming book Im using runs to one million turns. After 50,000 turns, my computer gets really slow if I try to open the folder with the text files inside. Yesterday, I ran it to 100,000 turns and it took me almost 30 minutes to delete the whole data folder. Does anyone have any suggestions? Should I post my code?

  2. #2
    and the Hat of Clumsiness GanglyLamb's Avatar
    Join Date
    Oct 2002
    Location
    between photons and phonons
    Posts
    1,110
    This could be more a problem of how your OS locates space on your hard drive. If for each turn you generate a text file you'll end up with alot of text files. I think that for instance Windows XP allocates a block of 4Kb even if the actual size ( the text ) is maybe 3 bytes .... ( edit just checked 3 bytes end up with 4kb on hard disk ).

    So there will be alot of internal fragmentation ( space in one block that is not used, although it is marked as being used ). So each turn that is being read is like reading 4Kb or thats how the hard disk handles it. Each turn it reads a text file, then needs to adjust its head, and so on... very stressy sitaution imo.

    Try to merge the turns and see what happens, for instance take 5000 turns in one time and place them in 1 text file together instead of 5000 seperate files.

    The above is of course merely a thought and based on the fact that you use 1 text file for each turn.

    Also after reading in one turn with the other program , make sure the object containing the turn's data is disposed and output being generated based on maybe 100 turns each time instead of basing output on 5 million turns.

    These are just some things that come into my mind... I could be 5 bazillion miles away from a solution but this is all that pops into my fatigued head .

    This one is off for the night .

  3. #3
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    I agree with GanglyLamb. Why not have each turn write its own tag in one single XML document? Sure, the document may grow to be huge and you'd probably never want to open it up in Notepad, but then at least all of your data is in one place instead of in 100,000 pieces.

    In case you haven't used the XML namespace before, look into the XmlTextWriter and the WriteStartElement, WriteAttributeString, and WriteEndElement methods.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  4. #4
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    But then, id have a new problem---The program id use to illustrate the data will prbly be slow. The way i have it set up, you just enter the turn you want to see, and it just opens the text file with that number for its name. If I made it into one text file, wouldnt it have to search all the data from the first turn until the turn selected?

  5. #5
    and the Hat of Clumsiness GanglyLamb's Avatar
    Join Date
    Oct 2002
    Location
    between photons and phonons
    Posts
    1,110
    Depends, if you use an xml document for let's say each 1000 turns ( depending how much info you store for each turn ). Then when the turn is entered you check which file should be opened. Then load the file and search for the node that has the correct offset ID within the file.

    For instance turn 1024, this will be turn2.xml and offset 25.

    The time it takes a hard drive to read one file ( assuming that there is little fragmentation or that the filesize is less then one block of allocated space - with windows XP this is default 4Kb ) , is much longer then tor read the data from memory.
    Of course when loading the file first into memory you should take into account the initial loading of the file.
    It's completely up to you how many turns you place into 1 xml file. And I'm sure that if you would create a benchmark program you can find the right amount of turns in one xml file, for the minimum amount of time.

    Example:

    1 turn = 100 Bytes
    one block on hard drive in windows XP = 4 Kb
    maximum turns in 1 block: 40

    40 is not that much so lets take a file of 1Mb.
    maximum turns in 1 block: 10 485 turns.

    A 1Mb file is not that big and it can already store 10485 turns in it... And if you would then first ask record 1 , afterwards you can ask any record from 0 to 10485 without reloading any file from hard drive, the only thing you would need to do is access your memory and if the next turn you ask is somewhat located near the first one you asked then there is a big chance the turn could still be in cache memory.

    Im pretty sure this will be much much faster then accessing the hard disk for each turn.
    Last edited by GanglyLamb; 07-19-2006 at 03:13 AM.

  6. #6
    System Novice siavoshkc's Avatar
    Join Date
    Jan 2006
    Location
    Tehran
    Posts
    1,246
    I don't have any exprience about it but there should be some database structures that help solving this problem.
    Learn C++ (C++ Books, C Books, FAQ, Forum Search)
    Code painter latest version on sourceforge DOWNLOAD NOW!
    Download FSB Data Integrity Tester.
    Siavosh K C

  7. #7
    and the Hat of Clumsiness GanglyLamb's Avatar
    Join Date
    Oct 2002
    Location
    between photons and phonons
    Posts
    1,110
    I was thinking about using a DBMS as well, but I've used mySQL several times and once I had a database structure with more then 50 000 records in one table. Although asking for one record does not take much time, asking for more then one record as in asking in this example for 50 000 turns after each other will just be 50 000 * query time of one record. ( that is assuming for each record you have 1 query, if you have a query that will group 10 000 records then its a different story )

    And like the OP said in his first post , he has a program that needs more then one turn to produce decent results. Therefore loading one file into memory and afterwards accessing memory will be faster then using a DMBS , unless you could set up the DBMS to load a large amount of records in some kind of cache at once. So you don't need the full access time for each record.

  8. #8
    Registered User
    Join Date
    Oct 2005
    Posts
    27
    Ah I think I get it now. Why should I use XML though instead regular text files? I dont really know anything about XML.

  9. #9
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Actually, after thinking about this again, I wouldn't use XML.
    • You want to have fast access to any single entry in the file, similar to an array.
    • An array is fast because you can do pointer arithmetic for member access.
    • You can do pointer arithmetic because each member is of fixed size.
    • Hence, you need a file that has entries that are of fixed size.
    XML probably will not produce a file with fixed-size entries.

    Consider whether or not an entry can be fixed-size. Does it have any strings? Does it have an variable-length arrays? If so, then it can't be fixed-size without setting some arbitrary limits.

    Once you've got fixed-size entries, entry access is simple. If you want the entry at index 134, you seek to (134 * size of entry) and read the desired entry.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  10. #10
    System Novice siavoshkc's Avatar
    Join Date
    Jan 2006
    Location
    Tehran
    Posts
    1,246
    Yes, one text file can keep all the data. I have a simple model.
    data\n
    data\n
    data\n
    data\n
    Each line number is ID of its data.

    In the text file:
    <name:Jack><fname:Stevens><s:male><healthy:yes>
    <name:Sia><fname:Kc><s:male><healthy:?>
    <name:Nila><fname:hikards><s:female><healthy:yes >
    <name:Loony><fname:Toon><s:male><healthy:no>
    ID of Nila is 4 because she is in line 4. A simple parser can get the data from related fields.

    http://cboard.cprogramming.com/showthread.php?t=81111
    Last edited by siavoshkc; 07-19-2006 at 01:30 PM.
    Learn C++ (C++ Books, C Books, FAQ, Forum Search)
    Code painter latest version on sourceforge DOWNLOAD NOW!
    Download FSB Data Integrity Tester.
    Siavosh K C

  11. #11
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    That can work, but then you have to iterate through each entry in order to get to the 100,000th one. That's 99,999 wasted reads. Fixed-length entries let you seek straight to where you want to access. It's the difference in member access between a linked list and an array.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  12. #12
    System Novice siavoshkc's Avatar
    Join Date
    Jan 2006
    Location
    Tehran
    Posts
    1,246
    In case we can't have fixed size enteries:
    Right me if I am wrong.
    When you have for example 12 text files in a folder named f1, f2 ... f12 and you want to access f6 for example. OS will iterate through file names in the folder until it finds f6.
    So whether you have 12 text files or 12 enteries in one text file, you need to iterate through each entry or file to find the needed data. You have to iterate in both cases.
    Learn C++ (C++ Books, C Books, FAQ, Forum Search)
    Code painter latest version on sourceforge DOWNLOAD NOW!
    Download FSB Data Integrity Tester.
    Siavosh K C

  13. #13
    and the Hat of Clumsiness GanglyLamb's Avatar
    Join Date
    Oct 2002
    Location
    between photons and phonons
    Posts
    1,110
    Quote Originally Posted by siavoshkc
    OS will iterate through file names in the folder until it finds f6.
    Depends on how the OS handles the seeking of a file. Normally it would just seek in a FAT ( file allocation table ) to see where the first block of this file is located. Depending on which file format etc etc it goes on from there.

    For instance the first block could hold the value of the next block of the file. And so on. Of course this is quite fragile, since you only need one block to go corrupt and lose the whole file since you never know which block would be the next.

    Quote Originally Posted by siavoshkc
    So whether you have 12 text files or 12 enteries in one text file, you need to iterate through each entry or file to find the needed data. You have to iterate in both cases.
    Again depens but probably no.

    Also what would be the fastest?

    12 times :
    Move the head of your hard disk , copy data to RAM , copy to cache , copy to next cache and start operations on data.

    1 time:
    Move the head of your hard disk, copy data to ram.
    12 times:
    seek in ram, copy to all caches.

    Of course there is the chance a page swap happens in memory while the whole file is loaded but normally the OS should have a decent algorithm to prevent swapping a page that is being used at the time or will be used in the near future.

    Accessing something in memory is so much faster, why do you think we have RAM ?

    We could do without it, forget cache etc as well . Just have a hard drive with a space allocated to copy data to when loading programs and manipulating data. That is if you want to wait and have your cpu sit and wait for like 99% of the time while the hard drive is copying data , seeking data etc...

  14. #14
    pwns nooblars
    Join Date
    Oct 2005
    Location
    Portland, Or
    Posts
    1,094
    Ok, here is my take on how to do this, from a C++ perspective.

    Use character arrays, then write to a file in binary mode. Then to read you can take the step number you want to see.. multiply it by the size of one step (the combined size of all the data written in one step) and poof you are there ready to read your step.

    I have looked it up, and it seems there are structs in C#, I would recommend using that to keep all your data to be written in one place, then just write/read the structs from the file.

  15. #15
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Quote Originally Posted by Wraithan
    it seems there are structs in C#
    Ha...a truer statement than you may know. I'm not a real big fan of C# structs. I appreciate the value-type semantics, but they are a real pain to work with. Every time I think, "Man, that would be a real natural place for a struct," it usually turns back into a class while finishing the design.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. No clue how to make a code to solve problems!
    By ctnzn in forum C Programming
    Replies: 8
    Last Post: 10-16-2008, 02:59 AM
  2. C Pointers Problems
    By mhelal in forum C Programming
    Replies: 8
    Last Post: 01-10-2007, 06:35 AM
  3. String Manipulation problems -_-
    By Astra in forum C Programming
    Replies: 5
    Last Post: 12-13-2006, 05:48 PM
  4. Rendering problems (DirectX?)
    By OnionKnight in forum Tech Board
    Replies: 0
    Last Post: 08-17-2006, 12:17 PM
  5. contest problems on my site
    By DavidP in forum Contests Board
    Replies: 4
    Last Post: 01-10-2004, 09:19 PM