Thread: Performance issue!

  1. #16
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    All,

    Thanks a lot for your valuable suggestions.
    I am using the open call with O_RDWR mode, and using lseek to locate the offset.

    From the behavior of the process, I feel that the problem is file system caching/paging. The first time the program is run, it takes far longer to execute than any subsequent run.
    Is there any way to bypass the file system caching, or something like direct I/O, on a Red Hat Linux machine?

    Files that are frequently read are under the same directory.

    My data is in a very simple format, for example:
    2334312,543,MYNAME,NYC,20090302
    Here 2334312 is the key value, which is unique. The data file contains around 5 million such records, and the index file contains index information for those 5 million records.

    Note: Currently I am only reading this file, and only one process is accessing it.
    S_ccess is waiting for u. Go Ahead, put u there.

  2. #17
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by maven View Post
    From the behavior of the process, I feel that the problem is file system caching/paging. The first time the program is run, it takes far longer to execute than any subsequent run.
    Is there any way to bypass the file system caching, or something like direct I/O, on a Red Hat Linux machine?
    This seems like a strange theory. First you say it's faster once it's cached, then you ask how to disable caching? (The reason it's fast after the first run is that it's been cached.)

    To prove/disprove my idea, you can disable caching on the file by opening it in uncached mode using O_DIRECT.
    Code:
    /* Sketch: disable the page cache for this file. O_DIRECT needs
       _GNU_SOURCE on Linux, and transfers must use sector-aligned,
       sector-sized buffers. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>

    int fd = open("datafile", O_RDWR | O_DIRECT);
    if (fd < 0)
        perror("open");

  3. #18
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    But it is not always in cache. As I said earlier, I am seeing high I/O.

    I have heard that not all that many platforms support direct I/O, so forcing it open with O_DIRECT may be of no use. I am using an AMD Opteron(tm) 246 processor.
    I have tried opening with O_DIRECT mode, but performance was the same.

  4. #19
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Clearly, you *want* the data indexed - definitely!

    "Perspective" posted up a fact that should be explored further:

    HDs are *very* fast for sequential reads, and *very* slow for random-access reads. Which brings up the idea of just adding a HD controller with a large amount of RAM, to extend the amount of data the HD has "at its fingertips" and can access without any further disk reading.

    I recall the distinct pleasure of installing one in my PC many years ago - suddenly my PC was on steroids for disk I/O!

    If time is money, I'd say this could be your most cost-effective and quickest solution.

    Next is the search. I mentioned previously that a binary search is not the fastest, even in RAM. On a HD it's even slower, because of the distances the read head has to travel.

    Here's some more details:

    I took 3,000 records into an array and used two searching techniques, binary and smart indexing. Then I did 100,000 lookups of those records, randomly chosen, and timed how long each type of search took to finish the 100k lookups.

    Smart indexing was 5x faster than binary searching.

    It worked like this:

    The database was divided into sections, and the middle record ID of each section was used as the "starting point" for any search in its own section (sort of like the midpoint of a binary search).

    The closest index starting point was found - in your case, the offset of the best starting point would be known. Starting with that indexed record, the program searches up or down, sequentially, until it finds the proper record.

    I suppose you're thinking "That can't be faster than binary search". That's what I thought as well. But it is, and it can be fine-tuned for even better performance if you program more "granularity" into the indexing part of it.

    It can make MORE comparisons than a binary search, on average, but those comparisons are on records read in *sequentially* from the HD (and thus probably already in the buffer), rather than the scattered reads the binary search algorithm needs.

    Even in an array, the indexed search was 500% faster. You couldn't tell by just one search, though - they were both too fast. But over 100,000 searches the difference was immediately noticeable.

  5. #20
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    Definitely I will try changing my search technique to smart indexing. Can you please give some links for help on smart indexing?

    One thing I have tested and found a huge difference in lookup performance:
    I mounted the directory as tmpfs (a temporary RAM-based file system), and it showed consistent performance with a 10x speedup.

  6. #21
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    How much memory does the machine have? If you have more than 1GB of RAM, then you should be able to simply load the entire index table into memory and use it from there [or at least use a mechanism for storing the last 2-300 entries. Since your binary search will jump all over the place, but some elements will be COMMON in that list, keeping those in RAM will speed things up A LOT].

    Actually, the fact that tmpfs works for you means that you have enough memory to load all the data into RAM - you may not need to load it ALL at once, but basically, once it's been read in, keep it in memory. That should speed things up (probably MORE than the 10x you get from tmpfs, since you do not need to copy the data as many times as the read() system call does).

    The fact that O_DIRECT shows no difference tells me that caching is not working for you (but also not hindering anything).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  7. #22
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    How do I "simply load the entire index table into memory"? And will it guarantee that, if some other memory-consuming process comes along, the index table will not be swapped out?

  8. #23
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by maven View Post
    How do I "simply load the entire index table into memory"? And will it guarantee that, if some other memory-consuming process comes along, the index table will not be swapped out?
    Do you REALLY not have enough memory? Your tmpfs test seems to suggest that you DO have enough.

    But of course, if you intend to run MANY copies of the application, you may need a more convoluted method (e.g. shared memory, mmap() or some such).


  9. #24
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    Yes, the system has enough memory; I have stopped some memory-consuming processes for the time being, which is why I have this much memory for now.

    One thing I am not able to understand: if the program works great with tmpfs, why does it not perform well when those other processes are stopped and the system has enough memory? And while working with tmpfs the I/O wait is negligible, whereas without tmpfs it averages 98%.

  10. #25
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by maven View Post
    Yes, the system has enough memory; I have stopped some memory-consuming processes for the time being, which is why I have this much memory for now.

    One thing I am not able to understand: if the program works great with tmpfs, why does it not perform well when those other processes are stopped and the system has enough memory? And while working with tmpfs the I/O wait is negligible, whereas without tmpfs it averages 98%.
    Ok, let's get one thing very clear: if you do not have enough memory, SOMETHING has to be swapped to disk at some point.

    But if you are frequently accessing your index table, it should not be what gets swapped out. Some other memory-consuming process that isn't very active will [most likely] get thrown out.

    Second, I'm pretty sure you hit the same records A LOT in your binary search of the index. So caching THOSE entries would help a whole lot - knowing which they are is the hard part if the data involved [either what you search for, or what you search in] changes.

    One thing to experiment with would be to put only the index in tmpfs and keep the data file itself on the hard disk. Does that give a 10%, 50%, or 8x speedup?

  11. #26
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    I have experimented with it, and it gave good performance as well - about 8x, as you said.

    What can we conclude now?

  12. #27
    Registered User
    Join Date
    Oct 2008
    Location
    TX
    Posts
    2,059
    Keep frequently accessed metadata on the VM-based tmpfs while keeping the data itself on the disk device; see here for details.

  13. #28
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by maven View Post
    I have experimented with it, and it gave good performance as well - about 8x, as you said.

    What can we conclude now?
    1) RAM is many times faster than getting data from a HD. Same with the HD buffer and HD controller RAM - they fly. If you're not using a HD controller with a large buffer, you might want to look into that.

    2) For any data in RAM that changes, be sure to write your updates to the HD frequently. Power outages and the like will occur. A battery backup for the system might be wise - it gives you time to save the current data and shut down, at least.

    3) Sequential reads are much faster than random reads on a HD. This doesn't hold for SSDs, since they have no read head to move.

    My old smart index program bit the dust when the disk was badly infected despite a good AV program and firewall, so I'm working on a new program to show how a smart indexer can blow the doors off a binary search.

    Cheers for now.

  14. #29
    Registered User
    Join Date
    Dec 2005
    Posts
    136
    Can you give some links where I can find the concept of smart indexing?

  15. #30
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by maven View Post
    Can you give some links where I can find the concept of smart indexing?
    No, I can't. I was in the Yahoo C programming group and mentioned how much I liked binary search, and the mod said "Not as fast in this case". Then he started in with a little O(n) and O(log n) stuff, and my eyes just glazed over.

    So I challenged him to a test - and his way (smart indexing) was indeed quite a bit faster, but you couldn't tell from a small test; you had to loop searches thousands of times before any difference in run time could be found at all.

    In the final test I made with (iirc) 100,000 searches, smart indexing was anywhere from 200% to 700% faster, depending on the specifics.

    I spent some time googling for it, but I couldn't find anything exactly on it.

    As mentioned above, my test program was lost in a horrendous virus attack on that system.

    I remember the gist of the program, and have prepared an even better test for it. A file of 500 records has been made up for the disk test. The program automatically builds the indexes, and the binary search function is done as well.

    It will be roughly done later today, but I may need another day to get some testing done.

    I want it to be a fair test of the two algorithms.

    See you then.
