Hi,
I don't understand why fseek() takes more time to access a heap (unsorted) binary file than fread() does. Could anyone share some input on this?
I'm not sure what exactly you're referring to, but I'll take a stab at it:
Files are stored on hard disks in chunks called sectors (typically 4KB each). fseek()ing deep into a file requires the OS to look up where that chunk of the file is located on disk, which can take several disk accesses. fread() (depending on how much data you're reading) can usually get the data straight from memory (the read-ahead buffer).
But really, fseek() and fread() do completely different things even though they both deal with files. It's like comparing apples and oranges.
Hmmm, it's because I'm comparing a sorted file and an unsorted (heap) one. I used fread() to read the heap file, which was faster than the sorted file. In this case, I think the best searching algorithm would be a linear search, because the data is presented in an unordered fashion.
On the other hand, with a sorted file, a searching algorithm using fseek() could prove useful and theoretically faster. My main concern is that my linear search on the heap file is faster than a proper searching algorithm on the sorted file, and I'm wondering why.
Memory is free, so why not do bulk reads from the file and then search in memory? If you have a buffer large enough to store the entire file, you could call fread() one time to get the file, then you could perform a number of (canned) sorting/searching calls to find your data -- assuming that you are able to read the data from the disk as fast as you say you can.
You could use an insertion sort if you're concerned about memory usage. You don't need two copies of your data at the same time in memory.
Yes, I could do that, which saves a lot of time. I'm just wondering why my linear search on the heap file (using only fread() to read chunks of data) is faster than a search on the sorted file (a binary search using fseek() to locate the data).
As far as I know, a sorted file as input is always faster than an unsorted file.
It's because file I/O is so slow. It's so slow that a linear search in memory is faster than a binary search on the disk. That's why, unless your file is truly gigantic, you probably want to read it into memory and do a search on it there.
Thanks for the input guys! I think it enlightened me a little about the time cost of fseek() and fread().