
View Full Version : speeding up file move/rename



marc252
07-09-2007, 10:50 AM
Hello, I'm writing a small application that moves lots of files (15,000 to 25,000) from one directory to another (on the same filesystem).

I'm using ext3 as the filesystem, and the application is written in C. I'm using the rename() function inside a loop to move each file to its destination.
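For reference, the move loop described here amounts to something like this minimal sketch (move_one() and its error handling are illustrative, not the actual code):

```c
#include <stdio.h>
#include <string.h>
#include <errno.h>

/* Move one file within the same filesystem.  rename() is atomic and
   never copies data, so per file this is already the cheapest move. */
static int move_one(const char *src, const char *dst)
{
    if (rename(src, dst) != 0) {
        fprintf(stderr, "rename %s -> %s: %s\n", src, dst, strerror(errno));
        return -1;
    }
    return 0;
}
```

The application would call this once per file, with the destination chosen by its classification logic.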

I would like to know if there is any faster way to do it, and whether reiserfs or another filesystem might be faster at moving files.

Thanks.

brewbuck
07-09-2007, 11:10 AM
I would like to know if there is any faster way to do it

rename() is as fast as it gets.


and whether reiserfs or another filesystem might be faster at moving files.

That totally blows me away. In order to save a few seconds, you want to spend hours converting the filesystem to some other format?

Salem
07-09-2007, 11:14 AM
How about rearranging the problem so you only have to rename the ONE directory containing all the files rather than the x,000's of files within the directory.

marc252
07-09-2007, 11:16 AM
That totally blows me away. In order to save a few seconds, you want to spend hours converting the filesystem to some other format?
Not really, I can choose the filesystem before even installing the application.
This application is used to classify files coming from an FTP server. I'm getting about 25-50 files per second, and I want to run this app on a schedule every 5 minutes; the problem comes when the app takes more than 5 minutes to process these files :-)

Thanks anyway.

Marc.

marc252
07-09-2007, 11:21 AM
How about rearranging the problem so you only have to rename the ONE directory containing all the files rather than the x,000's of files within the directory.

Good point, but unfortunately I can't do that because I have mixed files that have to go to different directories :-(

brewbuck
07-09-2007, 11:37 AM
Not really, I can choose the filesystem before even installing the application.
This application is used to classify files coming from an FTP server. I'm getting about 25-50 files per second, and I want to run this app on a schedule every 5 minutes; the problem comes when the app takes more than 5 minutes to process these files :-)

I'm not sure I understand how your program can't move 50 files per second. Is this problem hypothetical or have you observed it? If you're having trouble moving 50 files per second there is something else wrong.

marc252
07-09-2007, 11:40 AM
I'm not sure I understand how your program can't move 50 files per second. Is this problem hypothetical or have you observed it? If you're having trouble moving 50 files per second there is something else wrong.

Yes, I have observed the problem, and mainly the problem is that other applications are running as well at the same time and fighting for processor time. I can move 50 files per second but I would like to be able to be much faster and less processor hungry.

brewbuck
07-09-2007, 11:48 AM
Yes, I have observed the problem, and mainly the problem is that other applications are running as well at the same time and fighting for processor time. I can move 50 files per second but I would like to be able to be much faster and less processor hungry.

So is this a dedicated box or not? You said you could select any FS you wanted, which makes me think it's dedicated to your purpose, and yet you say there are other apps running? Can you clarify?

If the box is overloaded there's not much you can do about it.

marc252
07-09-2007, 12:10 PM
So is this a dedicated box or not? You said you could select any FS you wanted, which makes me think it's dedicated to your purpose, and yet you say there are other apps running? Can you clarify?

If the box is overloaded there's not much you can do about it.

Yes, the box is dedicated; the other apps I'm running are mainly an FTP server and an Apache server. The box is overloaded because of this app moving files, which is why I want to find other solutions.

Thanks

Salem
07-09-2007, 12:50 PM
You are actually using the rename() function, and not doing something hacky like system("mv oldfile newfile"), right?

50 files/sec via FTP sounds like you're getting a lot of small files. My counter-intuitive suggestion (I can't believe this could possibly be better, but hey) is that you could tar all the files bound for a given directory into a single tar file, then untar them all in the new place, then delete the original files and the tar file. You could also run two tar commands connected by a pipe if you wanted to.

It seems odd that rename() should perform so poorly, given that creating a file in the first place is half the work of a rename (one link is created, whereas rename creates the new link and then removes the old one).

Perhaps your FS would benefit from different mount options
http://www.linuxmanpages.com/man8/mount.8.php
There are some options which cause less work for things like rename()

marc252
07-09-2007, 01:07 PM
Thanks a lot, I didn't think of tar/untar; I'll give it a shot.
Also mount options might be what I was looking for.

Thanks.

brewbuck
07-09-2007, 01:13 PM
Yes, the box is dedicated; the other apps I'm running are mainly an FTP server and an Apache server. The box is overloaded because of this app moving files, which is why I want to find other solutions.

Thanks

A box with nothing but an FTP and web server running on it should be able to move 50 files per second. There is something else going on here...

zacs7
07-09-2007, 05:26 PM
Why don't you move the files as you get them? Rather than moving them after a certain amount of time?
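One way to sketch this idea on Linux is the inotify interface (in the kernel since 2.6.13), so each file is handled the moment the uploader finishes writing it instead of in huge batches; drain_events(), watch_dir(), and the directory name are illustrative assumptions, not the poster's code:

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

/* Process one batch of inotify events and return how many files were
   reported; the application's rename() call would go where the
   printf() is. */
static int drain_events(int fd)
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    ssize_t len = read(fd, buf, sizeof buf);  /* blocks until events arrive */
    ssize_t i = 0;
    int n = 0;
    while (len > 0 && i < len) {
        struct inotify_event *ev = (struct inotify_event *)&buf[i];
        if (ev->len > 0)
            printf("ready to move: %s\n", ev->name);
        n++;
        i += (ssize_t)sizeof *ev + ev->len;
    }
    return n;
}

/* Watch for files the FTP server has finished writing
   (IN_CLOSE_WRITE fires when a writable file is closed). */
static int watch_dir(const char *dir)
{
    int fd = inotify_init();
    if (fd >= 0 && inotify_add_watch(fd, dir, IN_CLOSE_WRITE) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Handling files one at a time this way also keeps the incoming directory small, which matters given the directory-scaling discussion below.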

CornedBee
07-10-2007, 03:23 AM
Also, have you considered making the temporary storage for incoming files a RAM disk?

Salem
07-10-2007, 04:28 AM
Mmmmm.
http://www.redhat.com/archives/rhl-list/2005-July/msg03266.html
http://linuxgazette.net/102/piszcz.html

If what they're saying is true, then 25K files in a list is going to be really expensive when it comes to removing a single file from the directory file list. I don't know if you can guess the order, but you might be able to manipulate the order in which you delete files in your favour. Also, blowing away the entire directory in one hit may be more efficient than removing each file individually.

zacs7's idea looks a lot better IMO. By processing the files when there are fewer in the directory, you minimise the amount of extra work in manipulating directory file lists.

I would suggest further research on "benchmarking filesystems".

Kennedy
07-12-2007, 09:59 AM
If you are having problems with your program not being on the processor enough, why not use renice and set your priority to a lower value?

brewbuck
07-12-2007, 10:29 AM
If you are having problems with your program not being on the processor enough, why not use renice and set your priority to a lower value?

I'm not sure if that would have any useful effect on this kind of program. All it's doing is moving files. Seems like a highly I/O bound operation (depending on the filesystem obviously).

I think part of the problem could be that the FTP server is receiving (as he said) somewhere up to 50 files per second. So those have to be written to disk at the same time the files are being moved. That might lead to some disk thrashing in itself.

If the problem is disk thrashing, one thing to try is just adding some more RAM. This will get you a bigger block cache, hopefully reducing the thrashing. Or, if the files are small enough, have the FTP server stick them on a ramdisk instead of a real filesystem, and have the moving process copy them to hard storage (while organizing them however it wants to).

marc252
07-18-2007, 02:14 PM
Hello, after a few days of testing, what I've experienced seems to be exactly what Salem suggested in his post:


Mmmmm.
http://www.redhat.com/archives/rhl-l.../msg03266.html
http://linuxgazette.net/102/piszcz.html

If what they're saying is true, then 25K files in a list is going to be really expensive when it comes to removing a single file from the directory file list. I don't know if you can guess the order, but you might be able to manipulate the order in which you delete files in your favour. Also, blowing away the entire directory in one hit may be more efficient than removing each file individually.

zacs7's idea looks a lot better IMO. By processing the files when there are fewer in the directory, you minimise the amount of extra work in manipulating directory file lists.

I would suggest further research on "benchmarking filesystems".

The whole problem is that rename speed is not linear but quadratic in the number of files in the directory: moving 1,000 files per second is easy with fewer than 5,000 files in the directory, but at 50,000 files it easily drops to 500 fps, and things get nastier above 100,000 files.
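This super-linear behaviour is easy to reproduce with a small micro-benchmark along these lines (directory names, file counts, and the use of clock() are illustrative assumptions):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create `count` empty files in src/, then time moving them all into
   dst/ with rename().  Returns elapsed CPU seconds; clock() measures
   CPU time, so for an I/O-bound run clock_gettime(CLOCK_MONOTONIC, ...)
   would be more representative of wall-clock throughput. */
static double bench_renames(const char *src, const char *dst, int count)
{
    char from[512], to[512];
    mkdir(src, 0755);
    mkdir(dst, 0755);
    for (int i = 0; i < count; i++) {
        snprintf(from, sizeof from, "%s/f%d", src, i);
        FILE *f = fopen(from, "w");
        if (f) fclose(f);
    }
    clock_t t0 = clock();
    for (int i = 0; i < count; i++) {
        snprintf(from, sizeof from, "%s/f%d", src, i);
        snprintf(to, sizeof to, "%s/f%d", dst, i);
        rename(from, to);
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Comparing the elapsed time for, say, 5,000 files against 50,000 files in one directory should make the climbing per-file cost visible.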

All these tests were made on an ext3 filesystem; the next thing I'm going to try is benchmarking with reiserfs and XFS.

Thanks for all your great suggestions!

Marc.