Thread: speeding up file move/rename

  1. #1
    Registered User
    Join Date
    Jul 2007
    Posts
    7

    speeding up file move/rename

    Hello, I'm writing a small application that moves lots of files (15,000 to 25,000) from one directory to another (in the same filesystem).

    I'm using ext3 as the filesystem, and the application is written in C. I'm using the rename() function inside a loop to move each file to its destination.

    I would like to know if there is any other faster way to do it and if anyone knows if reiserfs or other filesystems might be faster on moving files.

    Thanks.

  2. #2
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by marc252 View Post
    I would like to know if there is any other faster way to do it
    rename() is as fast as it gets.

    and if anyone knows if reiserfs or other filesystems might be faster on moving files.
    That totally blows me away. In order to save a few seconds, you want to spend hours converting the filesystem to some other format?

  3. #3
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    How about rearranging the problem so you only have to rename the ONE directory containing all the files rather than the x,000's of files within the directory.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  4. #4
    Registered User
    Join Date
    Jul 2007
    Posts
    7
    That totally blows me away. In order to save a few seconds, you want to spend hours converting the filesystem to some other format?
    Not really, I can choose the filesystem before even installing the application.
    This application is used to classify files coming from an FTP server. I'm getting about 25 to 50 files per second and I want to run this app on a schedule every 5 minutes; the problem comes when the app takes more than 5 minutes to process these files :-)

    Thanks anyway.

    Marc.

  5. #5
    Registered User
    Join Date
    Jul 2007
    Posts
    7
    Quote Originally Posted by Salem View Post
    How about rearranging the problem so you only have to rename the ONE directory containing all the files rather than the x,000's of files within the directory.
    Good point, but unfortunately I can't do that because I have mixed files that have to go to different directories :-(

  6. #6
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by marc252 View Post
    Not really, I can choose the filesystem before even installing the application.
    This application is used to classify files coming from an FTP server. I'm getting about 25 to 50 files per second and I want to run this app on a schedule every 5 minutes; the problem comes when the app takes more than 5 minutes to process these files :-)
    I'm not sure I understand how your program can't move 50 files per second. Is this problem hypothetical or have you observed it? If you're having trouble moving 50 files per second there is something else wrong.

  7. #7
    Registered User
    Join Date
    Jul 2007
    Posts
    7
    Quote Originally Posted by brewbuck View Post
    I'm not sure I understand how your program can't move 50 files per second. Is this problem hypothetical or have you observed it? If you're having trouble moving 50 files per second there is something else wrong.
    Yes, I have observed the problem, and mainly the problem is that other applications are running as well at the same time and fighting for processor time. I can move 50 files per second but I would like to be able to be much faster and less processor hungry.

  8. #8
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by marc252 View Post
    Yes, I have observed the problem, and mainly the problem is that other applications are running as well at the same time and fighting for processor time. I can move 50 files per second but I would like to be able to be much faster and less processor hungry.
    So is this a dedicated box or not? You said you could select any FS you wanted, which makes me think it's dedicated to your purpose, and yet you say there are other apps running? Can you clarify?

    If the box is overloaded there's not much you can do about it.

  9. #9
    Registered User
    Join Date
    Jul 2007
    Posts
    7
    Quote Originally Posted by brewbuck View Post
    So is this a dedicated box or not? You said you could select any FS you wanted, which makes me think it's dedicated to your purpose, and yet you say there are other apps running? Can you clarify?

    If the box is overloaded there's not much you can do about it.
    Yes, the box is dedicated; the other apps I'm running are mainly an FTP server and an Apache server. The box is overloaded because of this app moving files, which is why I want to find other solutions.

    Thanks

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    You are actually using the rename() function, and not doing something hacky like system("mv oldfile newfile"), right?

    50 files/sec via FTP sounds like you're getting a lot of small files. My counter-intuitive suggestion (I can't believe this could possibly be better, but hey) is that you could 'tar' all the files to be moved to a given directory into a single tar file, then untar them all in the new place. Then delete all the original files and the tar file. You can also run two tar commands via a pipe if you wanted to.

    It seems odd that rename() should perform so poorly, since just creating a file in the first place is already half the work of a rename (creating a file adds one link, whereas rename creates the new link and removes the old one).

    Perhaps your FS would benefit from different mount options
    http://www.linuxmanpages.com/man8/mount.8.php
    There are some options which cause less work for things like rename()

  11. #11
    Registered User
    Join Date
    Jul 2007
    Posts
    7
    Thanks a lot, I didn't think about tar/untar, I'll give it a shot.
    Also mount options might be what I was looking for.

    Thanks.

  12. #12
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by marc252 View Post
    Yes, the box is dedicated; the other apps I'm running are mainly an FTP server and an Apache server. The box is overloaded because of this app moving files, which is why I want to find other solutions.

    Thanks
    A box with nothing but an FTP and web server running on it should be able to move 50 files per second. There is something else going on here...

  13. #13
    Woof, woof! zacs7's Avatar
    Join Date
    Mar 2007
    Location
    Australia
    Posts
    3,459
    Why don't you move the files as you get them? Rather than moving them after a certain amount of time?

  14. #14
    Cat without Hat CornedBee's Avatar
    Join Date
    Apr 2003
    Posts
    8,895
    Also, have you considered making the temporary storage for incoming files a RAM disk?
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law

  15. #15
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    Mmmmm.
    http://www.redhat.com/archives/rhl-l.../msg03266.html
    http://linuxgazette.net/102/piszcz.html

    If what they're saying is true, then 25K files in a list is going to be really expensive when it comes to removing a single file from the directory file list. I don't know if you can guess the order, but you might be able to manipulate the order in which you delete files in your favour. Also, blowing away the entire directory in one hit may be more efficient than removing each file individually.

    zacs7's idea looks a lot better IMO. By processing the files when there are fewer in the directory, you minimise the amount of extra work in manipulating directory file lists.

    I would suggest further research on "benchmarking filesystems".
