Thread: File Splitter Program Help

  1. #1
    Registered User
    Join Date
    Oct 2010
    Posts
    2

    File Splitter Program Help

    Hello,
    I'm writing a file-splitting program that takes two command-line parameters: the name of the file to split and the number of pieces to split it into. I am having difficulty with the part of the program that is supposed to create a new name for each output file. I apologize ahead of time for posting the whole program. I would also appreciate any other advice, tips, or tricks that would make my code more efficient and clean.

    Code:
    #include "stdio.h"
    #include "stdlib.h"
    
    int main (int argc, char *argv[])
    {
    
    FILE *fs;
    int c;
    int numbofchar=0;                //inputing values into variables
    
    int numofp = atoi(argv[2]);
    fs = fopen(argv[1], "rb");       //opening file pointer
    if(fs==NULL)
    {
      printf("Didnt work");          //returns null if not found
    }
    else
    {
    
    int soch = sizeof(char);
    
    while(1)
    {
    c = fgetc(fs);
    
    if(c == EOF)                     //finding size of file will use when include function to split by size
    {
    break;
    }
    else
    {
    numbofchar++;
    }
    }
    rewind(fs);
    int answer = numbofchar/(sizeof(char));    //stores size of file
    printf("size of file: %d\n", answer);    
    int number = 0;
    printf("created number\n");
    
    printf("ran strlen\n");
    int j=0;
    int charcount;
    int i;                       
    printf("numbofchar: %d, nubofp: %d\n", numbofchar, numofp);
    int sizeofp = ceil(numbofchar/numofp);                         //determinig size of pieces
    printf("ran sizeofp: %d\n",sizeofp);
    int iterations = 0;
    
    char argvcopy[strlen(argv[1])];
    strcpy(argvcopy,argv[1]);           //creating a copy of argv[1]
    
    printf("argvcopy: %s\n",argvcopy);
    int lenarg1 = strlen(argv[1]);
    int lenarg2 = strlen(argv[2]);
    int lenfin = lenarg1 + lenarg2;
    printf("lenfin %d\n", lenfin);
    char name[lenfin]; //creating a string which holds final file name
    
    char filetype[4]; //creating a string which holds just extension
    
    j=(strlen(argvcopy)-4); //storing value of place where filename ends but before extension
    int ab = 0;
    printf("strlen of argvcopy: %d   j: %d\n",strlen(argvcopy),j);
    while(ab<5)
    {
    
            filetype[ab] = argvcopy[j];         //creating file extension string
            printf("fileytpe[ab]: %c\n" , filetype[ab]);
            ab++;
            printf("ab: %d , j: %d\n", ab, j);
            j++;
            printf("ab: %d , j: %d\n", ab, j);
    }
    printf("ab: %d   j: %d\n", ab,j);
    printf("file type: %s\n",filetype);
    
    char size[4];                      //creat string for part number in file name
    FILE *fd; 
    int end_of_file=0;
    for(i = 0; i<numofp; i++)
    {
    c = 0;
    charcount = 0;                             //this will set character counter back to zero
    number++;
    printf("number: %i\n", number);
    j=0;
    /* here is when i dynamically make names of the files*/
    end_of_file = strlen(argvcopy);
    end_of_file = end_of_file - 4;     //determining where file extension starts and making sure following loop only copies file name
    printf("end of file: %d\n", end_of_file);
    while(j<end_of_file)
    {
    
         name[j] = argvcopy[j];
         printf("character at %i: %c\n", j,argvcopy[j]);       
         j++;
         printf("iteration: %i\n" , j);
    }
    
    printf("name: %s\n", name);
    
    itoa(number,size,10);              //converting part number to string
    printf("size: %s\n",size);
    ab=0;
    while(j<strlen(size))
    {
                 name[j]=size[ab];
                 j++;                               //adding part number to name
                 ab++;
                 }
    ab = 0;
    while(j<strlen(filetype))
    {
                             name[j]=filetype[ab];
                             j++;                            //adding file extension
                             ab++;
    }
    
    printf("sizeofp = %d\n", sizeofp);
    printf("charcount = %d\n", charcount);
    printf("name:  %s\n", name);
    fd = fopen(name, "wb");
    if(fs==NULL)
    {
      printf("Didnt work-couldnt create file");          //returns null if not found
    }
    printf("ran fd open\n");
    
    while(1)
    {
            iterations++;
            if(c == EOF)
            {
                 fclose(fd);
                 printf("eof iteration: %d", iterations);
                break;
                }
            if(charcount<sizeofp)
            {
                                 printf("write iteration: %d", iterations);
                            c = fgetc(fs);
                            printf("got character");
                            fputc(c,fd);
                            printf("wrote character");
                            charcount++;                 
                            }
            else
            {
                printf("close iteration: %d", iterations);
                fclose(fd);
                break;
                }
            
    }
    fclose(fs);
    }
    }
    }
    Again, I apologize for posting all of the code. I really would like an honest critique, as I know it isn't pretty. Thank you for your time.
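
    One detail in the program above is worth a sketch: `numbofchar/numofp` divides two ints, so the result is already truncated before `ceil` ever sees it. Rounding up can be done in integer arithmetic instead. The helper name `piece_size` is hypothetical and not part of the original program.

```c
#include <stddef.h>

/* Hypothetical helper: size of each piece when total_bytes are split into
 * `pieces` parts, rounded up so that no bytes are left over.
 * Returns 0 if pieces is 0, to avoid dividing by zero. */
size_t piece_size(size_t total_bytes, size_t pieces)
{
    if (pieces == 0) {
        return 0;
    }
    /* Truncating division, plus one extra byte per piece if there is a remainder. */
    return total_bytes / pieces + (total_bytes % pieces != 0 ? 1 : 0);
}
```

    With this, a 10-byte file split into 3 pieces gives pieces of at most 4 bytes, and the last piece simply ends up shorter.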

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Welcome to the forum, Kolbe!

    Your code isn't too long, so posting it is fine.

    You won't be able to use your good "eye" for finding bugs with that style of formatting, however. As you practice good indentation style, your eyes will become trained to find bugs before your brain can ever hope to. This:
    Code:
    while(1)
    {
            iterations++;
            if(c == EOF)
            {
                 fclose(fd);
                 printf("eof iteration: %d", iterations);
                break;
                }
            if(charcount<sizeofp)
            {
                                 printf("write iteration: %d", iterations);
                            c = fgetc(fs);
                            printf("got character");
                            fputc(c,fd);
                            printf("wrote character");
                            charcount++;                 
                            }
            else
            {
                printf("close iteration: %d", iterations);
                fclose(fd);
                break;
                }
            
    }
    fclose(fs);
    }
    }
    is very bad style. Compare it to this:
    Code:
          while(1)
          {
             iterations++;
             if(c == EOF)
             { 
                fclose(fd);
                printf("eof iteration: %d", iterations);
                break;
             }
             if(charcount<sizeofp)
             {
                printf("write iteration: %d", iterations);
                c = fgetc(fs);
                printf("got character");
                fputc(c,fd);
                printf("wrote character");
                charcount++;                 
             }
             else
             {
                printf("close iteration: %d", iterations);
                fclose(fd);
                break;
             }
          }
          fclose(fs);
       }
    }
    When subordinate lines of code are indented 2-5 spaces (better than tabs), they look better on the forum, your eye will spot many errors quickly (with training), and it becomes clearer what the code is written to do.

    For numbering files, I have a prefix filename string, and then strcat() an integer onto the end, which I've made into a string with itoa(). Then add the suffix (the period and extension char's, if any), also with strcat(), and it's done.
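
    The prefix, number, suffix idea above could be sketched roughly as follows. Note that itoa() is not part of standard C, so this sketch swaps in snprintf to format the number and join the pieces in one step. The helper name `make_part_name` is hypothetical, and it assumes the file name has no directory components containing a '.'.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "<prefix><part><suffix>" into out, where the
 * suffix is the extension of `original` (from its last '.') if it has one.
 * Returns 0 on success, -1 if the name does not fit in out_size bytes. */
int make_part_name(char *out, size_t out_size, const char *original, int part)
{
    const char *dot = strrchr(original, '.');          /* start of extension, if any */
    size_t prefix_len = dot ? (size_t)(dot - original) : strlen(original);
    const char *suffix = dot ? dot : "";

    /* "%.*s" prints only the first prefix_len characters of original. */
    int n = snprintf(out, out_size, "%.*s%d%s",
                     (int)prefix_len, original, part, suffix);

    /* snprintf returns the length it wanted to write; reject truncation. */
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

    For example, calling it with "data.txt" and part 3 fills the buffer with "data3.txt", and it works for any number of parts as long as the buffer is large enough.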

    If the file splits will be less than 10, I simply increment the char at the end of the filename prefix:
    Code:
    char filename[]={"number0.txt"};
    filename[6]++;
    New filename is number1.txt.
    Last edited by Adak; 10-03-2010 at 09:56 PM.

  3. #3
    Registered User
    Join Date
    Aug 2010
    Location
    Rochester, NY
    Posts
    196
    I apologize, I didn't read any of the code, but if you're having trouble coming up with the names, consider this:

    You have an integer value of the number of parts to split the file into, yes?

    If that's the case, then make a temporary buffer for the file name, such as:
    Code:
    char file_output_name[15];
    From there, you can use sprintf (it's declared in stdio.h) to get your number in there. So you can have something like:

    Code:
    void write_data_to_file(FILE*); // writes global buffer to a file or something
    
    for (int count = 0 ; count < num_files ; ++count)
    {
       sprintf(file_output_name, "file_part%d", count);
       FILE* fd = fopen(file_output_name, w+);
       write_data_to_file(fd); // write to file here, passing it the file descriptor - or however you would do the file writing.
       fclose(fd);
    }
    That will give you:
    Code:
    file_part0
    file_part1
    file_part2
    file_part3
    file_part4
    Assuming there were 5 files, each would contain whatever data you wrote to it. That part is pretty arbitrary; I was just showing you how you could use sprintf(). You probably wouldn't want to do it quite like that, but I think you get the idea. It's quite simple to get your file names that way.
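
    As a hedged variation on the same idea, here is a minimal sketch that uses snprintf so the name can never overrun its buffer, passes the fopen mode as a string, and reports failure to the caller. The helper name `open_part` and the 64-byte buffer are assumptions for illustration only.

```c
#include <stdio.h>

/* Hypothetical helper: opens "<base><index>" for binary writing,
 * e.g. "file_part0". Returns NULL if the name does not fit in the
 * local buffer or if fopen fails. */
FILE *open_part(const char *base, int index)
{
    char name[64];
    int n = snprintf(name, sizeof name, "%s%d", base, index);

    if (n < 0 || (size_t)n >= sizeof name) {
        return NULL;                 /* name would have been truncated */
    }
    return fopen(name, "wb");        /* the mode is a string, not a bare token */
}
```

    The caller would check the result for NULL before writing, and fclose() each part when it is done with it.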

    Hope that answered your question, I'm in a rush now so I couldn't really read through your code, I apologize.
    Last edited by Syndacate; 10-05-2010 at 03:38 PM. Reason: Modified buffer allocation to 15, instead of 10, so there is no confusion regarding a null terminating char in char arrays.

  4. #4
    Registered User
    Join Date
    Oct 2010
    Posts
    2

    Thank you!

    Thank you both for your great answers! I always love it when I see a problem solved a way I would have never thought of. Also, thank you Adak for your welcome.

  5. #5
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by Syndacate View Post
    I apologize, I didn't read any of the code, but if you're having trouble coming up with the names, consider this:

    You have an integer value of the number of parts to split the file into, yes?

    If that's the case, then make a temporary buffer for the file name, such as:
    Code:
    char file_output_name[10];
    Code:
    void write_data_to_file(FILE*); // writes global buffer to a file or something
    
    for (int count = 0 ; count < num_files ; ++count)
    {
       sprintf(file_output_name, "file_part%d", count);
       FILE* fd = fopen(file_output_name, w+);
       write_data_to_file(fd); // write to file here, passing it the file descriptor - or however you would do the file writing.
       fclose(fd);
    }
    That will give you:
    Code:
    file_part0
    file_part1
    file_part2
    file_part3
    file_part4
    I wouldn't be so sure it gave you anything but a bug. Note that "file_part" is already 9 characters and that file_output_name is a 10 character array. Counting from zero we have 9 characters of space. Not only does this array not store a part number, it doesn't store the all-important zero that makes it a string. Using small arrays is a punishment unto oneself. Plus the second argument to fopen is also not a string, and fopen doesn't return a file descriptor... but maybe I should let that slide.
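
    To make the size arithmetic concrete: "file_part" is 9 characters, a one-digit part number makes 10, and the terminating zero makes 11 bytes in total. A minimal sketch that asks snprintf for the required size instead of counting by hand (the helper name `part_name_size` is hypothetical):

```c
#include <stdio.h>

/* Hypothetical helper: number of bytes an array needs in order to hold
 * "file_part<count>" as a string, including the terminating '\0'.
 * Returns 0 if formatting fails. */
size_t part_name_size(int count)
{
    /* With a NULL buffer and size 0, C99 snprintf writes nothing and
     * returns the length the formatted text would have had. */
    int n = snprintf(NULL, 0, "file_part%d", count);
    return n < 0 ? 0 : (size_t)n + 1;
}
```

    This relies on the C99 behaviour of snprintf; some pre-C99 library implementations return a different value when the output is truncated.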
    Last edited by whiteflags; 10-05-2010 at 12:05 AM.

  6. #6
    Registered User
    Join Date
    Aug 2010
    Location
    Rochester, NY
    Posts
    196
    Quote Originally Posted by whiteflags View Post
    I wouldn't be so sure it gave you anything but a bug. Note that "file_part" is already 9 characters and that file_output_name is a 10 character array. Counting from zero we have 9 characters of space. Not only does this array not store a part number, it doesn't store the all-important zero that makes it a string. Using small arrays is a punishment unto oneself. Plus the second argument to fopen is also not a string, and fopen doesn't return a file descriptor... but maybe I should let that slide.
    Okay, so replace the "10" with a "15" - my apologies.

    Though if you want to get petty about it:
    My request for 10 bytes would probably allocate me 12 bytes, this is because most processors align to 4 byte boundaries, which 10 is not. Because of this, the trailing NULL byte of the const char array would most likely be copied into the buffer, though overrunning the buffer by 1 byte (assuming less than 10 parts), when used as a file name, it would stop at said NULL byte, regardless of the fact that it is 1 byte over the buffer.

    Also, due to 4 byte aligning, the next variable in the data segment of the program would be guaranteed in most systems to be at least 2 bytes away from the end of the buffer (10 % 4 = 2). So while it is indeed a bug in the "program", it would most likely not have any effect on the running of the program, although that is obviously not guaranteed.

    Yes, the buffer should have been 16 or something larger than 10, my mistake, I was in a rush, as stated. Regardless, I was simply showing the array declaration to tell the OP what the variable actually was, in order to give him a clearer understanding of the point I was making. I was not giving him code to copy/paste into his program.

    As for the last part, 0 (a trailing NULL byte) doesn't make it a string, it simply makes it easier for most functions to process the array. Most assemblers have assembly directives for both an ascii string in and of itself, as well as an asciiz string which contains a NULL trailing byte. That would be part of the fixed section..not the data section. A string is just a concept made up to make it fundamentally easier to understand.

  7. #7
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    As for the last part, 0 (a trailing NULL byte) doesn't make it a string, it simply makes it easier for most functions to process the array. Most assemblers have assembly directives for both an ascii string in and of itself, as well as an asciiz string which contains a NULL trailing byte. That would be part of the fixed section..not the data section. A string is just a concept made up to make it fundamentally easier to understand.
    While I accept that there are other ways to represent a string, they are irrelevant in C, as the standard must define the terms they use, and this includes "string".
    6.4.5.5 In translation phase 7, a byte or code of value zero is appended to each multibyte
    character sequence that results from a string literal or literals.66)

    66) A character string literal need not be a string (see 7.1.1), because a null character may be embedded in it by a \0 escape sequence.
    It further explains that narrow character strings (ones based on the C locale) are basically translated from their multibyte forms. So a string in C is a sequence of characters terminated by zero.
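
    A small sketch of that distinction, under the assumption that we only want to know whether a given char array currently holds a C string at all. The helper name `holds_string` is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if the first `size` bytes of buf contain
 * a terminating '\0' (so buf holds a C string), 0 otherwise. */
int holds_string(const char *buf, size_t size)
{
    /* memchr never reads past `size` bytes, unlike strlen. */
    return memchr(buf, '\0', size) != NULL;
}
```

    An array declared as `char a[3] = {'A', 'B', 'C'};` fails this check, so passing it to strlen or fopen would read past its end, while `char b[4] = "ABC";` passes because the initializer supplies the terminator.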

    The string literal "file_part" is indeed a string, but your array doesn't need to be, especially because it's so small.

    My request for 10 bytes would probably allocate me 12 bytes, this is because most processors align to 4 byte boundaries,
    Do you always add some bytes to every calculation you make for this purpose? That shouldn't always work as expected.

    I suppose I should apologize for offending you. That was not my intention but I wanted it to be clear that as written what you posted does not work and explain the reasons why; especially as it is a bug that may not appear consistently in execution, for the very reasons you mention. A lot of people read the forum, guests and members alike, and members are not above correcting each other.

  8. #8
    Registered User
    Join Date
    Aug 2010
    Location
    Rochester, NY
    Posts
    196
    Quote Originally Posted by whiteflags View Post
    While I accept that there are other ways to represent a string, they are irrelevant in C, as the standard must define the terms they use, and this includes "string".
    No, they are completely relevant in C. Remember, the stdio package, which typically relies on null terminated strings, is NOT part of C. It is simply a char array modification and analysis package which includes some sys calls.

    I can do:
    Code:
    char buff[3];
    buff[0] = 'A';
    buff[1] = 'B';
    buff[2] = 'C';
    
    int size_of_buff = 3;
    And can do everything I do with NULL terminated strings and the stdio package, with standard C code. They are NOT irrelevant in C. That's a completely false statement. They are irrelevant and/or hazardous as far as the stdio package goes, 'tis is all. It's just an external library. Not part of the language.

    Quote Originally Posted by whiteflags View Post
    The string literal "file_part" is indeed a string, but your array doesn't need to be, especially because it's so small.
    I'm aware that the buffer was too small, I thought we established that.

    Quote Originally Posted by whiteflags View Post
    Do you always add some bytes to every calculation you make for this purpose? That shouldn't always work as expected.
    No, I was simply being as petty to you as you were being to me. You know damn well that it illustrated my point just fine without being a dick about it. If you are to be a dick about it, then I'll be a dick about it and say that most systems are 4 byte word addressable, therefore the issue wouldn't be shown in that example.

    Quote Originally Posted by whiteflags View Post
    I suppose I should apologize for offending you. That was not my intention but I wanted it to be clear that as written what you posted does not work and explain the reasons why; especially as it is a bug that may not appear consistently in execution, for the very reasons you mention. A lot of people read the forum, guests and members alike, and members are not above correcting each other.
    First, you're not sorry, please don't bull........ me. Secondly, although what I posted isn't guaranteed to work; due to system standards, it would most likely be OK to use the code I showed, although I would never say it's legit to overflow a buffer. I was NOT posting the code for him to copy/paste, I was simply showing him a concept, simple as that.

    I'll edit the post above to be an array of size 15. That should clarify any ambiguities regarding buffer overflows, so people like you will get off my back.

    I'm trying to help people by teaching concepts, not get dicked on details by people like you because it's not syntactically correct.

    The code I wrote, regardless of it being correct in terms of exact syntax, works on a C2D compiled with gcc4.2.1 with both static and dynamic allocation perfectly fine - just tested. That's the contrary to your "the only thing you'll get is a bug" statement.

    Again, I was showing a concept, not code for him to write down. I was simply trying to illustrate what the buffer variable was.
    Last edited by Syndacate; 10-05-2010 at 03:35 PM. Reason: Typo - don't want to get shot for that.

  9. #9
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    Quote Originally Posted by Syndacate View Post
    No, they are completely relevant in C. Remember, the stdio package, which typically relies on null terminated strings, is NOT part of C. It is simply a char array modification and analysis package which includes some sys calls.

    I can do:
    Code:
    char buff[3];
    buff[0] = 'A';
    buff[1] = 'B';
    buff[2] = 'C';
    
    int size_of_buff = 3;
    And can do everything I do with NULL terminated strings and the stdio package, with standard C code. They are NOT irrelevant in C. That's a completely false statement. They are irrelevant and/or hazardous as far as the stdio package goes, 'tis is all. It's just an external library. Not part of the language.



    I'm aware that the buffer was too small, I thought we established that.



    No, I was simply being as petty to you as you were being to me. You know damn well that it illustrated my point just fine without being a dick about it. If you are to be a dick about it, then I'll be a dick about it and say that most systems are 4 byte word addressable, therefore the issue wouldn't be shown in that example.



    First, you're not sorry, please don't bull........ me. Secondly, although what I posted isn't guaranteed to work; due to system standards, it would most likely be OK to use the code I showed, although I would never say it's legit to overflow a buffer. I was NOT posting the code for him to copy/paste, I was simply showing him a concept, simple as that.

    I'll edit the post above to be an array of size 15. That should clarify any ambiguities regarding buffer overflows, so people like you will get off my back.

    I'm trying to help people by teaching concepts, not get dicked on details by people like you because it's not syntactically correct.

    The code I wrote, regardless of it being correct in terms of exact syntax, works on a C2D compiled with gcc4.2.1 with both static and dynamic allocation perfectly fine - just tested. That's the contrary to your "the only thing you'll get is a bug" statement.

    Again, I was showing a concept, not code for him to write down. I was simply trying to illustrate what the buffer variable was.
    You are not going to make it far in this forum with that type of attitude. You need to have thicker skin, and accept the fact that when you post incorrect code, people will call you on it 100% of the time. Programming in general is a very tedious discipline, and it is not forgiving to those who don't pay attention to detail. In general, we attempt to make sure that the code that is posted is correct, and that's a big reason why this forum is so popular. I don't think whiteflags meant any offence by his comments, and I think that you probably overreacted to his criticisms.
    bit∙hub [bit-huhb] n. A source and destination for information.

  10. #10
    Lurking whiteflags's Avatar
    Join Date
    Apr 2006
    Location
    United States
    Posts
    9,612
    Quote Originally Posted by Syndacate View Post
    No, they are completely relevant in C. Remember, the stdio package, which typically relies on null terminated strings, is NOT part of C. It is simply a char array modification and analysis package which includes some sys calls.

    I can do:
    Code:
    char buff[3];
    buff[0] = 'A';
    buff[1] = 'B';
    buff[2] = 'C';
    
    int size_of_buff = 3;
    And can do everything I do with NULL terminated strings and the stdio package, with standard C code. They are NOT irrelevant in C. That's a completely false statement. They are irrelevant and/or hazardous as far as the stdio package goes, 'tis is all. It's just an external library. Not part of the language.
    Where do you get the idea that the standard library is not part of C? It's been included in the standard for a very long time now. The standard library is defined to work on ASCIIZ strings. String literals are built into the language as ASCIIZ strings. If you're not using C strings, obviously my criticism does not make sense because the rules do not apply to your code, but that has nothing to do with the code you posted.

    No, I was simply being as petty to you as you were being to me. You know damn well that it illustrated my point just fine without being a dick about it. If you are to be a dick about it, then I'll be a dick about it and say that most systems are 4 byte word addressable, therefore the issue wouldn't be shown in that example.
    Admitting there is a problem and then trying to backpedal by saying oh look it works is not a dick move. You're just wasting time and nobody has to care. It doesn't make you more correct or something.

  11. #11
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    @Syndacate:

    Don't take it personally - it's not about ego or whatever - OK, well maybe a bit, but mostly it's just that what gets posted in here, we want to be right in detail, as well as in "spirit", unless you specifically note otherwise.

    How do I know that - I have no idea (compiler postfix increment)

    Having the details noticed by a forum like this, is a great way to learn or improve your code practice and understanding, and lots of people will be reading whatever is posted.

    Please, save your anger for Fred Phelps and his looney-tunes congregation. Best evidence yet that there IS un-intelligent alien life, right on our planet.
