Thread: HTML to TXT

  1. #16
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by eobergfell View Post
    Hey everybody, thanks. I'm having trouble with using the "enum function"
    Code:
    enum Status{OUTSIDE = ?, INSIDE = ?);
    I don't know what to set OUTSIDE and INSIDE equal to.
    If anyone could help I would really appreciate it. Thanks-
    In C++, enumerated types can be used to good advantage - not so with most C compilers.
    Read the tutorial link in the post up above, for crying out loud. Especially the "Note to C programmers: ..."

    All you need is just:

    int OutText = 1;
    int OutAttrb = 1;

    as global variables, preferably.

    Now as you scan through your document, anytime you are Outside ANY html tag, AND outside any html attribute, you're ready to copy the text out of the html file, and put it into the new text file your program is creating.

    Don't make this harder than it needs to be, please!
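
    For what it's worth, the enum from the question can be made to work in plain C as well. A minimal sketch, assuming the two states named in the question; next_status is a hypothetical helper name, not something from this thread. When no values are written out, the enumerators are numbered 0, 1, ... so nothing has to be filled in for the question marks.

```c
/* Enumerators get the values 0, 1, ... when none are written out. */
enum Status { OUTSIDE, INSIDE };

/* Hypothetical helper: given the current state and the character just read,
   return the state that applies to the characters that follow. */
enum Status next_status(enum Status current, int ch)
{
    if (ch == '<')
        return INSIDE;   /* a tag starts here */
    if (ch == '>')
        return OUTSIDE;  /* the tag just ended */
    return current;      /* any other character leaves the state alone */
}
```

    A loop would start with enum Status s = OUTSIDE; and call s = next_status(s, ch); for every character it reads. Two int flags do the same job; the enum only gives the states names.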

  2. #17
    Registered User
    Join Date
    Jun 2007
    Posts
    12
    I'm sorry you guys, I know I'm retarded. I just got my feet wet in C a couple of months ago. Thank you Adak and dwks for trying, but I just don't get it. dwks, I tried the enum out but didn't get it to work, and now I'm trying out Adak's idea. Here is my crappy code:
    Code:
    #include <stdio.h>
    void main()
    {
            FILE *fopen(),*fp;
            char c;
            int InsideTag =1;
            int InsideAttrib =1;
    
    
            printf("Enter Filename:");
            scanf("%s",&fp);
            fp = fopen("home.html","r");
    
            while(!feof(fp))
            {
            if(fp==InsideTag)
            {
                    printf("%s", InsideTag); /* just to test it out */
            }
    
    
                    putchar(c);
                    c=getc(fp);
    
            }
            fclose(fp);
    
    }
    Thanks to everybody though for trying

  3. #18
    C / C++
    Join Date
    Jan 2006
    Location
    The Netherlands
    Posts
    312
    Code:
    scanf("%s",&fp);
    fp = fopen("home.html","r");
    This is wrong. You are trying to store the filename in a pointer to a file! You have to ask for the filename and store it in a char array. For example:

    Code:
    FILE *fp;
    char filename[100];
    fgets(filename, 100, stdin);
    fp = fopen(filename, "r");
    Last edited by Ideswa; 06-28-2007 at 12:20 PM. Reason: addition of code, improving.
    Operating Systems:
    - Ubuntu 9.04
    - XP

    Compiler: gcc

  4. #19
    Registered User
    Join Date
    Jun 2007
    Posts
    12
    Thanks. That wasn't the only error, right? It still doesn't do anything.
    Code:
    #include <stdio.h>
    void main()
    {
    
            char c;
            int InsideTag =1;
            int InsideAttrib =1;
    
            FILE *open(), *fp;
            char filename[100];
            fgets(filename, 100, stdin);
            fp = fopen(filename, "r");
    
            fp = fopen("home.html","r");
    
            while(!feof(fp))
            {
            if(filename==InsideTag)
            {
                    printf("%s", InsideTag);
            }
    
    
                    putchar(c);
                    c=getc(fp);
    
            }
            fclose(fp);
    
    }

  5. #20
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Quote Originally Posted by eobergfell View Post
    Thanks. That wasn't the only error, right? It still doesn't do anything.
    Code:
    #include <stdio.h>
    void main()
    {
    
            char c;
            int InsideTag =1;
            int InsideAttrib =1;
    
            FILE *open(), *fp;
            char filename[100];
            fgets(filename, 100, stdin);
            fp = fopen(filename, "r");
    
            fp = fopen("home.html","r");
    
            while(!feof(fp))
            {
            if(filename==InsideTag)
            {
                    printf("%s", InsideTag);
            }
    
    
                    putchar(c);
                    c=getc(fp);
    
            }
            fclose(fp);
    
    }
    OK, first things first - **FORGET THE FILE ACCESS, FOR NOW, ALREADY**. Just use a copy of a few lines of html text, for now, and put it right into your char array:

    char array[] = "< /center line > <font size: 12> This is text... ";

    After you get the test line or two or three to work, then you can start getting the file access (which is really just a detail, which is why we're putting it off), to work. It will go much faster.

    Why would "filename == InsideTag" make any sense at all??

    You need program logic to handle 1) reading the char array, char by char (I'd use the array instead of the file, for now, to make it easier), 2) if or switch statements to decide whether the char being examined is text or not, and 3) output of the text chars to your text file.

    I see no trace of such logic statements in your code.
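
    The three steps could be sketched roughly like this. It is only an illustration under some assumptions: copy_text_outside_tags is a made-up name, the text goes into a second char array instead of a file, and only tags are tracked (a second flag for attributes would follow the same pattern).

```c
#include <stddef.h>

/* Copy the characters of `html` that lie outside <...> into `text`.
   `text` must have room for strlen(html) + 1 bytes.
   Returns the number of characters copied. */
size_t copy_text_outside_tags(const char *html, char *text)
{
    int out_of_tag = 1;   /* 1 while we are outside any tag */
    size_t n = 0;

    for (size_t i = 0; html[i] != '\0'; i++) {   /* step 1: char by char */
        char ch = html[i];
        if (ch == '<') {                          /* step 2: decide */
            out_of_tag = 0;
        } else if (ch == '>') {
            out_of_tag = 1;
        } else if (out_of_tag) {
            text[n++] = ch;                       /* step 3: output text */
        }
    }
    text[n] = '\0';
    return n;
}
```

    Called on the test line above, it keeps only the single space between the two tags and the " This is text... " part that follows them.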
    Last edited by Adak; 06-28-2007 at 01:14 PM.

  6. #21
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
    It's also worth remembering that fgets() keeps the newline character at the end of the string (as long as the whole line fits in the buffer). You will need to replace that newline with a null character, which is easy to do.

    Something like this would work, placed after the fgets() call. It also requires string.h to be included, and it assumes the string really does end with a newline.
    Code:
    filename[strlen(filename)-1]='\0';
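
    A slightly more defensive variant, as a sketch: strip_newline is a hypothetical helper name, and strcspn() is the standard string.h function that returns the length of the leading part of the string containing none of the listed characters.

```c
#include <string.h>

/* Cut the string at its first newline, if it has one.
   Unlike s[strlen(s) - 1] = '\0', this leaves the string alone when fgets()
   stored no newline (line longer than the buffer, or a last line without
   one), and it is safe for an empty string. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}
```

    After fgets(filename, sizeof(filename), stdin); the call would be strip_newline(filename);.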

  7. #22
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    I don't feel like typing a lengthy explanation of fgets(), so have a look here: http://cboard.cprogramming.com/showp...2&postcount=16
    dwk

    Seek and ye shall find. quaere et invenies.

    "Simplicity does not precede complexity, but follows it." -- Alan Perlis
    "Testing can only prove the presence of bugs, not their absence." -- Edsger Dijkstra
    "The only real mistake is the one from which we learn nothing." -- John Powell


    Other boards: DaniWeb, TPS
    Unofficial Wiki FAQ: cpwiki.sf.net

    My website: http://dwks.theprogrammingsite.com/
    Projects: codeform, xuni, atlantis, nort, etc.

  8. #23
    Registered User
    Join Date
    Jun 2007
    Posts
    12
    Quote Originally Posted by Adak View Post
    You need program logic to handle 1) reading the char array, char by char, (I'd use the array instead of the file, for now, to make it easier) 2) If or switch statements to decide if the char being examined is text or not, and 3) output of text char's to your text file.
    That's what I'm stuck on. I don't know how I would use the if statement to check whether it's text or not. Help, anyone? Thanks

  9. #24
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    So you have step 1 down, processing each character individually from some location, like a file or a string? Post your code.

  10. #25
    Registered User
    Join Date
    Jun 2007
    Posts
    12
    Okay, maybe that's not the only step I need. At least I thought it was reading each char.
    Code:
    #include <stdio.h>
    #include<string.h>
    void main()
    {
    
            FILE *open(), *fp;
            char filename[100];
            char c, a;
            int InsideTag =1;
            int InsideAttrib =1;
            printf("Enter Filename: ");
            scanf("%s", &a);
            fgets(a,100, stdin);
            filename[strlen(filename)-1]='\0';
            fp = fopen("home.html", "r");
    
    
            while(!feof(fp))
            {
             if(.....................)
             }
    }

  11. #26
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    void main() is a bad idea. int main() is good -- see the FAQ. http://faq.cprogramming.com/cgi-bin/...&id=1043284376

    while(!feof(fp)) is a semi-common idiom, but it doesn't work very well. [edit] http://faq.cprogramming.com/cgi-bin/...&id=1043284351 [/edit]

    Code:
    FILE *open(), *fp;
    You realize that you're prototyping a function here, not declaring a variable? Leave out the () (assuming you want to declare a variable).

    You have two lines that would both read a string from stdin, if a was a string (it's a single character):
    Code:
            scanf("%s", &a);
            fgets(a,100, stdin);
    Use one or the other, not both. And you should probably be reading into filename, not a.

    What's the point of getting a string from the user if you don't use it?
    Code:
    fp = fopen("home.html", "r");
    Pass filename or whatever the string is called instead of "home.html".

    Maybe you should look at this very basic program for some ideas.
    Code:
    #include <stdio.h>
    
    int main() {
        int c;
    
        while((c = getchar()) != EOF) {
            putchar(c);
        }
    
        return 0;
    }
    getchar() is the same as [f]getc(stdin), and putchar(c) is the same as putc(c, stdout) or printf("%c", c). You can modify that for your loop. Just echo every character at first like this is doing.

    [edit] Are you following me so far? Okay. Now, with the above program, instead of just calling putchar(c), check what c is first. If it's a '<', switch to "tag" mode. If it's a '>', switch to "outside" mode.

    Then, if the program is in "outside" mode, print c.

    Let me know how you're doing. [/edit]
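
    One way the mode switch could look, as a rough sketch rather than the finished program. strip_tags is a made-up name, and the streams are passed in so the same loop works on stdin or on an opened file. The checks are chained with else so that the '>' which ends a tag is not printed itself. Comments, scripts and a '>' inside an attribute value are not handled.

```c
#include <stdio.h>

/* Copy `in` to `out`, leaving out everything between '<' and '>'. */
void strip_tags(FILE *in, FILE *out)
{
    int c;             /* int, not char, so that EOF can be told apart */
    int in_tag = 0;    /* 0 = "outside" mode, 1 = "tag" mode */

    while ((c = getc(in)) != EOF) {
        if (c == '<')
            in_tag = 1;        /* switch to "tag" mode */
        else if (c == '>')
            in_tag = 0;        /* switch back to "outside" mode */
        else if (!in_tag)
            putc(c, out);      /* plain text: echo it */
    }
}
```

    From main() it could be called as strip_tags(stdin, stdout); to keep everything on the keyboard and screen for now.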
    Last edited by dwks; 06-28-2007 at 02:58 PM.

  12. #27
    Registered User
    Join Date
    Jun 2007
    Posts
    12
    Code:
    #include <stdio.h>
    #include<string.h>
    int main()
    {
    
            FILE *open, *fp;
            char filename[100];
            char c, a;
            int InsideTag =1;
            int InsideAttrib =1;
            printf("Enter Filename: ");
            scanf("%s", &filename);
    
            filename[strlen(filename)-1]='\0';
            fp = fopen("filename", "r");
    
            while((filename=getchar())!=EOF){
                    putchar(filename);
            }
    
            return 0;
    
    }
    Getting there?

  13. #28
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    Location
    Canada
    Posts
    8,057
    Erhm . . . yeah, getting there.

    But the variable you use in your while loop should be an int. getchar() etc all process a single character, so you'd think you'd use a char; but EOF has to be stored in an int, so an int it is. Not a char array of 100 elements.
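
    To illustrate the int point, here is a small sketch (count_bytes is a hypothetical name). getc() returns either a byte value from 0 to UCHAR_MAX or the negative value EOF, so a file that contains the byte 0xFF is still read to the end when the result is kept in an int.

```c
#include <stdio.h>

/* Count the bytes in a stream. With `char c` instead of `int c`, a 0xFF
   byte could compare equal to EOF and end the loop early (signed char),
   or EOF could never match at all (unsigned char). */
long count_bytes(FILE *fp)
{
    int c;
    long n = 0;

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```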

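    Don't use scanf() here. The newline-stripping code is there for fgets(); scanf("%s") never stores the newline, so that line chops off the last character of the filename instead. Replace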
    Code:
    scanf("%s", &filename);
    with
    Code:
    fgets(filename, sizeof(filename), stdin);
    For future reference, when reading strings with scanf(), you don't use an &. You use an & for every other kind of variable you read, because scanf() needs the variable's address. A string is really an array of characters, and the name of an array already converts to a pointer to its first element.
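
    A short sketch of the & rule, using sscanf() on a ready-made line so it is easy to try out. read_name_and_count is a made-up name, and the 100-byte buffer size is an assumption carried over from the filename array in this thread.

```c
#include <stdio.h>

/* Parse "name number" from `line`. `name` must point to a buffer of at
   least 100 bytes. Returns the number, or -1 if the line does not match. */
int read_name_and_count(const char *line, char *name)
{
    int count = 0;

    /* name already is a pointer to the first char, so no & there.
       count is a plain int, so scanf needs its address: &count.
       %99s reads at most 99 non-whitespace chars, leaving room for '\0'. */
    if (sscanf(line, "%99s %d", name, &count) != 2)
        return -1;
    return count;
}
```

    With the line "home.html 3" this returns 3. With "my file.html 3" the %s stops at the space, the %d then fails on "file.html", and the function returns -1.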

    And this
    Code:
    fp = fopen("filename", "r");
    should be
    Code:
    fp = fopen(filename, "r");
    "home.html" and "filename" are string literals. filename by itself is a reference to the variable filename, and fopen() will be passed whatever is contained in filename.

    Code:
    FILE *open, *fp;
    Why not think of better names? . . .

    You do realize that getchar() and putchar() use stdin and stdout -- the keyboard and the screen? To use the file, go
    Code:
    getc(fp)
    and
    Code:
    putc(character, fp);
    except that you'd want to write to another file, not the one you're reading from. Come to think of it, why don't you just leave the putchar() as is right now, so you don't have to open a file to write to.

  14. #29
    Dr Dipshi++ mike_g's Avatar
    Join Date
    Oct 2006
    Location
    On me hyperplane
    Posts
    1,218
    I'd use fgets(). scanf("%s", filename) stops reading at the first whitespace, so if there is a space in the filename you only get the part before it and fopen() will fail.

    With this:
    Code:
            while((filename=getchar())!=EOF){
                    putchar(filename);
            }
    EOF is an integer value. filename is a character array.

  15. #30
    Registered User
    Join Date
    Jun 2007
    Posts
    12

    Code:
    #include <stdio.h>
    int main(argc, argv)
    int argc; char *argv;
    {
      char inputchar;
      int count = 0;
      char filename[100];
            FILE *open(), *fp;
            printf("Enter Filename: ");
            fgets(filename, sizeof(filename), stdin);
            filename[strlen(filename)-1]='\0';
            fp = fopen(filename, "r");
      while ( ( inputchar= fgetc(stdin) )!=EOF )
      {  if ( inputchar != '<' && count == 0 )
         { fputc( inputchar,stdout );
         }
         else
         if ( inputchar == '<' )
         { count++;
         }
         else
         if (inputchar == '>' )
         { count--;
         }
    
      }
    }
    With the help of an engineer I got this so far.
