html web translator/interpreter
What I'd like to do is write a C++ program that does the following:
1) Opens a file with html code that's the source for a web page.
2) Creates a new .txt file whose name is the original name plus some suffix (I'll probably do it so that if such a file already exists, the program closes the original file and aborts right there).
3) Writes new text into the text file such that, if you paste it into a web page, what you see in your browser is the literal text of the original html file.
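Steps 1 and 2 could be sketched roughly as follows. This is only a minimal sketch: the helper names (make_output_name, file_exists, open_files) and the "_new.txt" suffix are made up for illustration, and it assumes a plain file name with no directory part.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: "orig.htm" -> "orig_new.txt".
// Assumes a plain file name (no directory component containing a dot).
std::string make_output_name(const std::string& input_name) {
    // Strip the extension, if any, then append the "_new.txt" suffix.
    const std::string::size_type dot = input_name.find_last_of('.');
    const std::string stem =
        (dot == std::string::npos) ? input_name : input_name.substr(0, dot);
    return stem + "_new.txt";
}

// True if a file with this name can be opened for reading.
bool file_exists(const std::string& name) {
    std::ifstream probe(name);
    return probe.good();
}

// Opens the input for reading and the output for writing.
// Returns false without creating anything if the input is missing
// or the output file already exists.
bool open_files(const std::string& input_name,
                std::ifstream& in, std::ofstream& out) {
    in.open(input_name, std::ios::binary);
    if (!in) {
        return false;
    }
    const std::string output_name = make_output_name(input_name);
    if (file_exists(output_name)) {
        in.close();  // close the original file before giving up
        return false;
    }
    out.open(output_name, std::ios::binary);
    return static_cast<bool>(out);
}
```

A caller would declare a std::ifstream and a std::ofstream, pass them to open_files("orig.htm", in, out), and stop if it returns false.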
This is much less confusing than I'm unfortunately making it sound. Here's an example:
In our old html file (call it orig.htm), we see <p>Hello, World!</p>
In the new file (call it orig_new.txt), I want to have &lt;p&gt;Hello, World!&lt;/p&gt;
What I want to do is post some solutions to html problems on my website, including the code. So I want the browser to render what the html coder sees in the html file, and I don't want to create the code for the code by hand.
Where I need help for the moment is in setting the whole thing up. I know the basics about how to open files, read them, and write new ones. And I'll need to review in detail exactly what special symbols will get me by in the translation (just < and > will already get you pretty far), but what all is needed is an issue I'm not asking about at the moment anyway.
My problem is that you're not translating "<" into exactly one character but into FOUR CHARACTERS ("&lt;"). So you can't just create a big character array for the whole file and make a simple in-place translation each time you come to this character, because the output ends up longer than the input and the required memory changes.
One solution would be to read the entire original file into a string. But this seems like a greater length than normal string variables are at least INTENDED for. I mean, maybe it's perfectly normal, and never creates problems, to have strings as long as even 50k characters. The html files I'll be dealing with are normally only a few thousand characters long, and I'll likely even be breaking those up in practice, but I'd like the program to work even on long html files without getting runtime errors or weird results.
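For reference, the whole-file approach might look something like the sketch below. The function names read_whole_file and escape_html are made up for illustration. It also escapes "&" in addition to "<" and ">", since a bare "&" would otherwise be read by the browser as the start of an entity. std::string grows on its own, so the output being longer than the input is not a problem in itself.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads the whole file into one std::string.
// Returns an empty string if the file cannot be opened.
std::string read_whole_file(const std::string& name) {
    std::ifstream in(name, std::ios::binary);
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy everything from the file into the buffer
    return buffer.str();
}

// Hypothetical helper: returns a copy of source with the characters that
// are special in html replaced by their entities.
std::string escape_html(const std::string& source) {
    std::string result;
    result.reserve(source.size());  // only a hint; the string still grows as needed
    for (char c : source) {
        switch (c) {
            case '<': result += "&lt;";  break;
            case '>': result += "&gt;";  break;
            case '&': result += "&amp;"; break;
            default:  result += c;       break;
        }
    }
    return result;
}
```

With these, escape_html(read_whole_file("orig.htm")) would give the text to write into orig_new.txt.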
So, another solution would be to do something like creating a string variable that holds in memory the first 20 (?) characters from the original file. Then this variable will have the flexibility to expand so as to replace an instance of "<" with an instance of "&lt;" without getting into odd memory situations.
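Taken to its limit, the small-buffer idea becomes a "buffer" of one character: read a character, write its translation straight to the output, and never hold the file in memory at all. A minimal sketch of that variant, with made-up function names (escape_stream, escape_to_string), and again escaping "&" as well as "<" and ">":

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper: copies in to out one character at a time,
// writing the entity instead of the character where needed.
void escape_stream(std::istream& in, std::ostream& out) {
    char c;
    while (in.get(c)) {  // stops at end of input or on a read error
        switch (c) {
            case '<': out << "&lt;";  break;
            case '>': out << "&gt;";  break;
            case '&': out << "&amp;"; break;
            default:  out.put(c);     break;
        }
    }
}

// Convenience wrapper for trying the function out in memory.
std::string escape_to_string(std::istream& in) {
    std::ostringstream out;
    escape_stream(in, out);
    return out.str();
}
```

Because escape_stream takes generic streams, the same function should work with a std::ifstream opened on orig.htm and a std::ofstream opened on orig_new.txt. The length of the input file then stops mattering, since only one character is held at a time.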
What I'm wondering is how basically to set up the variables that record what I read from the old file before writing the translation to the new file.
In short: What's a good size for a healthy string variable in this context? Or am I worrying unnecessarily about creating potentially huge (compared to what I've used in C++ up to now) string variables? Or is solving this problem with string variables going to be harder than I think, so that I should just find a program that does it rather than write the code myself?
If such a program already exists (as it presumably does), I'd still like to code this on my own just as an exercise, unless you guys tell me that it's going to require more advanced skills than I think it will. I really think this should be doable without getting into advanced programming issues.