I would like to uppercase proper nouns in a file.
So I thought I would put a list of proper nouns in a file b.txt, open a.txt and b.txt, and compare each word in a.txt with the list in b.txt.
Am I on the right track with this?
Wouldn't it slow down the program significantly?
Yes, it would. You should load the dictionary into an in-memory data structure (e.g. a hash map or a trie) and do the lookup there. Going back to the file for every word is simply too slow, unless you memory-map the file and it already has a good internal structure, but that's advanced stuff.
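To make the hash-set idea concrete, here is a rough sketch. The function names (to_lower, capitalize_proper_nouns) are made up for illustration, and it assumes "uppercase" means capitalizing the first letter, that the dictionary holds lowercase entries, and that the text is plain ASCII (a name like Günther would need proper Unicode handling).

```cpp
#include <cctype>
#include <string>
#include <unordered_set>

// Returns a copy of `word` with every ASCII letter lowercased.
std::string to_lower(std::string word) {
    for (char& c : word) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return word;
}

// Capitalizes the first letter of every word in `text` that appears in
// `proper_nouns`. The set is assumed to hold lowercase entries.
std::string capitalize_proper_nouns(
        const std::string& text,
        const std::unordered_set<std::string>& proper_nouns) {
    std::string result = text;
    std::size_t i = 0;
    while (i < result.size()) {
        // Skip anything that is not a letter (spaces, punctuation, digits).
        if (!std::isalpha(static_cast<unsigned char>(result[i]))) {
            ++i;
            continue;
        }
        // Find the end of the current run of letters.
        std::size_t start = i;
        while (i < result.size() &&
               std::isalpha(static_cast<unsigned char>(result[i]))) {
            ++i;
        }
        // One hash lookup per whole word, with no file access.
        if (proper_nouns.count(to_lower(result.substr(start, i - start))) > 0) {
            result[start] = static_cast<char>(
                std::toupper(static_cast<unsigned char>(result[start])));
        }
    }
    return result;
}
```

Because whole words are looked up, an entry "bo" matches the word "bo" but leaves "bowling" alone. It does nothing about words like "bill" that are both a name and an ordinary word.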
The main problem is that there's overlap between proper nouns and other words. How do you decide which one is meant?
You mean like the overlap between the proper noun Bo and the word bowling?
If that is what you mean, they are not the same word.
Or do you mean proper nouns that are also ordinary words? Like bill.
Yes, it's true that this would cause a problem.
In that case I can only think of making a dictionary that contains only proper nouns with no other meaning.
To get around that you could try to use some kind of grammar-checking code, but that would be really complicated and slow.
Ah yes, I guess I see what you mean. That's a good idea, actually.
I think you mean checking whether there is a verb next to it or something.
Yes, that could be pretty hard to implement.
In all honesty I'm not the best person to ask for help; I'm really new to programming. It just struck me that programs such as word processors can somehow tell whether a word is a proper noun, but I have no idea how.
Bill is one example.
OK, another problem: what about invented names? Can you recognize Jingizu, A'sua, and all that stuff from fantasy and science fiction novels? Or even simple foreign names, like Günther?
Language analysis is extremely difficult, precisely because language is so ambiguous.
Is this program scanning a text document and then highlighting the proper nouns? Because you could scan for all words beginning with a capital letter, and then filter out things that are not proper nouns, like sentence starts and titles (Mr, Mrs, Dr).
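A minimal sketch of that capital-letter filter, for text that is already capitalized. The function name find_capitalized_candidates and the title list are made up for illustration, and it assumes words are separated by whitespace and sentences end in '.', '!' or '?'. It will miss proper nouns that happen to start a sentence.

```cpp
#include <cctype>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Returns the capitalized words in `text` that neither start a sentence
// nor are titles. These are only candidates, not confirmed proper nouns.
std::vector<std::string> find_capitalized_candidates(const std::string& text) {
    // Hypothetical title list; extend it as needed.
    static const std::set<std::string> titles = {"Mr", "Mrs", "Ms", "Dr"};

    std::vector<std::string> candidates;
    std::istringstream words(text);
    std::string token;
    bool sentence_start = true;  // the first word always starts a sentence

    while (words >> token) {
        // Note whether this token ends a sentence before stripping punctuation.
        const char last = token.back();
        const bool ends_sentence = (last == '.' || last == '!' || last == '?');

        // Strip trailing punctuation such as '.', ',' or '?'.
        while (!token.empty() &&
               std::ispunct(static_cast<unsigned char>(token.back()))) {
            token.pop_back();
        }

        const bool is_title = titles.count(token) > 0;
        if (!token.empty() &&
            std::isupper(static_cast<unsigned char>(token[0])) &&
            !sentence_start && !is_title) {
            candidates.push_back(token);
        }

        // "Mr." ends with a period but does not end the sentence.
        sentence_start = ends_sentence && !is_title;
    }
    return candidates;
}
```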
I think the user of the program could supply a list of proper nouns in the language of the file that he intends to modify, and he could add invented names to that list.
No, actually the program is meant to capitalize proper nouns.
Sorry, that was a typo; I meant uppercase them. Letting users add their own names sounds like a good idea.
OK, I'm starting to realize that in C++ the same things can have different names.
So there is the way we beginners learn things, and there is the way you "gods" talk. :)
Now, by any chance, when CornedBee tells me to "load the dictionary into an in-memory data structure", could that mean using pointers to each word of the dictionary?
Because I can't seem to find a tutorial on in-memory data structures.
Günther isn't really a foreign name where I live. I guess that would depend on where in the US or the world you reside. If you are doing language analysis, most researchers have had good luck using multilayer perceptrons.
Thanks for the advice, but I don't think I'm going to go that far.
Right now I'm just looking up how to load a file into memory.
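One common way to pull a whole file into memory is to copy the stream's buffer into a string. This is only a sketch; the function name read_file is made up for illustration, and it assumes the file is small enough to fit comfortably in memory.

```cpp
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Reads the entire file at `path` into a string.
// Throws std::runtime_error if the file cannot be opened.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    std::ostringstream buffer;
    buffer << in.rdbuf();  // copy the whole stream in one go
    return buffer.str();
}
```

Calling read_file("a.txt") would then give you the text as a single std::string that you can walk through word by word.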