Quote:
Hi all, I am trying to write a program to help me manage word lists: basically, merge two word lists together and filter out the duplicate words...
One problem I can see coming is that some word lists can be quite large, up to several gigabytes. When I have written programs to manipulate data before, I was working with databases that are tiny in comparison, so I really have no idea how to handle such large amounts of data.
You can use merge sort for this. Take a look at the (current) Wikipedia article on merge sort, in particular the section on "merge sorting tape drives". Instead of tape drives, use files. Once all the data is sorted, removing duplicates is easy, because equal words end up next to each other. That is also what std::unique() relies on, since it only removes adjacent duplicates.
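
To make the idea concrete, here is a rough sketch of a file-based merge sort with duplicate removal. It assumes one word per line and plain byte-wise comparison of std::string. The function names (write_sorted_runs, merge_unique, merge_all), the file-name prefixes and the chunk size are made up for illustration; treat it as a starting point, not a tuned implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Phase 1: read `input` in chunks of at most `max_words` lines, sort each
// chunk in memory and write it to its own "run" file. Returns the run names.
std::vector<std::string> write_sorted_runs(const std::string& input,
                                           std::size_t max_words,
                                           const std::string& run_prefix) {
    assert(max_words > 0);
    std::ifstream in(input);
    std::vector<std::string> runs, chunk;
    std::string word;
    auto flush = [&] {
        if (chunk.empty()) return;
        std::sort(chunk.begin(), chunk.end());
        // std::unique only drops adjacent duplicates, hence the sort first.
        chunk.erase(std::unique(chunk.begin(), chunk.end()), chunk.end());
        std::string name = run_prefix + std::to_string(runs.size());
        std::ofstream out(name);
        for (const auto& w : chunk) out << w << '\n';
        runs.push_back(name);
        chunk.clear();
    };
    while (std::getline(in, word)) {
        chunk.push_back(word);
        if (chunk.size() >= max_words) flush();
    }
    flush();  // write the last, possibly partial, chunk
    return runs;
}

// Phase 2: merge two sorted files into `output`, writing each word only once.
// Only one line per input file is held in memory at a time.
void merge_unique(const std::string& a, const std::string& b,
                  const std::string& output) {
    std::ifstream fa(a), fb(b);
    std::ofstream out(output);
    std::string wa, wb, last;
    bool has_a = static_cast<bool>(std::getline(fa, wa));
    bool has_b = static_cast<bool>(std::getline(fb, wb));
    bool wrote = false;
    auto emit = [&](const std::string& w) {
        if (!wrote || w != last) {
            out << w << '\n';
            last = w;
            wrote = true;
        }
    };
    while (has_a || has_b) {
        if (!has_b || (has_a && wa <= wb)) {
            emit(wa);
            has_a = static_cast<bool>(std::getline(fa, wa));
        } else {
            emit(wb);
            has_b = static_cast<bool>(std::getline(fb, wb));
        }
    }
}

// Merge the runs pairwise until a single file remains; returns its name
// (empty string if there were no runs). Intermediate files are not deleted.
// `tmp_prefix` must differ from the prefixes used for the run files.
std::string merge_all(std::vector<std::string> runs,
                      const std::string& tmp_prefix) {
    std::size_t counter = 0;
    while (runs.size() > 1) {
        std::vector<std::string> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2) {
            std::string name = tmp_prefix + std::to_string(counter++);
            merge_unique(runs[i], runs[i + 1], name);
            next.push_back(name);
        }
        if (runs.size() % 2 == 1) next.push_back(runs.back());
        runs = std::move(next);
    }
    return runs.empty() ? std::string() : runs.front();
}
```

To merge two word lists, call write_sorted_runs() on each list with its own prefix, append one vector of run names to the other, and pass the result to merge_all(). Pick max_words so that one chunk fits comfortably in RAM. If both lists are already sorted, a single merge_unique() call on the two files is enough.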