    I need some advice please.

    I have to extract 70000 new words from some corpus and add them to a database which already has 30000 words.

    I don't know which approach I should take so that I won't face (1) memory problems and (2) lots of in-loop checking.

    I thought a good way might be to use a "set" (a collection that cannot contain duplicate items), but I am not sure whether C# has that data structure.

    I would appreciate any suggestions and help.
    Thank you in advance.

    I suggest that you use SQLite or something similar. This way, you can put a unique index on the word column and let the database engine reject duplicates, basically offloading the hard work to the database.
    The HashSet<T> type can certainly assist you with that. Duplicate additions will automatically be discarded so you can simply load the initial 30k words into the object and then add the next 70k without manual checks. It internally uses a hashtable (couldn't have guessed that from the name, right?) so duplication checks are very quick.
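    To make that concrete, here is a minimal sketch of the idea. The names WordMerger and AddNewWords are made up for illustration, and how you load the existing 30k words and extract the 70k candidates is up to you.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of any library.
public static class WordMerger
{
    // Adds every candidate word to the set and returns how many were actually new.
    public static int AddNewWords(HashSet<string> known, IEnumerable<string> candidates)
    {
        int added = 0;
        foreach (string word in candidates)
        {
            // HashSet<T>.Add returns false (and changes nothing)
            // when the word is already in the set.
            if (known.Add(word))
            {
                added++;
            }
        }
        return added;
    }
}
```

    You would build the set from the existing words first, e.g. new HashSet<string>(existingWords), and then pass the corpus words as candidates. If "Word" and "word" should count as the same entry, you could pass StringComparer.OrdinalIgnoreCase to the HashSet<string> constructor; whether that is what you want depends on your data.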

    As far as memory is concerned, it definitely should not be a problem: 100,000 words take up on the order of a few megabytes.

    The only caveat is that order is not maintained. The final set will contain each word exactly once, but the order in which you get the words back when you iterate over it is not guaranteed and may look random.
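    If you do need a predictable order at the end, one option is to sort the words once after everything has been added. This is only a sketch; WordOrdering and ToSortedList are illustrative names, and ordinal (character-code) ordering is an assumption you may want to replace with a culture-aware comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of any library.
public static class WordOrdering
{
    // Copies the set into a list sorted by ordinal string comparison.
    // The set itself is left unchanged.
    public static List<string> ToSortedList(HashSet<string> words)
    {
        return words.OrderBy(w => w, StringComparer.Ordinal).ToList();
    }
}
```

    Alternatively, SortedSet<string> keeps its items sorted at all times, at the cost of slower insertions and lookups than HashSet<string>.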
    Thank you very much, laserlight and itsme86.

