The longer run was made using the database program from ChessBase, and yep: 11 days, and still not finished.
One big speedup was to reconstruct each game as one long (very long) string, moving the PGN tag info to the very end, and then sorting all those strings with the system sort. You can't beat it for string sorting. Finding dupes is then a piece of cake, because games with identical moves end up right next to each other, and this is where I also believe the database programs fall short.
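For anyone who wants to try the idea, here is a rough Python sketch of it, not my original program. The function names are made up for illustration, Python's built-in sorted() stands in for the system sort, and it assumes a simple PGN layout: tag lines start with "[" and no movetext line does. It also treats two games as dupes only if their movetext matches exactly once whitespace is collapsed.

```python
SEP = "\t"  # assumed never to appear inside tags; sorts below every printable character


def split_games(pgn_text):
    """Split PGN text into (tags, movetext) pairs.

    Assumes a new game starts at the first tag line that follows movetext.
    """
    games, tags, moves = [], [], []
    for line in pgn_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("["):
            if moves:  # a tag line after movetext: the previous game is complete
                games.append((tags, " ".join(moves)))
                tags, moves = [], []
            tags.append(line)
        else:
            moves.append(line)
    if tags or moves:
        games.append((tags, " ".join(moves)))
    return games


def to_sort_line(tags, movetext):
    """One game as one long string: moves first, tag info moved to the very end."""
    moves = " ".join(movetext.split())  # collapse line breaks and extra spaces
    return moves + SEP + " ".join(tags)


def find_duplicates(pgn_text):
    """Sort the game strings, then compare neighbours on the moves part only."""
    lines = sorted(to_sort_line(t, m) for t, m in split_games(pgn_text))
    dupes = []
    for prev, cur in zip(lines, lines[1:]):
        if prev.split(SEP, 1)[0] == cur.split(SEP, 1)[0]:
            dupes.append(cur)  # same moves as the line before it
    return dupes
```

For a really big collection you would write the to_sort_line() strings to a file, one per line, hand that file to the system sort, and then do the same neighbour comparison on the sorted output.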
I don't have a version of BASIC anymore, but it would also have done a fine job with this. It's a DIY program success story, really.