I've a really big problem. My new algorithm searches for patterns in ~26,000 protein sequences. If a certain pattern has been found, the sequence fragment is stored as a string in a sorted linked list. If the current fragment already exists in the list, the appropriate counter is incremented by 1.
The problem is that the amount of data produced is ~25 GBs.
I guess I have to find a less space consuming data representation but what if I have to do it the way described above? My PC does not have 25 GBs of RAM. Should I consider writing the unsorted list to a file and sorting it afterwards? But which sorting algorithm can handle GBs of data on a PC with just 256 MB of RAM without using virtual memory?
Thank you for your help.