# Reading in large arrays ...

• 04-26-2010
I'm looking at the ITA problem "BitVector Genealogy" at

Current ITA Software Hiring Puzzles

It involves 10,000 vectors of 10,000 bits each. I'm not used to dealing with this much data. My plan is to read in all the vectors and then create an array of ints to store the distance between each pair of vectors. For now I intend to read the bits into an array of strings (or a 2D array of bits) and compare two strings to find which indices have different values.

The question I have is this:

Is the best way to store these distances just to define an array (call it distAB) that's a 10000x10000 array of ints? Obviously distAB(x,y) = distAB(y,x) and distAB(x,x) = 0, so I only really need to store half of it, but it still seems like a lot of memory, especially while I'm building that array from the array of 10,000 vectors (which I can then get rid of). Is this the proper way to do things?
• 04-26-2010
MK27
If it's a bit vector, you need 10000/8 bytes per vector, not ints. Use unsigned char and set bits: 8 bits per char, 1250 bytes per individual. So the whole population will only be 12.5 MB, which is not so much. I just downloaded the data for that problem and noticed it is a string of chars representing each bit (100 MB of data). You could go that route too, but methinks real bits will be better.

You will still want to malloc() that on the heap rather than declaring it on the stack, I think.
• 04-26-2010