The method I'd use depends on how much access I want to give the end user. For a modifiable game, I'd probably use XML. Then again, a mod for a game is normally created with an editor, which can readily write binary files in a format the game understands.
The option I've chosen for most projects is pure binary chunks. They require no parsing, are extremely fast to load, fit very well into C/C++ I/O and thus are easy to work with, and are easily compressed/decompressed.
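Here is a minimal sketch of the chunk idea in C++. It assumes a hypothetical 8-byte header (a four-character tag plus the payload size) written in the machine's native byte order. The names ChunkHeader, makeTag, writeChunk and readChunk are made up for illustration. A real format would probably also want a version field, and it would need explicit endianness handling if files move between platforms.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>  // std::stringstream, handy for in-memory round trips
#include <vector>

// Hypothetical on-disk chunk header: a four-character tag plus payload size in bytes.
struct ChunkHeader {
    std::uint32_t id;
    std::uint32_t size;
};

// Builds a four-character tag such as 'TERR' from its letters.
constexpr std::uint32_t makeTag(char a, char b, char c, char d) {
    return std::uint32_t(std::uint8_t(a)) | (std::uint32_t(std::uint8_t(b)) << 8) |
           (std::uint32_t(std::uint8_t(c)) << 16) | (std::uint32_t(std::uint8_t(d)) << 24);
}

// Writes one chunk: the header, then the raw payload bytes. No text parsing involved.
bool writeChunk(std::ostream& out, std::uint32_t id, const std::vector<std::uint8_t>& payload) {
    ChunkHeader h{id, static_cast<std::uint32_t>(payload.size())};
    out.write(reinterpret_cast<const char*>(&h), sizeof h);
    out.write(reinterpret_cast<const char*>(payload.data()),
              static_cast<std::streamsize>(payload.size()));
    return static_cast<bool>(out);
}

// Reads the next chunk: the header, then exactly h.size payload bytes.
bool readChunk(std::istream& in, ChunkHeader& h, std::vector<std::uint8_t>& payload) {
    if (!in.read(reinterpret_cast<char*>(&h), sizeof h)) return false;
    payload.resize(h.size);
    return static_cast<bool>(in.read(reinterpret_cast<char*>(payload.data()), h.size));
}
```

A loader would then loop on readChunk, switch on h.id, and ignore any chunks whose tag it doesn't recognize.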
My thinking is that my game editor is what I want people to use to create new maps, levels, scripts, characters, etc., for my game. I feel that if you provide the right tools for the end user, the amount of hacking into your game decreases a lot. No one respects or looks up to some moron who can simply use a game editor, as opposed to someone hacking into the data and altering the game. So I figure that if I provide the tools, people will hopefully use them and create content that keeps the game alive and well. It won't eliminate hackers and cheaters altogether, but I'm really not sure that's possible in today's world.
But for terrains and data sets like the ones you're talking about, I use pure binary in a format of my choosing and normally compress it with RLE (run-length encoding). You can get fancy and use Huffman coding, but there's really no need unless you're tight on storage space.
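Here is a sketch of one simple RLE variant in C++. The output is a sequence of (count, value) byte pairs, and runs are capped at 255 so the count fits in a byte. This format is an assumption for illustration and is not necessarily the one I use. The names rleEncode and rleDecode are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Encodes bytes as (count, value) pairs; runs longer than 255 are split.
std::vector<std::uint8_t> rleEncode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::uint8_t value = in[i];
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == value && run < 255) ++run;
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Expands (count, value) pairs back into the original bytes.
std::vector<std::uint8_t> rleDecode(const std::vector<std::uint8_t>& in) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.insert(out.end(), static_cast<std::size_t>(in[i]), in[i + 1]);
    return out;
}
```

Heightfields with flat areas compress well this way. Noisy data can actually grow, up to twice its size, because every isolated byte costs two.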
Just my two cents. Overall I think XML is ugly.
Terrain data storage
In the small terrain engine I've coded, I'm using raw binary heights for the heightfield data and scaling them as desired when I create the vertex buffer. Normally I go from 8-bit heights on disk to scaled 32-bit heights. This covers nearly every value I need and produces nice, smooth terrains.
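Here is a sketch of the scaling step. It assumes the 32-bit heights are floats going into the vertex buffer and that a raw value of 255 should map to a chosen maximum height. Both scaleHeights and its maxHeight parameter are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts raw 8-bit heights (0..255) into 32-bit float heights for vertex data.
// maxHeight (hypothetical) is the world-space height that a raw value of 255 maps to.
std::vector<float> scaleHeights(const std::vector<std::uint8_t>& raw, float maxHeight) {
    std::vector<float> heights(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i)
        heights[i] = raw[i] / 255.0f * maxHeight;
    return heights;
}
```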
Another way to store heights is to store only the first value as an absolute height; every value following it is an offset. For instance:
128 5 1 -2 ...
Would really be this:
128 133 134 132
Each decoded value is the previous decoded value plus the new offset.
You can store the offsets as unsigned bytes: values from 0 to 127 are added, and for values of 128 and above you subtract (value - 128) instead. So a value of 128 is effectively 0, and 129 is -1. This gives you a total range of -127 to +127.
So to subtract 1 you would store:
129
To add 1:
1
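Here is a sketch of this offset scheme in C++. It uses the unsigned-byte convention described here: 0 to 127 add, and 128 to 255 subtract (value - 128). The function names are hypothetical. The encoder simply refuses input in two cases: when the first height doesn't fit in a byte, or when neighbouring heights differ by more than 127. A real tool might clamp instead, or restart with a new absolute value. With this convention, the heights 128 133 134 132 are stored as 128 5 1 130.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Offset byte convention: 0..127 add, 128..255 subtract (value - 128).
int decodeOffset(std::uint8_t b) {
    return b < 128 ? b : -(b - 128);
}

// Inverse mapping; the caller must keep delta within -127..+127.
std::uint8_t encodeOffset(int delta) {
    return static_cast<std::uint8_t>(delta >= 0 ? delta : 128 - delta);
}

// data[0] is an absolute height; every later byte is relative to the previous height.
std::vector<int> decodeDeltas(const std::vector<std::uint8_t>& data) {
    std::vector<int> heights;
    if (data.empty()) return heights;
    heights.push_back(data[0]);
    for (std::size_t i = 1; i < data.size(); ++i)
        heights.push_back(heights.back() + decodeOffset(data[i]));
    return heights;
}

// Returns false if the first height or any step between neighbours doesn't fit the format.
bool encodeDeltas(const std::vector<int>& heights, std::vector<std::uint8_t>& out) {
    out.clear();
    if (heights.empty()) return true;
    if (heights[0] < 0 || heights[0] > 255) return false;
    out.push_back(static_cast<std::uint8_t>(heights[0]));
    for (std::size_t i = 1; i < heights.size(); ++i) {
        const int delta = heights[i] - heights[i - 1];
        if (std::abs(delta) > 127) return false;
        out.push_back(encodeOffset(delta));
    }
    return true;
}
```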
The advantage here is that you store a lot of information in a very small space, you can easily apply some type of compression to it, and you still keep a regular grid, which is very easy to work with. Other data can also be encoded into the files by using a full 32 bits per grid point: the first 8 bits hold the actual height value, followed by the terrain type, and so on.
Height | Terrain type | ..... | ..... |
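Here is a sketch of packing one grid cell into 32 bits. The exact layout is an assumption. Height goes in the top byte (bits 24 to 31), which matches the alpha-as-height layout further down. Terrain type goes in bits 16 to 23, and the low 16 bits are left free for anything else. The names packCell, cellHeight, cellTerrain and cellExtra are hypothetical.

```cpp
#include <cstdint>

// Assumed layout: bits 24-31 height, bits 16-23 terrain type, bits 0-15 other data.
std::uint32_t packCell(std::uint8_t height, std::uint8_t terrainType, std::uint16_t extra) {
    return (std::uint32_t(height) << 24) | (std::uint32_t(terrainType) << 16) | extra;
}

// Accessors that undo packCell.
std::uint8_t cellHeight(std::uint32_t cell) { return static_cast<std::uint8_t>(cell >> 24); }
std::uint8_t cellTerrain(std::uint32_t cell) { return static_cast<std::uint8_t>((cell >> 16) & 0xFF); }
std::uint16_t cellExtra(std::uint32_t cell) { return static_cast<std::uint16_t>(cell & 0xFFFF); }
```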
You can also use a variant of this method in which every offset is added to or subtracted from the FIRST value in the file rather than the previous one. Since terrains are normally smooth and flowing (with some exceptions), a lot of height data can be packed into a file by storing offsets instead of actual height values.
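The base-relative variant differs only in what each offset is applied to. Here is a sketch using the same unsigned-byte convention (0 to 127 add, 128 to 255 subtract value - 128). The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset byte convention: 0..127 add, 128..255 subtract (value - 128).
int baseOffset(std::uint8_t b) { return b < 128 ? b : -(b - 128); }

// data[0] is the base height; every later byte is an offset from that base,
// not from the previous height.
std::vector<int> decodeFromBase(const std::vector<std::uint8_t>& data) {
    std::vector<int> heights;
    if (data.empty()) return heights;
    const int base = data[0];
    heights.push_back(base);
    for (std::size_t i = 1; i < data.size(); ++i)
        heights.push_back(base + baseOffset(data[i]));
    return heights;
}
```

The trade-off is that any height can be decoded on its own, without walking the row. In exchange, every height in the file has to stay within 127 of the first one.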
There are about a trillion ways to go about storing heights. You can also store the heights in the color map by using alpha as the height value:
Height | Red | Green | Blue
Therefore: Height=(DiskValue & 0xFF000000) >> 24
It's very easy to write your own utility that adds height values to a bitmap file. Note that you can still load the file with D3DX or any other BMP loader, because the height values are still read in; they just aren't used by the loader.
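//Needs <new>, <io.h>, <fcntl.h> and <sys/stat.h> (Windows CRT low-level I/O)
//Allocate new buffer to contain final data to write to disk
DWORD *pBuffer=new (std::nothrow) DWORD[m_dwHeightFieldSize];
//Bail if pBuffer invalid (nothrow new returns NULL instead of throwing)
if (!pBuffer) return;
//Fill buffer with new DWORD composed of color and height:
//height in the top (alpha) byte, RGB in the low 24 bits.
//m_pHeights (BYTE*) and m_pColors (DWORD*) are hypothetical member names.
for (DWORD i=0;i<m_dwHeightFieldSize;i++)
  pBuffer[i]=((DWORD)m_pHeights[i]<<24)|(m_pColors[i]&0x00FFFFFF);
//Open the output file ("terrain.dat" is a placeholder name)
int handle=_open("terrain.dat",_O_WRONLY|_O_CREAT|_O_TRUNC|_O_BINARY,_S_IREAD|_S_IWRITE);
//Bail on failure (free the buffer first so it doesn't leak)
if (handle==-1) { delete [] pBuffer; return; }
//Write buffer to file
_write(handle,pBuffer,m_dwHeightFieldSize*sizeof(DWORD));
_close(handle);
//Free the buffer memory (delete [] to match new [])
delete [] pBuffer;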