Thread: Working with file formats

  1. #1
    Registered User
    Join Date
    Feb 2006

    Working with file formats

    I'm interested in working with current file formats, for example creating my own libraries for various things, e.g. MP3+G. Problem is, I've no idea where to start beyond basic C++ file I/O. Has anyone got any resources that I may find useful for getting started with file formats? Any help is appreciated, thanks.

  2. #2
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008

    "File Formats" are simply a means of standardizing data for storage and transmission. The term by itself means almost nothing.

    What algorithms, data structures, and "markup" a file uses is a different matter. They are the "meat and potatoes" of a file format.

    By this I mean that a file format using only simple markup to describe a two-dimensional array (PPM) is considerably simpler to understand than an image format using Huffman encoding and an unusual colour space (JPEG), because of the underlying algorithms.

    To understand "MP3+G" well enough to write your own tools using the file format, you will at least need an understanding of: PCM audio, several waveform transformations, "Fast Fourier Transforms", the audio disc subchannels, the CD+G instruction model, and a few bits and pieces of the standard-compliant "MP3" binary stream.

    In short, if you are still asking how to start working with file formats, you can't yet write a tool that uses "MP3+G".

    I'm not trying to discourage you, I just want you to be prepared for a long path.

    Now then, grab the official "PPM" file format specification from your favorite search engine and study it until you can produce a tool that draws a square fractal image and records it to a "PPM" file that "MS Paint", "GIMP", or "Photoshop" can process.

    Now do the same with "BMP".
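
    A rough sketch of what the PPM half of that exercise could look like, assuming a binary (P6) file with a maxval of 255 and a Sierpinski carpet as the "square fractal". The function names (in_carpet_hole, make_carpet, encode_ppm, write_file) are made up for illustration and are not from any library.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// True when pixel (x, y) lies in a hole of the Sierpinski carpet:
// at some base-3 scale, both coordinates have the digit 1.
bool in_carpet_hole(unsigned x, unsigned y) {
    while (x > 0 || y > 0) {
        if (x % 3 == 1 && y % 3 == 1) return true;
        x /= 3;
        y /= 3;
    }
    return false;
}

// Build a size x size image: 3 bytes (R, G, B) per pixel, rows top to bottom.
// Holes are white, everything else is black.
std::vector<std::uint8_t> make_carpet(unsigned size) {
    std::vector<std::uint8_t> rgb;
    rgb.reserve(static_cast<std::size_t>(size) * size * 3);
    for (unsigned y = 0; y < size; ++y) {
        for (unsigned x = 0; x < size; ++x) {
            const std::uint8_t v = in_carpet_hole(x, y) ? 255 : 0;
            rgb.insert(rgb.end(), {v, v, v});
        }
    }
    return rgb;
}

// Encode a P6 PPM in memory: a short text header followed by raw RGB bytes.
// Returns an empty string if the pixel buffer does not match the dimensions.
std::string encode_ppm(unsigned width, unsigned height,
                       const std::vector<std::uint8_t>& rgb) {
    if (rgb.size() != static_cast<std::size_t>(width) * height * 3) return {};
    std::string out = "P6\n" + std::to_string(width) + " " +
                      std::to_string(height) + "\n255\n";
    out.append(reinterpret_cast<const char*>(rgb.data()), rgb.size());
    return out;
}

// Write the bytes in binary mode so no newline translation takes place.
bool write_file(const std::string& path, const std::string& bytes) {
    std::ofstream out(path, std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    return static_cast<bool>(out);
}
```

    Something like write_file("carpet.ppm", encode_ppm(243, 243, make_carpet(243))) should then give a file an image editor can open. Sizes that are powers of three show the pattern cleanly.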



  3. #3
    Frequently Quite Prolix dwks's Avatar
    Join Date
    Apr 2005
    For reference:

    I'd try studying some existing formats to see how they work. There's something to take away from every format. XML is nice because it's a text format and easily extensible. PNG has an ingenious signature (a first byte above 127 to detect 7-bit transmission, a Ctrl-Z character to stop the file being dumped to a DOS terminal, line endings to detect newline translation, etc.) and extensible chunk definitions (the case of the first letter of a chunk's name indicates whether it is critical or not). From BMP you can learn how not to order RGB bytes. I like the ODT format for its ease of processing from other tools.
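
    To make the PNG points concrete, here is a small sketch. The eight signature bytes and the meaning of bit 5 in a chunk name's first letter come from the PNG specification; the helper names has_png_signature and is_critical_chunk are only illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// The 8-byte PNG signature:
//   0x89      high bit set, so a 7-bit channel corrupts it
//   "PNG"     readable tag
//   \r\n      damaged if CR-LF gets translated to LF
//   0x1A      Ctrl-Z, stops a DOS "type" from dumping the file
//   \n        damaged if LF gets translated to CR-LF
const std::array<std::uint8_t, 8> kPngSignature = {
    0x89, 'P', 'N', 'G', '\r', '\n', 0x1A, '\n'};

// True when the buffer starts with an intact PNG signature.
bool has_png_signature(const std::uint8_t* data, std::size_t size) {
    return size >= kPngSignature.size() &&
           std::memcmp(data, kPngSignature.data(), kPngSignature.size()) == 0;
}

// A chunk type is four ASCII letters. Bit 5 (0x20) of the first letter is
// clear for uppercase, which marks the chunk as critical; lowercase marks
// it as ancillary, so a decoder may skip it if it does not recognise it.
bool is_critical_chunk(const char type[4]) {
    return (static_cast<unsigned char>(type[0]) & 0x20) == 0;
}
```

    So "IHDR" and "IDAT" count as critical while "tEXt" is ancillary, which is how old decoders can keep working when new chunk types are invented.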

    Your question is really too broad for me to be more helpful.


  4. #4
    Master Apprentice phantomotap's Avatar
    Join Date
    Jan 2008
    dwks wrote: "XML is nice because it's a text format and easily extensible."

    XML is not a file format. It is only a standardized language to "mark up" data.


    You can certainly use XML to create a file format.

    dwks wrote: "From BMP you can learn how not to order RGB bytes."

    I'm honestly not sure if you mean "BGR" or the reversed line order; either way, there are reasons for both.

    File formats and platforms that used "BGR" for color or reversed scan line order existed long before "OS/2", and either is far simpler to understand and less expensive to process than the related information in many other file formats. Look at the complete TIFF format: it allows for a multitude of compressions, byte orderings, colour spaces, and "markup" for primary and secondary streams recording raster and vector images.
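
    For what it's worth, both BMP quirks come down to a few lines of code. The sketch below assumes an uncompressed 24-bit BMP stored bottom-up, with each row padded to a multiple of four bytes, and converts from a plain top-down RGB buffer. The names bmp_row_stride and rgb_to_bmp_pixels are made up for this example, and the file and info headers are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes per stored row in a 24-bit BMP: 3 bytes per pixel, rounded up
// to the next multiple of 4.
std::size_t bmp_row_stride(std::size_t width) {
    return (width * 3 + 3) / 4 * 4;
}

// Convert a top-down RGB buffer into BMP pixel data: rows stored bottom-up,
// each pixel as B, G, R, with zero padding at the end of every row.
// Returns an empty vector if the input size does not match the dimensions.
std::vector<std::uint8_t> rgb_to_bmp_pixels(const std::vector<std::uint8_t>& rgb,
                                            std::size_t width,
                                            std::size_t height) {
    if (rgb.size() != width * height * 3) return {};
    const std::size_t stride = bmp_row_stride(width);
    std::vector<std::uint8_t> out(stride * height, 0);
    for (std::size_t y = 0; y < height; ++y) {
        const std::uint8_t* src = rgb.data() + y * width * 3;
        std::uint8_t* dst = out.data() + (height - 1 - y) * stride;  // flip rows
        for (std::size_t x = 0; x < width; ++x) {
            dst[x * 3 + 0] = src[x * 3 + 2];  // blue
            dst[x * 3 + 1] = src[x * 3 + 1];  // green
            dst[x * 3 + 2] = src[x * 3 + 0];  // red
        }
    }
    return out;
}
```

    Compare that with what a TIFF reader has to do before it even knows where the pixels are.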


  5. #5
    Registered User
    Join Date
    Feb 2006
    Thanks for the information folks, let me try to clarify. As a long-term goal I'd like to work with more advanced files and make my own libraries for working with them. At the moment, however, I know nothing of how to even work with the various file formats. Other than opening a file, I have no idea where to start. Lemme noob it down:

    What I'm looking for is the most basic of information: I've opened a file.. now what..? It'd help if I knew the proper terminology for this type of programming, so that I could google properly. At the moment I'm just searching for "tutorials c++ file formats", and all I'm coming across is the most basic file I/O tutorials.

    There really should be a "complete noob section" on this forum, I'd be better off there..

  6. #6
    Registered User
    Join Date
    Feb 2006
    OK, so I've started work on learning to work with the PPM file format. (I'm still learning C++ here btw, as you will see). Please take a look at the following code: comment, abuse, break, criticize, but please above all, let me know if I'm missing anything, or anything that I could do better. Tested with a .ppm file created with GIMP.

    Oh, and yes, I am aware that netpbm image files begin with the magic number "P*" in the first two characters, with the third character being a "\n", so feel free to point out that my for loop is useless =]

    Compiles and runs fine on a *nix system

    #include <string>
    #include <iostream>
    #include <fstream>
    using namespace std;
    int main(int argc, char *argv[])
    {
    	  int length, bufferlength;
    	  char * buffer;
    	  string newbuffer,magicno;
    	  // Read the file into a buffer (binary mode)
    	  ifstream is;
    	  is.open ("Untitled.ppm", ios::binary );
    	  if (!is) {
    		  cout << "Unable to open Untitled.ppm. Aborting.\n";
    		  return 1;
    	  }
    	  is.seekg (0, ios::end);
    	  length = is.tellg();
    	  is.seekg (0, ios::beg);
    	  buffer = new char [length];
    	  is.read (buffer,length);
    	  is.close();
    	  //Now check to see if the file is in a valid ppm format.
    	  // From the buffer, check the first two characters before a new line character to get its magic no
    	  // Copy with an explicit length: the buffer is not null-terminated and may contain zero bytes.
    	  newbuffer.assign(buffer, length);
    	  delete[] buffer;
    	  magicno = newbuffer.substr(0,2);
    	  bufferlength = newbuffer.length();
    	  // first we need to find the first newline character
    	  for (int i = 0; i < bufferlength; i++) {
    		  // Check for a newline.
    		  if (i > 20) { // large check i know. (ignore this pl0x)
    			  cout << "Unable to find header at start of the file. Aborting.\n";
    			  return 1;
    		  } else {
    			  if (newbuffer.substr(i,1) == "\n") {
    				  magicno = newbuffer.substr(0,i);
    				  break;
    			  }
    		  }
    	  }
    	  // Now check the "magicno"
    	  if (magicno == "P3" || magicno == "P6") {
    		  // Found the P3 / P6 header from the PPM file format.
    		  cout << "Detected PPM Format: " << magicno << "\n";
    	  } else {
    		  // No header found.
    		  cout << "No PPM header found.\n";
    		  return 1;
    	  }
    	  return 0;
    }
    Any and all comments, both positive and negative, appreciated.
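
    One thing the code above does not yet do is read the rest of the header. A possible next step, sketched here under a few assumptions: the header is the magic number, width, height and maxval as whitespace-separated decimal tokens, "#" starts a comment that runs to the end of the line, and exactly one whitespace byte separates maxval from the pixel data. PpmHeader, next_token, to_positive_int and parse_ppm_header are hypothetical names, and comments that begin in the middle of a token are not handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

struct PpmHeader {
    std::string magic;            // "P3" (text pixels) or "P6" (binary pixels)
    int width = 0;
    int height = 0;
    int maxval = 0;
    std::size_t data_offset = 0;  // index of the first pixel byte in the buffer
};

// Read the next whitespace-delimited token, skipping "#" comment lines.
static bool next_token(const std::string& buf, std::size_t& pos, std::string& token) {
    token.clear();
    while (pos < buf.size()) {
        const unsigned char c = static_cast<unsigned char>(buf[pos]);
        if (c == '#') {
            while (pos < buf.size() && buf[pos] != '\n') ++pos;
        } else if (std::isspace(c)) {
            ++pos;
        } else {
            break;
        }
    }
    while (pos < buf.size() && !std::isspace(static_cast<unsigned char>(buf[pos]))) {
        token += buf[pos++];
    }
    return !token.empty();
}

// Parse a short, strictly positive decimal number.
static bool to_positive_int(const std::string& s, int& out) {
    if (s.empty() || s.size() > 9) return false;
    int value = 0;
    for (unsigned char c : s) {
        if (!std::isdigit(c)) return false;
        value = value * 10 + (c - '0');
    }
    out = value;
    return value > 0;
}

// Fill in the header from the start of a file image held in memory.
bool parse_ppm_header(const std::string& buf, PpmHeader& h) {
    std::size_t pos = 0;
    std::string token;
    if (!next_token(buf, pos, h.magic)) return false;
    if (h.magic != "P3" && h.magic != "P6") return false;
    int* fields[] = {&h.width, &h.height, &h.maxval};
    for (int* field : fields) {
        if (!next_token(buf, pos, token) || !to_positive_int(token, *field)) return false;
    }
    if (h.maxval > 65535) return false;
    // A single whitespace byte must follow maxval; pixel data starts after it.
    if (pos >= buf.size() || !std::isspace(static_cast<unsigned char>(buf[pos]))) return false;
    h.data_offset = pos + 1;
    return true;
}
```

    With the string newbuffer from the code above, parse_ppm_header(newbuffer, header) would give the dimensions, and for a "P6" file with a maxval of 255 the pixels would be the width * height * 3 bytes starting at header.data_offset.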
