Thread: Reading an IR Spectroscopy File (.SPA file) (Unknown Binary File)

  1. #1
    Registered User
    Join Date
    Sep 2011
    Posts
    117

    Reading an IR Spectroscopy File (.SPA file) (Unknown Binary File)

    I just took an IR spectrum of a compound I made (propyl acetate) and saved the output to a file (it is binary, I believe).

    There is the original file "jon_spec.SPA" and another file "binary.txt" where I just copied and pasted the nonsense in the text editor believing it is the binary that needs to be read.

    It should look very similar to this: http://1.usa.gov/TWscPM

    I do not know how the file is laid out, but to me the most logical way would be a 1D array holding the value of the spectrum at set intervals (let's say every 1 cm^-1, which is the unit of the X-axis).

    In short, is there a way to read this data without knowing how it is structured?

    I've tried loading 10 or 100 values at a time to see whether anything useful comes out (code below). When I cast to double it prints integers, so maybe the data was saved as shorts?

    I know some of this is technical so if you need additional info please just ask

    I'll also be contacting the manufacturer of this machine, hopefully they will let me know how the data is structured...

    Thanks!!

    Code:
    // reading a complete binary file
    #include <iostream>
    #include <fstream>
    using namespace std;
    
    ifstream::pos_type size;
    char * memblock;
    
    int main ()
    {
        char character[100];
        int integer[100];
        long longint[100];
        double doublefloat[100];
    
        ifstream file ("binary.txt", ios::in|ios::binary|ios::ate); //jon_spec.SPA
        if (file.is_open())
        {
            size = file.tellg();
            memblock = new char [size];
            file.seekg (0, ios::beg);
            file.read (memblock, size);
            file.close();
    
            int marker = 0;
    
            for(int a = 0; a<10; a++)
            {
                doublefloat[marker] = (double)memblock[a];
                cout << doublefloat[marker++] << endl;
            }
    
    
    
            delete[] memblock;
        }
    
        else cout << "Unable to open file";
    
        return 0;
    
    }
    My Ctrl+S addiction gets in the way when using Code Blocks...

  2. #2
    Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    If you want to start probing the innards of data files like this, you really need a hex editor.
    Normal text editors make all sorts of guff when presented with non-printable characters.
    So you should really open the .SPA file in your program.

    Or failing that, a hex dump tool such as od
    Code:
    $ od -v -Ax -t x1z jon_spec.SPA | more
    000000 53 70 65 63 74 72 61 6c 20 44 61 74 61 20 46 69  >Spectral Data Fi<
    000010 6c 65 0d 0a 00 00 00 00 00 00 00 00 00 00 54 68  >le............Th<
    000020 75 20 4e 6f 76 20 31 35 20 31 31 3a 30 36 3a 32  >u Nov 15 11:06:2<
    000030 38 20 32 30 31 32 20 28 47 4d 54 2d 30 38 3a 30  >8 2012 (GMT-08:0<
    000040 30 29 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >0)..............<
    000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0000b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0000d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0000e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000100 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000110 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1a 00  >................<
    000120 01 00 09 02 01 00 07 00 74 0d 51 d4 27 00 00 00  >........t.Q.'...<
    000130 02 00 30 02 00 00 8c 00 00 00 00 00 00 00 00 00  >..0.............<
    000140 6a 00 bc 02 00 00 38 00 00 00 00 00 00 00 00 00  >j.....8.........<
    000150 69 00 f4 02 00 00 0c 00 00 00 00 00 00 00 00 00  >i...............<
    000160 1b 00 00 03 00 00 c8 00 00 00 00 00 00 00 00 00  >................<
    000170 68 00 c8 03 00 00 a0 01 00 00 00 00 00 00 00 00  >h...............<
    000180 03 00 68 05 00 00 58 3a 00 00 00 00 00 00 00 00  >..h...X:........<
    000190 82 00 c0 3f 00 00 a4 03 00 00 00 00 00 00 00 00  >...?............<
    0001a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0001c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0001d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0001e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0001f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000200 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000210 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000220 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000230 03 00 00 00 96 0e 00 00 01 00 00 00 10 00 00 00  >................<
    000240 3e fa 79 45 13 14 c8 43 52 6a 1f 3b a0 3c 00 00  >>.yE...CRj.;.<..<
    000250 50 1e 00 00 08 00 00 00 00 80 f2 45 00 80 00 00  >P..........E....<
    000260 00 40 00 00 08 00 00 00 00 00 80 3f 00 00 00 00  >.@.........?....<
    000270 00 00 00 00 fa 05 00 00 04 ac b2 40 00 00 00 00  >...........@....<
    000280 00 d8 76 46 00 00 80 3f 00 00 c8 42 00 00 c8 42  >..vF...?...B...B<
    000290 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0002a0 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0002b0 00 00 00 00 00 00 00 00 00 00 00 00 5a 00 00 00  >............Z...<
    0002c0 0a 00 00 00 01 00 00 00 02 00 00 00 18 00 00 00  >................<
    0002d0 00 00 20 41 00 40 1c 46 00 00 00 00 00 00 00 00  >.. A.@.F........<
    0002e0 00 00 00 00 00 00 00 00 00 00 80 3f 00 00 c8 40  >...........?...@<
    0002f0 00 00 00 00 02 00 04 00 00 10 80 00 00 00 00 00  >................<
    000300 43 6f 6c 6c 65 63 74 20 53 61 6d 70 6c 65 0d 0a  >Collect Sample..<
    000310 09 20 42 61 63 6b 67 72 6f 75 6e 64 20 63 6f 6c  >. Background col<
    000320 6c 65 63 74 65 64 20 6f 6e 20 54 68 75 20 4e 6f  >lected on Thu No<
    000330 76 20 31 35 20 31 31 3a 30 36 3a 35 33 20 32 30  >v 15 11:06:53 20<
    000340 31 32 20 28 47 4d 54 2d 30 38 3a 30 30 29 0d 0a  >12 (GMT-08:00)..<
    000350 09 20 46 69 6e 61 6c 20 66 6f 72 6d 61 74 3a 09  >. Final format:.<
    000360 25 54 72 61 6e 73 6d 69 74 74 61 6e 63 65 0d 0a  >%Transmittance..<
    000370 09 20 52 65 73 6f 6c 75 74 69 6f 6e 3a 09 20 32  >. Resolution:. 2<
    000380 2e 30 30 30 20 66 72 6f 6d 20 34 30 30 2e 31 35  >.000 from 400.15<
    000390 36 38 20 74 6f 20 33 39 39 39 2e 36 34 30 31 0d  >68 to 3999.6401.<
    0003a0 0a 09 20 42 65 6e 63 68 20 53 65 72 69 61 6c 20  >.. Bench Serial <
    0003b0 4e 75 6d 62 65 72 3a 41 46 4e 30 32 30 30 39 30  >Number:AFN020090<
    0003c0 34 0d 0a 0d 0a 00 00 00 07 00 00 00 08 10 6c b0  >4.............l.<
    0003d0 71 44 6c b0 71 44 b9 52 95 42 39 36 36 2e 37 36  >qDl.qD.R.B966.76<
    0003e0 00 08 10 1b 08 85 44 1b 08 85 44 7d c3 8a 42 31  >......D...D}..B1<
    0003f0 30 36 34 2e 32 35 00 08 10 1d 7e 9a 44 1d 7e 9a  >064.25....~.D.~.<
    000400 44 3a 61 74 42 31 32 33 35 2e 39 34 00 08 10 ea  >D:atB1235.94....<
    000410 ba aa 44 ea ba aa 44 1d 18 89 42 31 33 36 35 2e  >..D...D...B1365.<
    000420 38 34 00 08 10 56 2a b7 44 56 2a b7 44 44 88 92  >84...V*.DV*.DD..<
    000430 42 31 34 36 35 2e 33 32 00 08 10 be b6 d9 44 be  >B1465.32......D.<
    000440 b6 d9 44 1b 66 72 42 31 37 34 31 2e 37 31 00 08  >..D.frB1741.71..<
    000450 10 cb b8 39 45 cb b8 39 45 75 30 88 42 32 39 37  >...9E..9Eu0.B297<
    000460 31 2e 35 35 00 00 00 00 00 00 00 00 00 00 00 00  >1.55............<
    000470 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000480 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000490 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0004a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0004b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0004d0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0004e0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    0004f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000500 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000510 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000520 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000530 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000540 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000550 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  >................<
    000560 00 00 00 00 00 00 00 00 3e d8 ab 42 c5 c2 ab 42  >........>..B...B<
    000570 14 bc ab 42 d5 cd ab 42 7f de ab 42 f3 df ab 42  >...B...B...B...B<
    000580 50 e1 ab 42 36 e8 ab 42 c0 ed ab 42 72 f8 ab 42  >P..B6..B...Br..B<
    000590 34 08 ac 42 62 07 ac 42 db ef ab 42 51 da ab 42  >4..Bb..B...BQ..B<
    0005a0 9e d5 ab 42 35 d8 ab 42 de d8 ab 42 e0 dc ab 42  >...B5..B...B...B<
    0005b0 6b ea ab 42 b9 f8 ab 42 d2 f7 ab 42 5d e7 ab 42  >k..B...B...B]..B<
    0005c0 26 d7 ab 42 f8 cf ab 42 14 d1 ab 42 9e dc ab 42  >&..B...B...B...B<
    0005d0 d1 f2 ab 42 ca 09 ac 42 78 15 ac 42 f7 0e ac 42  >...B...Bx..B...B<
    0005e0 6d fe ab 42 4e fb ab 42 ea 0f ac 42 37 23 ac 42  >m..BN..B...B7#.B<
    0005f0 ef 1c ac 42 c4 04 ac 42 fa f0 ab 42 d7 ef ab 42  >...B...B...B...B<
    As you can see, there is a bit of informative text at the beginning of the file, and some more text at offset 0x300 (some of which appears on your screen grab image).

    So for example, if you did
    file.seekg (0x300, ios::beg);
    you would be able to read "Collect Sample"
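    A minimal sketch of that seek-and-read step, written against a generic std::istream so it works with an ifstream opened in binary mode. The helper names read_text_at and demo_read are made up for illustration, and the in-memory stream in demo_read only stands in for the real file.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Read up to max_len bytes starting at `offset`, stopping at the first NUL byte.
// read_text_at is a hypothetical helper name, not part of any library.
std::string read_text_at(std::istream& in, std::streamoff offset, std::size_t max_len)
{
    assert(max_len > 0);
    in.clear();                          // drop any earlier EOF/fail state
    in.seekg(offset, std::ios::beg);     // jump straight to the text block
    std::string text;
    char c;
    while (text.size() < max_len && in.get(c) && c != '\0')
        text.push_back(c);
    return text;
}

// Example with an in-memory stream: 16 zero bytes of padding, then the text.
// With the real file you would pass the ifstream and offset 0x300 instead.
std::string demo_read()
{
    std::string bytes(0x10, '\0');
    bytes += "Collect Sample\r\n";
    bytes.push_back('\0');
    std::istringstream in(bytes, std::ios::binary);
    return read_text_at(in, 0x10, 64);
}
```

    Stopping at the first NUL is only a guess at how the text blocks end; in the dump the text at 0x300 happens to be followed by zero bytes.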

    The actual numeric part of the data seems to begin at around offset 0x560, and runs through to almost the end of the file.

    The first thing to do is try and work out how many bytes per sample point are used. Then try to figure out what the number format is (2-byte integer, 4-byte floating point, or something else).
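    One way to try out candidate number formats is to copy the same bytes into different types and see which one gives plausible values. The sketch below uses the first eight non-zero bytes of the numeric region from the dump. It assumes a little-endian host with 4-byte IEEE floats, and load_at is a hypothetical helper, not a library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy sizeof(T) bytes out of a raw buffer and reinterpret them as T.
// ASSUMPTION: the host is little-endian, like the machine that wrote the file.
template <typename T>
T load_at(const unsigned char* data, std::size_t size, std::size_t offset)
{
    assert(offset + sizeof(T) <= size);
    T value;
    std::memcpy(&value, data + offset, sizeof(T));  // avoids aliasing problems
    return value;
}

// Bytes 0x568..0x56f of the file, copied from the hex dump.
const unsigned char sample[] = {0x3e, 0xd8, 0xab, 0x42, 0xc5, 0xc2, 0xab, 0x42};

// The same leading bytes read as two candidate formats.
float first_as_float()
{
    return load_at<float>(sample, sizeof(sample), 0);
}

std::int16_t first_as_int16()
{
    return load_at<std::int16_t>(sample, sizeof(sample), 0);
}
```

    Read as a float, the first value lands in the mid-80s, which is at least plausible for a percentage. Read as a 16-bit integer it is a large negative number. That is a hint, not proof, that the samples are 4-byte floats.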

    > I'll also be contacting the manufacturer of this machine, hopefully they will let
    > me know how the data is structured...
    The more enlightened manufacturers tend to provide tools (and maybe even an SDK) to allow you to do all the groovy stuff.
    If you get zip from them, then reverse engineering the file format is pretty hard work.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Sep 2011
    Posts
    117
    Hey, thanks! The hex editor helped a lot. I think I got the graph I wanted. Luckily the data was stored as I imagined: an array of numbers (they turned out to be floats).

    I'll just post what I found in case anyone else gets the same problem in the future.

    I started at offset 0x560 and read one byte at a time until I hit a non-zero byte, then went back one byte and tried each number type. Read as floats, the values fall between 0 and 100, which makes sense because the Y-axis is %transmittance, and they are fairly close to the actual graph. I couldn't tell for certain, as it was hard to work out what the X-axis values were.

    The text in the file says the X-axis runs from 3999.6401 to 400.1568 cm^-1 with resolution 2, which should come out to about 1800 numbers. But the file had about 3700 floats. My guess is that the spacing is actually 1 cm^-1, since the range covers about 3600 units, and the extra 100 numbers are nonsense.
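    The arithmetic behind those counts can be checked with a few lines. This is only a sketch: it assumes the points are evenly spaced over the stated range, and the function names are made up for illustration.

```cpp
#include <cassert>

// Spacing between neighbouring points if `count` points evenly cover
// the range from `first` down to `last` (both in cm^-1).
double point_spacing(double first, double last, int count)
{
    assert(count > 1);
    return (first - last) / (count - 1);
}

// Number of points a given spacing needs to cover the same range.
int points_needed(double first, double last, double spacing)
{
    assert(spacing > 0.0);
    return static_cast<int>((first - last) / spacing) + 1;
}
```

    With first = 3999.6401 and last = 400.1568, points_needed gives 1800 for a spacing of 2 and 3600 for a spacing of 1. Going the other way, point_spacing for 3700 points comes out a little under 1 cm^-1, so if all of the floats are real data the spacing would not be a round number.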

    Thanks again for your help

    Here is the code:
    Code:
    #include <iostream>
    #include <fstream>
    #include <cstdlib>
    using namespace std;
    
    char* addr(void *p)
    {
       return reinterpret_cast<char*>(p);
    }
    
    
    int main ()
    {
    
        const int SIZE = 3700; //can go till 3800 I think, but all near 0 due to Methylene Chloride.
    
        char buffer[100];
    
        short shortInt[SIZE];
        int integer[SIZE];
        long longint[SIZE];
        float floatPoint[SIZE];
        double doublefloat[SIZE];
    
        for(int a = 0; a<SIZE; a++)
        {
            shortInt[a] = 0;
            integer[a] = 0;
            longint[a] = 0;
            floatPoint[a] = 0;
            doublefloat[a] = 0;
        }
    
    
        ifstream ifs ("jon_spec.SPA", ios::in|ios::binary);
        if (ifs.is_open())
        {
            ifs.seekg (0x560, ios::beg);
    
            for(int a = 0; a<10; a++)
            {
                ifs.read(buffer,1);
    
                if(buffer[0] != 0)
                {
                    //cout << "Location of non-NULL char: " << a << endl;
                    break;
                }
            }
    
            ifs.seekg(-1L, ios::cur);
            ifs.read(addr(floatPoint), sizeof(floatPoint));
            ifs.close();
    
            ofstream ofs("Graph.txt", ios::out);
    
    //        for(int a = 0; a<SIZE; a++)
    //            cout << floatPoint[a] << endl;
    
            for(int a = 0; a<SIZE; a++)
                ofs << floatPoint[a] << endl;
    
            ofs.close();
    
        }
    
        else cout << "Unable to open ifs";
    
        return 0;
    
    }
    EDIT: Changed line 61 to:
    Code:
    ofs << 4000-a << " " << floatPoint[a] << endl;
    to print the wavenumbers I was guessing at, assuming a spacing of 1 cm^-1.

    The results are very promising; however, there is an offset of -84 cm^-1.

    For example, the aliphatic C-H peak is actually at 2971.55, but in the file I output it is at 2887.
    The carbonyl peak is actually at 1741.71, but in the file I output it is at 1658.

    It is good that the shifts are consistent, which suggests the spacing really is 1 cm^-1. The only reason I can think of for the shift is that they started collecting data before 4000 cm^-1, even though the text in the file says otherwise.
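    Another thing worth checking is whether the header already stores the axis limits. In the hex dump from post #2, the four bytes at 0x240 and at 0x244 decode as little-endian floats to roughly 3999.64 and 400.157, which matches the range in the text block, and the 32-bit integer at 0x234 is 0x0e96 = 3734, close to the number of floats found. The sketch below treats those three fields as first X, last X and point count. That is purely a guess from this one file, not a documented layout, and every name in it is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Bytes 0x230..0x24f of jon_spec.SPA, copied from the hex dump in post #2.
const unsigned char header[] = {
    0x03,0x00,0x00,0x00, 0x96,0x0e,0x00,0x00, 0x01,0x00,0x00,0x00, 0x10,0x00,0x00,0x00,
    0x3e,0xfa,0x79,0x45, 0x13,0x14,0xc8,0x43, 0x52,0x6a,0x1f,0x3b, 0xa0,0x3c,0x00,0x00
};

// ASSUMPTION: little-endian host with 4-byte IEEE floats.
template <typename T>
T load_at(const unsigned char* data, std::size_t size, std::size_t offset)
{
    assert(offset + sizeof(T) <= size);
    T value;
    std::memcpy(&value, data + offset, sizeof(T));
    return value;
}

// ASSUMPTION: guessed meaning of three header fields, not a documented layout.
struct Axis {
    std::uint32_t count;  // number of data points (guess, file offset 0x234)
    float first;          // wavenumber of the first point (guess, 0x240)
    float last;           // wavenumber of the last point (guess, 0x244)
};

Axis guess_axis()
{
    Axis ax;
    ax.count = load_at<std::uint32_t>(header, sizeof(header), 0x04);
    ax.first = load_at<float>(header, sizeof(header), 0x10);
    ax.last  = load_at<float>(header, sizeof(header), 0x14);
    return ax;
}

// Wavenumber of point i, assuming the points are evenly spaced.
double wavenumber(const Axis& ax, std::uint32_t i)
{
    assert(ax.count > 1 && i < ax.count);
    return ax.first + i * (static_cast<double>(ax.last) - ax.first) / (ax.count - 1);
}
```

    If the guess holds, the X value of every float follows from the header alone, with no hand-tuned start or step. Comparing wavenumber(ax, i) against the known peak positions would be a quick way to test it.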

    I'll contact manufacturer to confirm.
    Last edited by JonathanS; 11-21-2012 at 10:19 AM.

  4. #4
    Registered User
    Join Date
    Sep 2011
    Posts
    117
    This was the manufacturer's response:
    "That information is company confidential and the file format is program specific proprietary."

    Joy. Luckily, I managed to get a fairly accurate reading on my own. If you have any questions, feel free to PM me.

    "multiple" is just the spacing between data points (the change in cm^-1 per float). It wasn't a nice number like 1, probably because of the limitations of machines of that nature.

    Here is the code:

    Code:
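    #include <iostream>
    #include <fstream>
    using namespace std;
    
    // View any object as raw bytes so ifstream::read can fill it.
    char* addr(void *p)
    {
       return reinterpret_cast<char*>(p);
    }
    
    
    int main ()
    {
    
        // Number of 4-byte floats to read from the data region.
        const int SIZE = 3696;
    
        // Value-initialised, so every element starts at 0.
        float Transmittance[SIZE] = {};
    
        ifstream ifs ("jon_spec.SPA", ios::in|ios::binary);
    
        if (ifs.is_open())
        {
            // The float data starts at offset 0x568 in this file.
            ifs.seekg (0x568, ios::beg);
    
            ifs.read(addr(Transmittance), sizeof(Transmittance));
            if (ifs.gcount() != static_cast<streamsize>(sizeof(Transmittance)))
                cout << "Warning: file ended before " << SIZE << " floats were read" << endl;
            ifs.close();
    
            ofstream ofs("Graph.txt", ios::out);
    
            // Empirical values for this file, not taken from any documentation.
            double start = 4000.1568;    // wavenumber of the first point, cm^-1
            double multiple = 0.964904;  // spacing between points, cm^-1
    
            // One "wavenumber transmittance" pair per line.
            for(int a = 0; a<SIZE; a++)
            {
                ofs << start - (a*multiple) << " " << Transmittance[a] << endl;
            }
    
            ofs.close();
        }
    
        else cout << "Unable to open ifs";
    
    
        return 0;
    
    }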
    Last edited by JonathanS; 11-28-2012 at 09:19 AM.
