Thread: program terminated by signal SEGV, ONLY on a certain server

  1. #1
    Registered User
    Join Date
    Jan 2008
    Posts
    2

    Dear all

    We're seeing something strange with several of our C programs. Some of them crash with

    "program terminated by signal SEGV"

    If we then run them again, everything goes through fine. Sadly, this problem only occurs on our LIVE server; we have no problems on our DEV server. Both run SunOS 5.9 on similar hardware.

    Here's what we got when using dbx to analyze the core dump

    Code:
            dbxenv suppress_startup_message 6.2
    Reading ci_cat11a_consol_dtls
    dbx: warning: core object name "ci_cat11a_conso" matches
    object name "ci_cat11a_consol_dtls" within the limit of 14. assuming they match
    core file header read successfully
    Reading ld.so.1
    Reading libclntsh.so.9.0
    Reading librwtool.so.2
    Reading libCrun.so.1
    Reading libm.so.1
    Reading libw.so.1
    Reading libc.so.1
    dbx: warning: The corefile has a different ELF checksum for /usr/lib/libc.so.1
    checksums are:  file: 0xe7af  core: 0x65e5
    See `help core mismatch' for more details.
    Reading libwtc9.so
    Reading libnsl.so.1
    dbx: warning: The corefile has a different ELF checksum for /usr/lib/libnsl.so.1
    checksums are:  file: 0x761b  core: 0xa1e6
    See `help core mismatch' for more details.
    Reading libsocket.so.1
    Reading libgen.so.1
    Reading libdl.so.1
    dbx: warning: The corefile has a different ELF checksum for /usr/lib/libdl.so.1
    checksums are:  file: 0xf159  core: 0xa5bf
    See `help core mismatch' for more details.
    Reading libsched.so.1
    Reading libaio.so.1
    Reading librt.so.1
    Reading libmp.so.2
    Reading libmd5.so.1
    Reading libc_psr.so.1
    program terminated by signal SEGV (no mapping at the fault address)
    0x00000000:     <bad address 0x0>
    Current function is checkCat1ConsolRec
     1583         sqlcxt((void **)0, &sqlctx, &sqlstm, &sqlfpn);

    I've been searching around, but other people's programs seem to crash on every run, whereas ours work on the second run (*weird*). Here's some code from the program that produced the above core dump.

    Code:
    int checkCat1ConsolRec()
    {
       EXEC SQL BEGIN DECLARE SECTION;
       short t_rec_count = 0;
       EXEC SQL END DECLARE SECTION;
    
       if (strcmp((char *)r_incve_cat.arr, "1") == 0)
       {
          EXEC SQL SELECT COUNT(*) INTO :t_rec_count 
          FROM CI_CAT1_CONSOL 
          WHERE RTRIM(LOGON_ID) = rtrim(:r_logon_id)
          AND   SHIFT_DT = to_date(:r_op_d, 'YYYYMMDD')
          AND   RTRIM(EQPT_ID) = rtrim(:r_eqpt_id)
          AND   RTRIM(VSL_NAME) = rtrim(:r_vsl_m)
          AND   RTRIM(VOY_N) = rtrim(:r_voy_n);
       }
       else
       {
          EXEC SQL SELECT COUNT(*) INTO :t_rec_count 
          FROM CI_CAT1A_CONSOL 
          WHERE RTRIM(LOGON_ID) = rtrim(:r_logon_id)
          AND   SHIFT_DT = to_date(:r_op_d, 'YYYYMMDD')
          AND   RTRIM(EQPT_ID) = rtrim(:r_eqpt_id)
          AND   RTRIM(VSL_NAME) = rtrim(:r_vsl_m)
          AND   RTRIM(VOY_N) = rtrim(:r_voy_n);
       }
    
       return(t_rec_count);
    }
    
    void storeConsolRecord(char *start_t, char *ist, char *end_t, char *errFile)
    {
       if (r_mbrk_t > MAX_MEALBREAK_TIME)
           r_mbrk_t = MAX_MEALBREAK_TIME;
    
       if (strncmp((char *)r_op_d.arr, "20030301", date8_len) < 0)
          r_work_t = computeTimeDiff(start_t, end_t) - r_mbrk_t;
       else
       {
          r_work_t = computeTimeDiff(ist, end_t) - r_mbrk_t;
    
          int i;
          if (r_work_t < MIN_WORK_TIME)
          {
             i = computeTimeDiff(start_t, end_t) - r_mbrk_t;
             if (i >= MIN_WORK_TIME)
             {
                r_work_t = i;
             }
          }
       }
    
       if (strncmp((char *)r_op_d.arr, "20031121", date8_len) < 0)
       {
          r_box_n = t_oth_box + t_tl_box
                    + round_up(t_oh_box, OH_Factor)
                    + round_up(t_uc_box, UC_Factor); 
       }
       else
       {
          r_box_n = t_oth_box
                    + round_up(t_oh_box, OH_Factor)
                    + round_up(t_uc_box, UC_Factor)
                    + round_up(t_tl_box, TL_Factor); 
       }
    
       double work_hours;
       if (r_work_t > 0)
       {
          /* special: convert int to double */
          work_hours = (r_work_t * 10)/(MINS_PER_HOUR * 10.00);
    
          r_perf_n = r_box_n /work_hours;
          r_perf_d_n = r_box_n /work_hours;
       }
       else
       {
          r_work_t = 0;
          r_perf_n = 0;
          r_perf_d_n = 0;
       }
    
       if (checkCat1ConsolRec() == 0)
       {
          /* insert new record */
          insert_cat1_record(errFile);
       }
       else
       {
          /* update old record */
          update_cat1_record(errFile);
       }
    }
    
    ... The code is very long, so I can't paste everything here.
    Does anyone have any idea what's happening? These programs use Pro*C.

    Any help would really be appreciated! :-)

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Code:
    dbx: warning: The corefile has a different ELF checksum for /usr/lib/libdl.so.1
    checksums are:  file: 0xf159  core: 0xa5bf
    See `help core mismatch' for more details.

    So, what did the help files say about ELF checksums and core mismatches?

    It could be that the copy of libdl.so.1 that was loaded when the core was written is not the same version as the copy dbx is reading now.

    Perhaps one version could just be updated/replaced?
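
    If you want to confirm a mismatch before replacing anything, one way is to copy the library from the machine that produced the core and compare it with the one dbx is reading. Here is a minimal sketch in plain C. The name files_identical is made up for this example, and it does a plain byte-for-byte comparison rather than computing the ELF checksum that dbx prints.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compare two files byte for byte.
   Returns 1 if identical, 0 if they differ,
   -1 if either file cannot be opened or read. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    unsigned char buf_a[4096], buf_b[4096];
    int result = 1;

    if (a == NULL || b == NULL) {
        result = -1;
    } else {
        for (;;) {
            size_t na = fread(buf_a, 1, sizeof buf_a, a);
            size_t nb = fread(buf_b, 1, sizeof buf_b, b);

            if (ferror(a) || ferror(b)) {
                result = -1;            /* read error on either file */
                break;
            }
            if (na != nb || memcmp(buf_a, buf_b, na) != 0) {
                result = 0;             /* different length or content */
                break;
            }
            if (na == 0)
                break;                  /* both files ended together */
        }
    }

    if (a != NULL) fclose(a);
    if (b != NULL) fclose(b);
    return result;
}
```

    For example, files_identical("/usr/lib/libc.so.1", "/tmp/live/libc.so.1"), where the second path is a hypothetical copy taken from the LIVE server. A result of 0 would mean the two files differ.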

    Perhaps a RAM module or disk has developed a bit of a bad spot, with just marginal reliability.
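
    One more thing worth ruling out in the posted code, separate from the library question: the strcmp on r_incve_cat.arr. The declaration isn't shown, so this assumes r_incve_cat is a Pro*C VARCHAR. Oracle sets .len when it fetches into a VARCHAR but does not write a terminating '\0' into .arr, so strcmp can read past the data unless the program terminates the buffer itself. This may well not be the cause of the SEGV at address 0, but it's a classic source of crashes that come and go. A sketch of two ways to guard against it follows; varchar20, varchar_terminate and varchar_equals are invented for illustration and are not part of Pro*C.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the struct the Pro*C precompiler generates for
   "VARCHAR name[20];". It is redeclared here only so that the
   sketch compiles as plain C. */
typedef struct {
    unsigned short len;
    unsigned char  arr[20];
} varchar20;

/* Null-terminate the buffer in place. cap is the size of arr.
   len is clamped so the terminator always fits, which drops the
   last byte if the buffer was completely full. */
void varchar_terminate(unsigned short *len, unsigned char *arr, size_t cap)
{
    if (cap == 0)
        return;
    if (*len >= cap)
        *len = (unsigned short)(cap - 1);
    arr[*len] = '\0';
}

/* Length-aware comparison: never reads more than len bytes of arr,
   so it does not depend on a terminator being present. */
int varchar_equals(const unsigned char *arr, unsigned short len, const char *s)
{
    size_t n = strlen(s);
    return n == len && memcmp(arr, s, n) == 0;
}
```

    With something like this, the test in checkCat1ConsolRec could become varchar_equals(r_incve_cat.arr, r_incve_cat.len, "1"), again assuming that variable really is a VARCHAR.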

  3. #3
    Registered User
    Join Date
    Jan 2008
    Posts
    2
    Hi Adak

    Thanks so much for replying! This is a bit beyond my reach, so I'm asking the infrastructure team to check things out according to your suggestions.

    I'll update this thread once I get an answer.

    Really appreciate your help! :-)
