Dear all,
We're experiencing something strange with our C programs: some of them crash with
"program terminated by signal SEGV"
When we run them again, everything goes through fine. Sadly, this problem only occurs on our LIVE server; we have no problems on our DEV server. Both run SunOS 5.9 on similar hardware.
Here's what we got when analyzing the core dump with dbx:
Code:
dbxenv suppress_startup_message 6.2
Reading ci_cat11a_consol_dtls
dbx: warning: core object name "ci_cat11a_conso" matches
object name "ci_cat11a_consol_dtls" within the limit of 14. assuming they match
core file header read successfully
Reading ld.so.1
Reading libclntsh.so.9.0
Reading librwtool.so.2
Reading libCrun.so.1
Reading libm.so.1
Reading libw.so.1
Reading libc.so.1
dbx: warning: The corefile has a different ELF checksum for /usr/lib/libc.so.1
checksums are: file: 0xe7af core: 0x65e5
See `help core mismatch' for more details.
Reading libwtc9.so
Reading libnsl.so.1
dbx: warning: The corefile has a different ELF checksum for /usr/lib/libnsl.so.1
checksums are: file: 0x761b core: 0xa1e6
See `help core mismatch' for more details.
Reading libsocket.so.1
Reading libgen.so.1
Reading libdl.so.1
dbx: warning: The corefile has a different ELF checksum for /usr/lib/libdl.so.1
checksums are: file: 0xf159 core: 0xa5bf
See `help core mismatch' for more details.
Reading libsched.so.1
Reading libaio.so.1
Reading librt.so.1
Reading libmp.so.2
Reading libmd5.so.1
Reading libc_psr.so.1
program terminated by signal SEGV (no mapping at the fault address)
0x00000000: <bad address 0x0>
Current function is checkCat1ConsolRec
1583 sqlcxt((void **)0, &sqlctx, &sqlstm, &sqlfpn);
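As far as I understand it, dbx's "no mapping at the fault address" corresponds to a SIGSEGV whose si_code is SEGV_MAPERR, and here the fault address is 0x0. Because the crash only happens now and then on LIVE, one way to get a log line at the moment it happens is a POSIX SA_SIGINFO handler. Below is a minimal sketch, not something we actually run; install_segv_logger and the other helper names are hypothetical. The handler only calls write(), because most stdio functions are not async-signal-safe.

```c
#define _XOPEN_SOURCE 700  /* expose sigaction, SA_SIGINFO and SEGV_* codes */
#include <assert.h>
#include <inttypes.h>      /* uintptr_t */
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* POSIX descriptions of the SIGSEGV si_code values. */
static const char *describe_segv_code(int code)
{
    switch (code) {
    case SEGV_MAPERR: return "address not mapped to object";
    case SEGV_ACCERR: return "invalid permissions for mapped object";
    default:          return "other si_code";
    }
}

/* Write a NUL-terminated string to stderr using only write(). */
static void put(const char *s)
{
    size_t n = 0;
    ssize_t ignored;

    while (s[n] != '\0')
        n++;
    ignored = write(STDERR_FILENO, s, n);
    (void)ignored;
}

/* Format value as "0x..." into buf (at least 2 + 2 * sizeof value bytes).
   Returns the number of characters written, without a terminating NUL. */
static size_t format_hex(uintptr_t value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof value];
    size_t n = 0, len = 0;

    assert(buf != NULL);
    do {
        tmp[n++] = digits[value & 0xf];
        value >>= 4;
    } while (value != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)
        buf[len++] = tmp[--n];
    return len;
}

/* Log the fault address and cause, then return. SA_RESETHAND has already
   restored the default action, so for a genuine fault the faulting
   instruction runs again and the process still dumps core. */
static void segv_logger(int sig, siginfo_t *info, void *ctx)
{
    char addr[2 + 2 * sizeof(uintptr_t) + 1];
    size_t len = format_hex((uintptr_t)info->si_addr, addr);

    (void)sig;
    (void)ctx;
    addr[len] = '\0';
    put("SIGSEGV at ");
    put(addr);
    put(" (");
    put(describe_segv_code(info->si_code));
    put(")\n");
}

/* Install the logger for SIGSEGV; returns 0 on success, like sigaction(). */
static int install_segv_logger(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_logger;
    sa.sa_flags = SA_SIGINFO | SA_RESETHAND;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Calling install_segv_logger() once near the top of main() should leave the core-dump behaviour as it is while adding one line to stderr for each crash.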
I've searched around, but other people's programs seem to crash on every run, whereas ours succeed on the second run (*weird*). Here's some code from the program that produced the core dump above.
Code:
int checkCat1ConsolRec()
{
    EXEC SQL BEGIN DECLARE SECTION;
    short t_rec_count = 0;
    EXEC SQL END DECLARE SECTION;

    /* Count existing rows for this logon / shift date / equipment / vessel /
       voyage, in CI_CAT1_CONSOL or CI_CAT1A_CONSOL depending on r_incve_cat. */
    if (strcmp((char *)r_incve_cat.arr, "1") == 0)
    {
        EXEC SQL SELECT COUNT(*) INTO :t_rec_count
            FROM CI_CAT1_CONSOL
            WHERE RTRIM(LOGON_ID) = rtrim(:r_logon_id)
            AND SHIFT_DT = to_date(:r_op_d, 'YYYYMMDD')
            AND RTRIM(EQPT_ID) = rtrim(:r_eqpt_id)
            AND RTRIM(VSL_NAME) = rtrim(:r_vsl_m)
            AND RTRIM(VOY_N) = rtrim(:r_voy_n);
    }
    else
    {
        EXEC SQL SELECT COUNT(*) INTO :t_rec_count
            FROM CI_CAT1A_CONSOL
            WHERE RTRIM(LOGON_ID) = rtrim(:r_logon_id)
            AND SHIFT_DT = to_date(:r_op_d, 'YYYYMMDD')
            AND RTRIM(EQPT_ID) = rtrim(:r_eqpt_id)
            AND RTRIM(VSL_NAME) = rtrim(:r_vsl_m)
            AND RTRIM(VOY_N) = rtrim(:r_voy_n);
    }
    return(t_rec_count);
}
void storeConsolRecord(char *start_t, char *ist, char *end_t, char *errFile)
{
    /* Cap the meal break, then work out the minutes worked. */
    if (r_mbrk_t > MAX_MEALBREAK_TIME)
        r_mbrk_t = MAX_MEALBREAK_TIME;

    if (strncmp((char *)r_op_d.arr, "20030301", date8_len) < 0)
        r_work_t = computeTimeDiff(start_t, end_t) - r_mbrk_t;
    else
    {
        /* From 20030301 on, measure from ist, falling back to start_t
           if that gives at least MIN_WORK_TIME. */
        r_work_t = computeTimeDiff(ist, end_t) - r_mbrk_t;
        int i;
        if (r_work_t < MIN_WORK_TIME)
        {
            i = computeTimeDiff(start_t, end_t) - r_mbrk_t;
            if (i >= MIN_WORK_TIME)
            {
                r_work_t = i;
            }
        }
    }

    /* Box count: from 20031121 on, t_tl_box also goes through round_up(). */
    if (strncmp((char *)r_op_d.arr, "20031121", date8_len) < 0)
    {
        r_box_n = t_oth_box + t_tl_box
            + round_up(t_oh_box, OH_Factor)
            + round_up(t_uc_box, UC_Factor);
    }
    else
    {
        r_box_n = t_oth_box
            + round_up(t_oh_box, OH_Factor)
            + round_up(t_uc_box, UC_Factor)
            + round_up(t_tl_box, TL_Factor);
    }

    double work_hours;
    if (r_work_t > 0)
    {
        /* special: convert int to double */
        work_hours = (r_work_t * 10)/(MINS_PER_HOUR * 10.00);
        r_perf_n = r_box_n /work_hours;
        r_perf_d_n = r_box_n /work_hours;
    }
    else
    {
        r_work_t = 0;
        r_perf_n = 0;
        r_perf_d_n = 0;
    }

    if (checkCat1ConsolRec() == 0)
    {
        /* insert new record */
        insert_cat1_record(errFile);
    }
    else
    {
        /* update old record */
        update_cat1_record(errFile);
    }
}
...
The code is very long, so I can't paste all of it here.
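For anyone checking the arithmetic in storeConsolRecord: the performance figure is just boxes divided by hours worked, with minutes converted to fractional hours in double precision. A standalone sketch, assuming MINS_PER_HOUR is 60 (its definition isn't shown above); boxes_per_hour is a hypothetical name, not a function in our program.

```c
#include <assert.h>

/* Assumption: MINS_PER_HOUR is 60; its definition isn't shown in the post. */
#define MINS_PER_HOUR 60

/* Boxes per hour for work_minutes of work, as in the r_work_t > 0 branch
   of storeConsolRecord; returns 0 when no time was worked (else branch). */
static double boxes_per_hour(int box_n, int work_minutes)
{
    double work_hours;

    assert(MINS_PER_HOUR > 0);
    if (work_minutes <= 0)
        return 0.0;
    /* Mathematically the same as (r_work_t * 10)/(MINS_PER_HOUR * 10.00):
       dividing by a double is what forces floating-point division. */
    work_hours = work_minutes / (double)MINS_PER_HOUR;
    return box_n / work_hours;
}
```

For example, 300 boxes over 450 minutes gives 7.5 hours and therefore 40 boxes per hour.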
Does anyone have any idea what's happening? These programs use Pro*C.
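One Pro*C detail, in case it matters (we haven't confirmed it has anything to do with the crash): Pro*C expands a VARCHAR host variable into a struct with a len field and an arr buffer, and after a SELECT or FETCH it sets len but does not NUL-terminate arr. If r_incve_cat is a VARCHAR (its declaration isn't shown above), the strcmp() in checkCat1ConsolRec relies on the buffer having been terminated somewhere else. A minimal sketch of a length-aware comparison; varchar_equals and demo_incve_cat are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical host variable shaped like Pro*C's expansion of
   "VARCHAR r_incve_cat[5];" (the real declaration isn't in the post).
   Only len says how many bytes of arr are valid; there is no NUL. */
static struct {
    unsigned short len;
    unsigned char  arr[5];
} demo_incve_cat = { 1, { '1', 'X', 'X', 'X', 'X' } };

/* Exact comparison of a VARCHAR's contents with a C string, using len
   instead of relying on a terminating NUL inside arr. */
static int varchar_equals(unsigned short len, const unsigned char *arr,
                          const char *s)
{
    size_t n;

    assert(arr != NULL && s != NULL);
    n = strlen(s);
    return len == n && memcmp(arr, s, n) == 0;
}
```

With a real VARCHAR the call would be varchar_equals(r_incve_cat.len, r_incve_cat.arr, "1"). The other common approach is to terminate the buffer after each SELECT (r_incve_cat.arr[r_incve_cat.len] = '\0';), which needs one spare byte in the declared size.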
Any help would really be appreciated! :-)