Hash Table with Simple record in files

**infantheartlyje** · 10-07-2011

How do I get the record number from the index file? To read a record from the data file, should I use the record number or the random number from the index file?

**~~CommonTater~~** · 10-07-2011

Ok... let's back up here... Exactly what of this do you not understand? Where is your confusion?

Based on past messages...
1) the records are not written to the disk in random order... they are written out in the order you added them.

2) The randnum value in my demo piece means nothing... it's just a number.

3) The linkage between idx and dat files is the record number.

4) You use the index file to cross-reference information... For example, you might want to look things up by part number, and the part numbers are out of order in the main file... write an idx file with all the part numbers and their corresponding record numbers, sort the index file... search the index file using a binary search algorithm, which can be very fast, and get the record number... use the record number to retrieve the data from the main file... The index entry is a very simple struct, and the index is itself a random access file...

Code:

struct t_PartNumIdx
  { int partnum;    // search for this
     int record; }    // to get this
  PartNumIdx;

It's simply an advanced search technique, rather than sequentially searching the entire main file.
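As a rough sketch of that lookup, assuming the index file has already been sorted by part number: the function below binary searches the index directly on disk and returns the matching record number. FindRecord is a made-up name for illustration, not part of the demo.

```c
#include <stdio.h>

// One index entry: the key to search for and the main-file record it points to.
struct t_PartNumIdx
  { int partnum;    // search for this
    int record; };  // to get this

// Binary search of an index file whose entries are sorted by partnum.
// Returns the main-file record number, or -1 if the part number is absent
// or the file cannot be read. (FindRecord is a hypothetical helper.)
int FindRecord(FILE *Idx, int PartNum)
  { struct t_PartNumIdx entry;
    long lo = 0, hi;

    if (fseek(Idx, 0, SEEK_END) != 0)
      return -1;
    hi = ftell(Idx) / (long)sizeof(entry) - 1;   // last entry number

    while (lo <= hi)
      { long mid = lo + (hi - lo) / 2;
        if (fseek(Idx, mid * (long)sizeof(entry), SEEK_SET) != 0 ||
            fread(&entry, sizeof(entry), 1, Idx) != 1)
          return -1;
        if (entry.partnum == PartNum)
          return entry.record;
        if (entry.partnum < PartNum)
          lo = mid + 1;
        else
          hi = mid - 1; }
    return -1; }
```

Each pass halves the range, so even a very large index needs only a few dozen reads.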

It seems to me your first priority would be to get it working well enough that you can add, edit and view data, based on record numbers... the indexing stuff can wait until you're actually creating a valid main file.

Ok... so what *exactly* are you stuck on?

**~~CommonTater~~** · 10-07-2011

Here... I redid the demo program to make it more clear what is going on... Compile this up and try it out... analyse how it works...

Code:

//random access file demo
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <ctype.h>
#include <string.h>

#define FNAME "random.dat"

// test data struct
struct t_Record
  { int number;
    char word[16]; }
  Record;



///////////////////////////////////////////////////////
// Random Access File Handlers
//

// open or create the file
FILE *FileOpen(char* Filename)
  { FILE* pFile;
    pFile = fopen(Filename,"rb+");
    if (!pFile)
      pFile = fopen(Filename,"wb+");
    return pFile; }


// Write a record to the file
int WriteRecord(FILE *File, int RecNum)
  { if( fseek(File, RecNum * sizeof(Record), SEEK_SET) == 0 )
      if ( fwrite(&Record,sizeof(Record),1,File) )
        return 1;
    return 0; }


// read a record from the file
int ReadRecord(FILE *File, int RecNum)
  { if( fseek(File, RecNum * sizeof(Record), SEEK_SET) == 0 )
      if ( fread(&Record,sizeof(Record),1,File) )
        return 1;
    return 0; }



//////////////////////////////////////////////////////////////
// View a Record
//

int ViewRecord (FILE *File, int RecNum)
  { if (! ReadRecord(File,RecNum))
      { printf("Invalid record\n"); 
        return -1; }
    printf("-----\n");
    printf("Record        : %d\n",RecNum);
    printf("Number Value  : %d\n",Record.number);
    printf("Word Value    : %s\n",Record.word);
    printf("-----\n");  
    return RecNum; }



//////////////////////////////////////////////////////////////
// Add a new record
//

int AddRecord(FILE *File)
  { memset(&Record,0,sizeof(Record));
    printf("\nEnter a number : ");
    scanf("%d", &Record.number);
    printf("Enter a word : ");
    scanf(" %15s",Record.word);
    fseek(File,0,SEEK_END);
    fwrite(&Record,sizeof(Record),1,File);
    return (ftell(File) / sizeof(Record)) - 1; }



//////////////////////////////////////////////////////////////
// Edit a record
//

int EditRecord(FILE *File, int RecNum)
  { if (! ReadRecord(File,RecNum))
      { printf("Invalid record\n");
        return -1; }
    printf("\n-----\n");
    printf("Record        : %d\n",RecNum);
    printf("Number Value  : %d\n",Record.number);
    printf("Word Value    : %s\n",Record.word);
    printf("-----\n");  
    
    do
      { while(getchar() != '\n');
        printf("Change Values: Number, Word or Save (N, W or S) ? ");
        switch (toupper(getchar()))
          { case 'N' :
              printf("\nEnter new number : ");
              scanf("%d",&Record.number);
              break;
            case 'W' : 
              printf("Enter new word : ");
              scanf(" %15s", Record.word);
              break;
            case 'S' :
              if (WriteRecord(File,RecNum))
                printf("\nRecord #%d updated\n",RecNum);
              return RecNum; } }
    while(1);
    return -1; }


////////////////////////////////////////////////////////////////
// List records
// 

void ListRecords(FILE *File )
  { int i = 0;
    while (ReadRecord(File,i))
      { printf("Record #%d\tNumber = %d\t\tWord = %s\n",i,Record.number,Record.word); 
        i++; }
    printf("\n\n"); }



////////////////////////////////////////////////////////
// this is for demonstration purposes only
// you would not do this in a real program
void InitFile(FILE* File)
 { int x, y;
   memset(&Record,0,sizeof(Record));
   for (x = 0; x < 10; x++)
      { Record.number = rand();
        for (y = 0; y < ((Record.number % 15) + 1); y++)
          Record.word[y] = (rand() % 26) + 'a';
        Record.word[y] = 0;
        if (! WriteRecord(File,x))
          printf("Oh drat!");  } }
 


//////////////////////////////////////////////////////////
// program mains
//
int main (void)
  { int Rec = 0; // record number
    FILE *File;

    srand(time(NULL));

    File = FileOpen(FNAME); 
    if (!File)
      { printf("Curses foiled again!\n\n");
        exit(-1); }

    printf("Random Access File Demonstration\n\n");
 
    do
      { printf("Menu : Dummy, Add, Edit, View, List, Quit (D, A, E, V, L or Q) : ");
        switch(toupper(getchar()))
          { case 'D' :
              printf("Creating dummy file of 10 entries\n");
              InitFile(File);
              break;
            case 'A' :
              Rec = AddRecord(File);
              printf("Record #%d Added\n\n", Rec);
              break;              
            case 'E' :
              printf("\nRecord number (-1 Cancels): ");
              scanf("%d",&Rec);
              if (Rec > -1)
                EditRecord(File,Rec);
              break;
            case 'V' :
              printf("\nRecord number (-1 Cancels): ");
              scanf("%d",&Rec);
              if (Rec > -1)
                ViewRecord(File,Rec);
              break;
            case 'L' :
              ListRecords(File);
              break;
            case 'Q' :
              fclose(File);
              return 0; } 
              
         while(getchar() != '\n'); }
    while (1); 
    return 0; }

**infantheartlyje** · 10-07-2011

Thank you so much for your kind explanation. Now I'll do a simple exercise on the main file.

**infantheartlyje** · 10-07-2011

Hi... Thank you so much. I wrote a sample program and it's working correctly. Thank you for your kind information. But I have one more doubt: can we delete a record?

**~~CommonTater~~** · 10-07-2011

Deleting records from random files is a bit of work.

You can simply mark a record as "unused". For example in my inventory systems deleted items are marked with a -1 in the part number, meaning it's unused.

Now comes a decision...

1) If you aren't worried about keeping things in any special order you can simply search for an unused record and add a new record in its place. This has the advantage that you never need to run file maintenance to clean out the unused records. You just need to update your index files which can be done very quickly in memory.

or

2) If you need to keep things in order (usually chronologically) you can add new records to the end of the file and periodically run a maintenance routine to rewrite the file, skipping over the unused records and compacting the file. The big advantage here is disk space since the file is periodically compacted (say when you reach 10% deletions), but then you have to rebuild all your indexes from scratch since the record numbers will change.

I generally use the second method with big data files (like my inventory packages) running the maintenance routines on the weekends.
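A minimal sketch of both halves of that, assuming a made-up struct t_Part where -1 in partnum means "unused". DeleteRecord and CompactFile are hypothetical names, and a real maintenance run would also swap the compacted file in for the original and rebuild the indexes.

```c
#include <stdio.h>

#define UNUSED (-1)

// Hypothetical inventory record; only partnum matters for this sketch.
struct t_Part
  { int partnum;      // UNUSED (-1) marks a deleted record
    char desc[16]; };

// Mark a record deleted by overwriting its part number in place.
// Returns 1 on success, 0 if the record does not exist or the write fails.
int DeleteRecord(FILE *File, long RecNum)
  { struct t_Part rec;
    if (fseek(File, RecNum * (long)sizeof(rec), SEEK_SET) != 0 ||
        fread(&rec, sizeof(rec), 1, File) != 1)
      return 0;
    rec.partnum = UNUSED;
    if (fseek(File, RecNum * (long)sizeof(rec), SEEK_SET) != 0 ||
        fwrite(&rec, sizeof(rec), 1, File) != 1)
      return 0;
    return 1; }

// Copy every live record from In to Out, dropping the unused ones.
// Returns the number of records kept, or -1 on a write error.
long CompactFile(FILE *In, FILE *Out)
  { struct t_Part rec;
    long kept = 0;
    rewind(In);
    while (fread(&rec, sizeof(rec), 1, In) == 1)
      if (rec.partnum != UNUSED)
        { if (fwrite(&rec, sizeof(rec), 1, Out) != 1)
            return -1;
          kept++; }
    return kept; }
```

After compaction every record following a deleted one has moved down, which is why the record numbers stored in the index files have to be regenerated.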

**infantheartlyje** · 10-07-2011

Hi...this is my declaration part.

Code:

struct t_Record
  { 
    int randnum;
    char S_trancode[2];
    char S_sectype[4];
    char S_secsym[4];
    char D_tdate[8];
    char D_sdate[8];
    int N_quantity;
    int N_trdamt;
    char S_sourcetype[4];
    char S_sourcesym[6];
}Record;

This is my input part

Code:

printf("Enter Transaction Code : ");
    scanf("%2s",Record.S_trancode);
    printf("Enter Security Type Code : ");
    scanf("%4s",Record.S_sectype);
    printf("Enter Security Symbol : ");
    scanf("%5s",Record.S_secsym);
    printf("Enter Trade Date : ");
    scanf("%8s",Record.D_tdate);
    printf("Enter Settle Date : ");
    scanf("%8s",Record.D_sdate);
    printf("Enter Quantity : ");
    scanf("%d",&Record.N_quantity);
    printf("Enter Trade Amount : ");
    scanf("%d",&Record.N_trdamt);
    //getsrcortype();
    printf("Enter Source/Destination Type ");
    scanf("%4s",Record.S_sourcetype);
    printf("Enter Source/Destination Symbol ");
    scanf("%5s",Record.S_sourcesym);

These are my inputs

(commas separate the individual inputs)
by, csus, ibm, 02052000, 06052000,200,1000,caus,cash

I stored them in that file. When I retrieve them I get the output below
bycsusibm csusibm ibm 0205200006052000 06052000 200 1000 causcash cash

How to rectify the result now?

**infantheartlyje** · 10-08-2011

Originally Posted by CommonTater

2) If you need to keep things in order (usually chronologically) you can add new records to the end of the file and periodically run a maintenance routine to rewrite the file, skipping over the unused records and compacting the file. The big advantage here is disk space since the file is periodically compacted (say when you reach 10% deletions), but then you have to rebuild all your indexes from scratch since the record numbers will change.

Please explain this to me with a simple example.
In my file there will be over 3,000 records, and every record will have 10, 15 or more fields. This is one client file.

There will be more clients. If I then use an index file for every client file, it will take up more storage. (Every client file would need its own index file.)

In the struct-based random access method, should we not create any index file? I understood from your code that we need not create an index file for the client file, since the record number can also be used to find a record. Is that right? If I store many more records with the struct-based random access method, what problems might I run into, and how do I fix them? Please explain with a simple example.

**~~CommonTater~~** · 10-08-2011

Originally Posted by infantheartlyje

Hi...this is my declaration part.

Code:

struct t_Record
  { 
    int randnum;
    char S_trancode[2];
    char S_sectype[4];
    char S_secsym[4];
    char D_tdate[8];
    char D_sdate[8];
    int N_quantity;
    int N_trdamt;
    char S_sourcetype[4];
    char S_sourcesym[6];
}Record;

This is my input part

Code:

printf("Enter Transaction Code : ");
    scanf("%2s",Record.S_trancode);
    printf("Enter Security Type Code : ");
    scanf("%4s",Record.S_sectype);
    printf("Enter Security Symbol : ");
    scanf("%5s",Record.S_secsym);
    printf("Enter Trade Date : ");
    scanf("%8s",Record.D_tdate);
    printf("Enter Settle Date : ");
    scanf("%8s",Record.D_sdate);
    printf("Enter Quantity : ");
    scanf("%d",&Record.N_quantity);
    printf("Enter Trade Amount : ");
    scanf("%d",&Record.N_trdamt);
    //getsrcortype();
    printf("Enter Source/Destination Type ");
    scanf("%4s",Record.S_sourcetype);
    printf("Enter Source/Destination Symbol ");
    scanf("%5s",Record.S_sourcesym);

These are my inputs

(commas separate the individual inputs)
by, csus, ibm, 02052000, 06052000,200,1000,caus,cash

I stored them in that file. When I retrieve them I get the output below
bycsusibm csusibm ibm 0205200006052000 06052000 200 1000 causcash cash

How to rectify the result now?

That's because all of your storage arrays in your struct are 1 byte too small...

if you are reading two bytes with scanf() ... say AB ... you need three bytes to store it because C strings have a trailing 0 at the end... You have not made space provision for this extra byte. When printf reads the data, it's not "null terminated" so it just keeps on reading until it does find a null.
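Something along these lines should cure it. This is only a sketch: each array is one byte larger than the width given to scanf, randnum is left out for brevity, and ParseRecord is a made-up helper that reads all nine fields from one line with sscanf instead of prompting for each.

```c
#include <stdio.h>
#include <string.h>

// Each char array is one byte larger than its scanf width,
// leaving room for the terminating '\0'.
struct t_Record
  { char S_trancode[3];     // read with "%2s"
    char S_sectype[5];      // read with "%4s"
    char S_secsym[6];       // read with "%5s"
    char D_tdate[9];        // read with "%8s"
    char D_sdate[9];        // read with "%8s"
    int  N_quantity;
    int  N_trdamt;
    char S_sourcetype[5];   // read with "%4s"
    char S_sourcesym[6]; }; // read with "%5s"

// Parse one line of whitespace-separated fields into *r.
// Returns 1 if all nine fields were read, 0 otherwise.
// (ParseRecord is a hypothetical helper, not from the original program.)
int ParseRecord(const char *line, struct t_Record *r)
  { memset(r, 0, sizeof(*r));
    return sscanf(line, "%2s %4s %5s %8s %8s %d %d %4s %5s",
                  r->S_trancode, r->S_sectype, r->S_secsym,
                  r->D_tdate, r->D_sdate,
                  &r->N_quantity, &r->N_trdamt,
                  r->S_sourcetype, r->S_sourcesym) == 9; }
```

With the extra byte in place every field is null terminated, so printing one field with %s stops at the end of that field instead of running into the next one.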

What does this struct represent, anyway?

**~~CommonTater~~** · 10-08-2011

Originally Posted by infantheartlyje

Please explain this to me with a simple example.
In my file there will be over 3,000 records, and every record will have 10, 15 or more fields. This is one client file.

The number of fields is not important so long as every struct in the file is exactly the same size.

There will be more clients. If I then use an index file for every client file, it will take up more storage. (Every client file would need its own index file.)

No you do not make index files for every client... you make index files for key data in your struct... for example, in an inventory program you would make 1 index file that indexes record numbers from part numbers... that is each struct in the index file contains 1 part number and 1 record number. (Frankly I don't think you are grasping this concept at all well.)

In the struct-based random access method, should we not create any index file? I understood from your code that we need not create an index file for the client file, since the record number can also be used to find a record. Is that right? If I store many more records with the struct-based random access method, what problems might I run into, and how do I fix them? Please explain with a simple example.

Did you give my revised demo code a try? Go ahead add as many new records as you like... hundreds, thousands, tens of thousands... it's still going to work just like it does with the 10 sample records created by the Dummy file function... That's the point of it... the file is not limited, except by disk space.

And why the heck is randnum still in your disk record? What are you using it for? You don't enter it and you don't display it... why is it there?

**infantheartlyje** · 10-08-2011

Originally Posted by CommonTater

The number of fields is not important so long as every struct in the file is exactly the same size.

And why the heck is randnum still in your disk record? What are you using it for? You don't enter it and you don't display it... why is it there?

Sorry. That randnum variable is a record number. I forgot to rename it.

The field names are
Record no, Tran. code, sec. type, Sec. Symbol, Trade date, Settle date, Amount, Quantity, source type, source symbol
0 by csus ibm 01012000 01052000 100 10000 caus cash

If I want to delete a record, I simply change the record number to -1. Is this the correct way to delete a record in huge files?

Originally Posted by CommonTater

No you do not make index files for every client... you make index files for key data in your struct... for example, in an inventory program you would make 1 index file that indexes record numbers from part numbers... that is each struct in the index file contains 1 part number and 1 record number. (Frankly I don't think you are grasping this concept at all well.)

I really didn't get this concept clearly. What is a part number? Please explain the concept using my fields. How do I create index files for all the client files?

**~~CommonTater~~** · 10-08-2011

Originally Posted by infantheartlyje

Sorry. That randnum variable is a record number. I forgot to rename it.

No it isn't... the record number is not stored in the struct, nor should it be since file maintenance can change record numbers.

Take a very close look at how my code accesses the disk file... The record number is not searched for in the file, it is used to mathematically calculate the right position in the file to begin reading ... fseek(File,recnum * sizeof(struct), SEEK_SET);
You don't want this information inside the file because the record numbers will change as you maintain your database.

Because the record number is calculated, these disk files behave just like arrays of structs. 0 is the first record, 1 is next and so on up to the size of the file...
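To make the arithmetic concrete, here is a small sketch using the demo's t_Record layout. RecordOffset and RecordCount are names invented for illustration. The first is the same multiplication the fseek() call performs, and the second runs it backwards to find how many records the file holds.

```c
#include <stdio.h>

// Same layout as the demo's record.
struct t_Record
  { int number;
    char word[16]; };

// Byte offset of record RecNum: the same arithmetic as indexing an array.
long RecordOffset(long RecNum)
  { return RecNum * (long)sizeof(struct t_Record); }

// Number of whole records currently in the file, or -1 on error.
// Valid record numbers are 0 .. RecordCount(File) - 1.
long RecordCount(FILE *File)
  { long size;
    if (fseek(File, 0, SEEK_END) != 0)
      return -1;
    size = ftell(File);
    if (size < 0)
      return -1;
    return size / (long)sizeof(struct t_Record); }
```

Nothing in the file itself says "this is record 3"; the position alone determines the number, which is why compacting the file renumbers everything after a removed record.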

Alright... here's an example inventory record...

Code:

struct t_partslist
  { char desc[64];
    unsigned PartNum;
    int      MaxStock;
    int      Reorder;
    int      Current;
    char     Supplier[16];
    int      Cost;
    int      Retail; }
  partslist;

See the variable PartNum in there... Ok that is a manufacturer's assigned part number for the item. It's used when reordering and it's marked on the box for every item. Manufacturers generally try very hard to have unique number sequences for their products, so it's a natural thing for the operator to key in when trying to check stock...

However; these part numbers do not occur in numerical order nor do they correspond in any way to the record number in the disk file... they're just a number. You might find PartNum 667232556 in record #3 of the file... Did you not notice that the numeric value in my demo code had nothing to do with the record number?

Ok so here's our sales clerk typing in the part number from the box... now, we need some way to get from that part number to the main file's record for that part... That's where index files come in...

We make an index file of all the part numbers in the main file, and the record number of each... like this...

Code:

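struct t_partidx
  { unsigned PartNum;
     int        Record; };

// the index is built in memory; MAXPARTS is an assumed upper limit
struct t_partidx partidx[MAXPARTS];

// to build an index
// (assumes a ReadRecord variant that fills MainRecord, a struct t_partslist)
int recnum = 0;
while (recnum < MAXPARTS && ReadRecord(File, recnum, &MainRecord))
  { partidx[recnum].PartNum = MainRecord.PartNum;
     partidx[recnum].Record = recnum;
     recnum++; }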

Now we have an out of order list of part numbers, so to speed things up we want to sort the file... no problem, load it into memory, sort it, write it back out so that it is in order by part numbers...

When our trusty salesmaker hits enter, we do a binary search in the index file and find the part number... since we have the part number, we also have the record number for the main file... now we go to the main file and open the record indicated by Record... voila... the information is loaded.

Partnumber -> index -> record number -> main data ... get it?
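One way to sketch the sort and lookup steps with the index held in memory is to lean on the standard library's qsort() and bsearch(). SortIndex, LookupRecord and ComparePartNum are illustrative names, and loading or saving the index file is left out.

```c
#include <stdlib.h>

struct t_partidx
  { unsigned PartNum;
    int      Record; };

// Order index entries by part number (shared by qsort and bsearch).
static int ComparePartNum(const void *a, const void *b)
  { unsigned pa = ((const struct t_partidx *)a)->PartNum;
    unsigned pb = ((const struct t_partidx *)b)->PartNum;
    return (pa > pb) - (pa < pb); }

// Sort the in-memory index so it is in order by part number.
void SortIndex(struct t_partidx *Index, size_t Count)
  { qsort(Index, Count, sizeof(Index[0]), ComparePartNum); }

// Binary search the sorted index.
// Returns the main-file record number for PartNum, or -1 if absent.
int LookupRecord(const struct t_partidx *Index, size_t Count, unsigned PartNum)
  { struct t_partidx key, *hit;
    key.PartNum = PartNum;
    key.Record = -1;
    hit = bsearch(&key, Index, Count, sizeof(Index[0]), ComparePartNum);
    return hit ? hit->Record : -1; }
```

The value LookupRecord returns is what you would hand to ReadRecord() on the main file.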

Now I don't know exactly what you're doing and I sure as heck can't guess from the contents of that struct of yours... but it would for sure and certain help me a lot if you would explain it...

**infantheartlyje** · 10-10-2011

Please tell me a solution for my DB fields. There will be many clients. Every client will have a client file. The client file will have the fields below.

1. Transaction Code
2. Security Type
3. Security Symbol
4. Trade Date
5. Settlement Date
6. Quantity
7. Amount
8. Source Type
9. Source Symbol

In your solution, how do I set up an index file common to all client files? What should my index files contain? I don't have any part numbers here like in the inventory management example.

Here are my questions. After reading them, please help me build an index file for this DB.

1. The client may ask for the transaction report between two trade dates.
2. The client may ask for the appraisal report for a specific date.
3. The manager may ask for the composite appraisal report of specific clients for a specific date.
4. The client may ask for the report for a specific security symbol.

Now how can I build an index file for all the client files?

**~~CommonTater~~** · 10-10-2011

First of all there's nothing about a client in your record... this looks like a single transaction record... not a client file.

In a client's information I would expect to see a name, address, phone numbers, account number and so on.
There you would either use the record number as an account number or build an index of account numbers.
Secondary indexes might key on phone numbers or names...

But for what appears to be a chronological transaction record, no, you probably won't benefit from making any index files. These types of files tend to grow as transactions are added and transaction codes tend to increment in order... indexing a file like this would yield little if any benefit.

The searches you describe would require a client's account number in the struct if you're going to print information for individual clients...

1) read the file sequentially and begin printing all records for a given account number on the given start date then stop when the end date is reached.

2) same as #1 with the start and end dates the same.

3) same as #1 but for all clients.

4) Same as #1 except keyed to the symbol not the date.

These are all sequential searches... indexes will do you absolutely no good in any of these cases.
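A sequential report of the first kind might look roughly like this. It is a sketch under loud assumptions: struct t_Trans, its Account field and the YYYYMMDD date encoding are all invented here so that dates compare as plain numbers, and none of them come from the posted struct. ReportByDate is likewise a made-up name.

```c
#include <stdio.h>

// Hypothetical transaction record: Account and the YYYYMMDD date
// encoding are assumptions, not fields from the posted struct.
struct t_Trans
  { int  Account;
    long TradeDate;     // e.g. 20000502 for 2 May 2000
    char Symbol[6];
    int  Quantity; };

// Read the file front to back and print every transaction for Account
// dated From..To inclusive. Returns the number of matching records.
int ReportByDate(FILE *File, int Account, long From, long To)
  { struct t_Trans t;
    int matches = 0;
    rewind(File);
    while (fread(&t, sizeof(t), 1, File) == 1)
      if (t.Account == Account && t.TradeDate >= From && t.TradeDate <= To)
        { printf("%ld %-5s %d\n", t.TradeDate, t.Symbol, t.Quantity);
          matches++; }
    return matches; }
```

Passing the same value for From and To gives the single-date report, and if the file is known to be chronological the loop could stop as soon as it sees a date past To.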

The only advantage to random access here is when you have to locate a specific transaction number, in which case --since transaction numbers tend to increase sequentially-- a simple binary search would give you a very quick result...

When we started this conversation you said you were building a database of your clients... an inventory of people. THAT is a situation very well suited to random access filing techniques and THAT is what I thought we were talking about...

Now, I find out that what you're doing is essentially sequential filing of transactions and not connecting them to any customer at all...
Had I known this up front, I would never have entered this thread.

**infantheartlyje** · 10-10-2011

Okay. Now I have another doubt. Every record is stored in random order in that file. Just forget my client's transaction file.
These are the records stored in that file:
one name and date of birth.
There are 200,000 records. Now I want to find the people who were born on a specific date. With your method I have to search all 200,000 records.
But if the records were sorted by date of birth, we could check records until our date is reached and then ignore the remaining records.
With random-order access, we have to search every record. Please explain how to fix this problem.

In my client transaction file I want to search for a record with a specific date. If the file is ordered by date, we can search up to that specific date. With the random access method we have to search all records, so the processing is wasted on some records. I hope you understand my problem. Please help me!