Originally Posted by
C_programmer.C
My data will actually consist of a single table consisting of 2 integer fields, 2 varchar (20) fields and one boolean field [and] a maximum [of] 10,000 rows.
What kinds of queries do you need to implement? Does a typical query return only one entry (or none), or a set of entries? What about sorting the results? What is the ratio of queries to additions and deletes? Will your program access the data from multiple threads simultaneously?
Hash tables will work very well for exact string matches for the varchar fields, but not substring matches. Substring (partial) matches will be relatively slow in any case.
For looking up integer ranges, a linear array per integer field should do, but it won't help with sorting. Forming the records into linked lists will help, although it makes insertions more complicated.
Leaving out the locking code needed if multiple threads are used, here's what I'd probably use:
Code:
/* Each database record:
*/
struct rec {
long i1;
struct rec *i1_prev;
struct rec *i1_next;
long i2;
struct rec *i2_prev;
struct rec *i2_next;
char v1[23]; /* v1[22] = '\0' */
unsigned char v1_len;
unsigned long v1_hash;
struct rec *v1_next;
char v2[23]; /* v2[22] = '\0' */
unsigned char v2_len;
unsigned long v2_hash;
struct rec *v2_next;
_Bool b; /* Or perhaps an enum? */
};
/* Actual database (grown dynamically):
*/
size_t records_max = 0;
size_t records = 0;
struct rec *record = NULL;
/* Index of field i1, sorted:
*/
long *i1_array = NULL;
struct rec **i1_record = NULL;
/* Index of field i2, sorted:
*/
long *i2_array = NULL;
struct rec **i2_record = NULL;
/* Hash table of v1:
*/
size_t v1_hash_size = 0;
struct rec **v1_hash = NULL;
/* Hash table of v2:
*/
size_t v2_hash_size = 0;
struct rec **v2_hash = NULL;
Addition requires insertion of the new i1 into i1_array and i2 into i2_array (meaning two memmove()s of potentially records longs). The previous and next entries in the arrays can be used to fix the i1_prev, i1_next, i2_prev, and i2_next links in the record structure, without traversing the lists. The new records can be prepended to the hash table list for each varchar.
For the hash function, I'd use DJB2, XOR variant. It's excellent for ASCII strings, fast and simple.
Any single-field query (assuming having thousands of records, but not tens of thousands of records) will be faster than constructing and then parsing the corresponding SQL, so no database that uses SQL strings can beat the above for single-field queries.