1. ## Binary Search Tree

Hi,

Let's say I have a struct like this one.

Code:
struct person {
surname,
name,
age,
languages,
next person,
}
You can be my guess and expand the information. What I will do is save people's data in .bin file and then read them when I open the program to make some comparison. For example I'll list all the people that knows both English and German. Or something like listing people who are older than 20 yo. Well these are not the parts I'm thinking on right now before building the structure.

So those comparisons I explained above will be made using binary search tree. Saving person's age is simple since it's just an integer value and a person cannot have more than one age right? But when it comes to something like languages person can know more than one. How should I exactly deal with those. Is it going to be something like struct inside struct. I am really beginner at programming and don't have any idea about it right now but I'm sure this is simple problem for all programmers since this kind of system can be used in a lot of places.

2. Originally Posted by Vsky
So those comparisons I explained above will be made using binary search tree. Saving person's age is simple since it's just an integer value and a person cannot have more than one age right? But when it comes to something like languages person can know more than one. How should I exactly deal with those. Is it going to be something like struct inside struct. I am really beginner at programming and don't have any idea about it right now but I'm sure this is simple problem for all programmers since this kind of system can be used in a lot of places.
One approach would be to model it for a relational database, e.g., create a table to model persons, create another table to model languages, then since there is many to many relationship between persons and languages, create a third table to link them. With the database in place, you can then write and execute SQL statements to create, read, update, and delete the data, e.g., one SQL statement can be used to query the database for "all the people that knows both English and German", another SQL statement can be used to query the database for "people who are older than 20 yo".

Granted, the structures behind such a database are more complex than just binary search trees, but then there are database engines that can be embedded into your program, e.g., SQLite, and indeed these exist precisely because "this kind of system can be used in a lot of places".

Of course, even for an in-memory database, at some point you might want to retrieve the data from the database and store it in your own data structures, but what exactly those structures should be depends on what exactly you are trying to do with your program. They could well correspond rather directly to the database tables, or they could be rather different.

3. if i understand well, you need lets say mike who knows both english and german but not ema that knows only english?
Code:
structperson {    surname,
name,
age,
language1,
language2,
next person,
}
every time you enter the language that someone knows, you can have a "sum". so, if (sum1 + sum 2) >2, you print the person

4. Originally Posted by Giwrgows x
Why not 3, or 4, or 5? Furthermore, having numbered member variables is unlikely to be a good approach: an array would be better (in which case it could be dynamic to address my point about a variable number of languages), but that would not satisfy Vsky's concern about efficiently (hence the binary search tree topic) listing all persons who know a given language as one would have to do a linear search of persons for the given language.

5. @laserlight

Thanks for answer. I've looked into SQLite. I managed to create table and save data in it but I had trouble with saving variables into tables. I wrote the program by saving structs inside bin file instead of SQLite database and it works fine. But how can I implement it to binary search tree.

For testing purposes I made small struct.

Code:
struct person {
char name[50];
char language[99];
char licence[50];
int deleted;
};
So program has basically 3 menus. Main menu asks the user if he is employer or company owner. For employers user can enter his data, delete his data, view his data and update his data. I made it this far. I can enter, view, delete and update data. 2nd part is not so tricky either. There gotta be bunch of options as I stated examples in main post. So program works as intended except it doesn't use binary search tree. And at this point I'm not really sure why do I need it. So how can I change it? I didn't share whole code btw because it's kinda long but I'm sure you have an opinion of what I did.

6. Originally Posted by Vsky
So program works as intended except it doesn't use binary search tree. And at this point I'm not really sure why do I need it. So how can I change it?
A binary search tree, if well balanced, will allow you to search for information faster than say, linear search of an array, assuming that what you are searching for uses the key with which the binary search tree is constructed.

The thing is, if you have many different and separate search criteria, then you need to have a binary search tree for each of them. But of course you don't want to have so many different copies of your data, so you might have just one tree that stores the data, and the other trees store pointers to the data in that tree.

My suggestion of a database engine such as SQLite is because the nitty gritty of this process is done for you. Instead of having to construct so many different binary search trees, you create an index with the relevant columns, and the database engine generates the structures needed to make it efficient to search and sort by those criteria.

7. @laserlight

As I said I couldn't use SQLite properly. I wasn't even able to insert variable instead of hard coded text into database table. I'd like to do it manually it seems easier I guess. This is my code so far.

http://pasted.co/887b8525/fullscreen...e&linenum=true

It's a bit messy because it's not near complete. Can you give me a hint about changing the way I save struct in bin file such will help me to use BST.

8. Have you tried searching some sample programs with sqlite in google. i just got this link "http://www.tutorialspoint.com/sqlite/sqlite_c_cpp.htm" in my first attempt .Seems to be relevant to your issue.

9. Why we need a third table to link here. In first table itself a field can be populated with all langauge id's of other table . This solves the issue is it not ?.

10. Originally Posted by Vsky
As I said I couldn't use SQLite properly. I wasn't even able to insert variable instead of hard coded text into database table.
Then you could have asked for help on how to use it properly. For example, I spent a bit of time coming up with an example based on your scenario of "listing people who are older than 20 yo":
Code:
#include <stdio.h>
#include "sqlite3.h"

struct person
{
char name[100];
int age;
};

int create_table(sqlite3 *db)
{
int result = 1; /* success */

sqlite3_stmt *stmt = NULL;
const char *sql = "CREATE TABLE persons (name TEXT, age INTEGER);";

if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK)
{
if (sqlite3_step(stmt) != SQLITE_DONE)
{
printf("Error: Could not create table 'persons':\n>>> %s\n",
sqlite3_errmsg(db));
result = 0;
}
}
else
{
printf("Error: Could not prepare creation of table 'persons':\n>>> %s\n",
sqlite3_errmsg(db));
result = 0;
}
sqlite3_finalize(stmt);

return result;
}

int populate_table(sqlite3 *db, const struct person *persons, size_t num_persons)
{
int result = 1; /* success */

sqlite3_stmt *stmt = NULL;
const char *sql = "INSERT INTO persons (name, age) VALUES (:name, :age);";

sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);

if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK)
{
size_t i;
for (i = 0; i < num_persons; ++i)
{
sqlite3_bind_text(stmt,
sqlite3_bind_parameter_index(stmt, ":name"),
persons[i].name,
-1,
SQLITE_TRANSIENT);
sqlite3_bind_int(stmt,
sqlite3_bind_parameter_index(stmt, ":age"),
persons[i].age);
if (sqlite3_step(stmt) != SQLITE_DONE)
{
printf("Error: Could not insert into table 'persons':\n>>> %s\n",
sqlite3_errmsg(db));
result = 0;
break;
}

sqlite3_reset(stmt);
}
}
else
{
printf("Error: Could not prepare insertion into table 'persons':\n>>> %s\n",
sqlite3_errmsg(db));
result = 0;
}
sqlite3_finalize(stmt);

sqlite3_exec(db, (result ? "COMMIT;" : "ROLLBACK;"), NULL, NULL, NULL);

return result;
}

{
int result = 1; /* success */

sqlite3_stmt *stmt = NULL;
const char *sql = "SELECT name, age FROM persons WHERE age>=:age;";
if (sqlite3_prepare_v2(db, sql, -1, &stmt, NULL) == SQLITE_OK)
{
sqlite3_bind_int(stmt,
sqlite3_bind_parameter_index(stmt, ":age"),

while (sqlite3_step(stmt) == SQLITE_ROW)
{
printf("%s (%d)\n",
sqlite3_column_text(stmt, 0),
sqlite3_column_int(stmt, 1));
}
}
else
{
printf("Error: Could not prepare selection from table 'persons':\n>>> %s\n",
sqlite3_errmsg(db));
result = 0;
}
sqlite3_finalize(stmt);

return result;
}

int main(void)
{
const struct person persons[] = {
{"Alice", 20},
{"Bob", 15},
{"Charlie", 18},
{"Mallory", 25},
{"Trent", 30},
};
const size_t num_persons = sizeof(persons) / sizeof(persons[0]);

const char *filename = ":memory:";
sqlite3 *db = NULL;

if (sqlite3_open(filename, &db) == SQLITE_OK)
{
if (create_table(db) && populate_table(db, persons, num_persons))
{
}
}
else
{
printf("Error: Could not create/open database '%s'\n", filename);
}

sqlite3_close(db);

return 0;
}
Now, this just saves into an in-memory database, and it does less error checking than it should, but it should give you a good idea of how this might be done.

Originally Posted by Vsky
I'd like to do it manually it seems easier I guess.
If you choose to go with my suggestion:
• You need to learn about relational database design and SQL. While different SQL dialects complicate the issue, these skills tend to be transferable.
• You need to learn the interface of the database library. While not so transferable, this tends to be reusable.
• If you decide that you want searching and sorting on some field or set of fields to be fast, you index it. You can do this in the CREATE TABLE statement, or if decide after the table is created, you can issue a CREATE INDEX statement. No changes to your actual code are required.

If you want to go the route with binary search trees:
• You need to decide on a file format, and figure out how to write to it and parse it.
• If you decide that you want searching and sorting on some field or set of fields to be fast, you create a binary search tree for it. For a basic binary search tree, this is fairly trivial, but if you want a balanced binary search tree, then it is a bit more complex and hence you might choose to use a library, in which case you have to learn the library's interface anyway. If you choose the trivial option, then you risk the worst case scenario of it degrading into a linked list, i.e., searching and sorting would not be fast.

Either way you need to decide on the structures that you need to represent the data within your program.

So, if it easier to do it "manually"? I don't think so. Either way you are going to learn something, which is good, but you can learn about binary search trees with specific experiments where you only need to create one binary search tree and experiment with it. Given your requirements, using a database engine like SQLite would be easier and likely less error-prone once you get past the first two hurdles (that force you to acquire transferable and reusable skills).

Originally Posted by Vsky
This is my code so far. It's a bit messy because it's not near complete. Can you give me a hint about changing the way I save struct in bin file such will help me to use BST.
It is very far from complete: you have not written any code to construct even one basic binary search tree.

I have already told you how I envision it might be done if you really want to use binary search trees, and the way that I outlined is mostly independent from the file format.

Originally Posted by vlrk
Why we need a third table to link here. In first table itself a field can be populated with all langauge id's of other table . This solves the issue is it not ?.
Yes, but now, using only SQL, without doing further work to parse the results in C or any other programming language, construct a query to list all the persons associated with a given language.

If we loosen the requirements to allow you to do one extra query to get the language id, you will find that the obvious attempt will likely return false positives, e.g., you store language ids in a string such as '1,2,3', and then when you search for a field LIKE '%1%', a row with the field value of '3,12' is returned.

So, perhaps you store as '"1","2","3"', and then search with LIKE '%"1"%'. Great, you have eliminated false positives, but you still have to make that extra query to get the language id. Furthermore, even if you figured out how to get the join to work nicely such that you don't need that extra query, the search will probably fail to take advantage of an index on the column, which was a motivating factor for Vsky with all this talk about binary search trees.

On the other hand, with a join table, the solution is simple: just join all three tables and select the rows WHERE languages.name='<language>'.

11. I'd follow Laserlights advice; sqlite is an excellent tool for this.

For a while, I wondered whether there was a BST-based search-optimized data structure for this -- in case the point of the exercise is to become adept at BST operations --, but I failed to find one.

The structures closest to optimal that I found do allow optimal searches (binary search for exact matches using strcmp() or strcoll(), and linear search over the set of unique values in said field using strstr()), but they use sorted arrays instead of trees. If we consider strings only (or use string representation for the age numbers):
Code:
#include <stdlib.h>
#include <string.h>
#include <locale.h>
#include <stdin.h>
#include <errno.h>

/* One unique value in a database field.
*/
struct value {
size_t              ids_count;      /* Number of records having this value */
size_t             *ids;            /* Array of record IDs having this value */
char               *value;
};

/* Descriptor for a database field.
*/
struct field {
size_t              values_count;   /* Number of unique values in this field */
struct value       *values;          /* Array of unique values */
char               *name;
};

/* field=value reference using array indexes.
*/
struct ref {
size_t              field;
size_t              value;
};

/* Descriptor for a database record.
*/
struct record {
size_t              refs_count;     /* Number of values in this record */
struct ref         *refs;           /* Field index and value index for each. */
};

/* Database structure.
*/
typedef struct {
size_t              fields_count;
struct field       *fields;
size_t              records_count;
size_t             *records_id;
struct record      *records;
size_t              next_id;
} table;
The table structure contains the next available record ID (in next_id), an array of fields_count field structures (in fields) describing the table fields, and an array of records_count record identifiers (in records_id) and record structures (in records) listing the exact contents of each record.

Each field structure contains an array of values_count values structures (in values), sorted in ascending order. Each structure describes one unique value used (in this particular field) by one or more records; the ids_count record identifiers are listed in the ids array member.

Note that you can allow more than one value per field per record; this would be useful for e.g. language selection.

If you add a hash field to the value structure, you can add a hash table for the field structure to speed up exact matches (useful when exact matches are most common search used, and there are lots of unique values; if substring matches are most common, the hash table is not that useful).

The exact contents of each record can be reconstructed using the field and value structures, using a binary search to find out which values have the target record ID listed. However, because there are potentially many unique values and many fields, that would be slow.

To allow for rapid access to the values in a particular record, the records are also stored in an array. To make looking up the array index for a specific record ID, the IDs are in a separate array (records_id). The record is essentially just an ordered set of field-value index pairs (to fields array in the table structure, and to its values array member).

To implement a search, you first find the value structures that match your keywords. For an OR search, you combine the record IDs listed in the respective value structures. For an AND search, you keep only record IDs that are listed in both value structures. For a NOT search, you omit record IDs that are listed in the NOT value structure. Ranges and substring matches you treat as a hidden OR search, combining all record IDs listed in any matching values.

For complex queries, you can speed up the processing by selecting the order of applying the search operators so that the intermediate lists are kept short.

Because all values and record IDs are sorted, you can use a binary search to find any exact matches. Substring matches and ranges are just as easy, as you have a simple array (per field) to scan.

As you can see, implementing this requires quite a bit of code, and as an exercise could be summarized as "How to write a simple in-memory database engine from scratch".

I suspect that whoever suggested BST for this, was thinking about
Code:
struct person {
struct person   *name_up;
struct person   *name_left;
struct person   *name_right;
const char      *name;

struct person   *age_up;
struct person   *age_left;
struct person   *age_right;
int              age;

struct person   *language_up;
struct person   *language_left;
struct person   *language_right;
const char      *language;
};
where each record participates in three different binary search trees. You can make it work, but you'll have lots of duplicated code (to maintain each separate tree). You can combine the code by using array indexes to identify the tree traversed,
Code:
#define NAME_TREE    0
#define AGE_TREE     1
#define LANG_TREE    2
#define PERSON_TREES 3

struct person {
struct person   *up[PERSON_TREES];
struct person   *left[PERSON_TREES];
struct person   *right[PERSON_TREES];
const char      *name;
int              age;
const char      *language;
};
In both cases, however, you are restricted to one language per person (unless you treat the value as a string containing the languages, separated by whitespace or some other character -- but then you lose ordering in the language BST.)

The end result is fragile and not as fast as you might expect, and the implementation ends up requiring roughly as much code as in my example above, if not more. Thus, the suggestion of using BSTs for this is stupid. Better discard that idea.

(Nevertheless, binary search remains a good way to speed up exact searches of in-memory data, and only requires you to keep the data sorted in an array. So, BST is out for this, but binary search is okay.)

Obviously, using a library such as sqlite lets you avoid all the data structure details, and concentrate on the operations on the data instead.