Thread: Fastest DataBase

  1. #1
    Infant of C
    Join Date
    May 2010
    Location
    Karachi, Pakistan
    Posts
    118

    Fastest DataBase

    I am in search of a very fast (and, of course, free) database. My application's requirement is that I have to access, read, write, delete and move through a single table more than 1,000 times in one tenth of a second. Fortunately the table won't be too large: it may have at most ten thousand entries. So I am looking for a database or DBMS that, even if it does not support bulk data, is very fast at processing queries. Being lightweight would be a plus point. Is SQLite a good choice for this?

  2. #2
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    Yes, SQLite might fit the bill.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  3. #3
    Registered User
    Join Date
    Sep 2008
    Posts
    200
    Yeah, there's a speed comparison of SQLite, MySQL and PostgreSQL here. Note that (i) this is on the SQLite website itself, and (ii) it's for a relatively small database (14 MB), but that actually seems to fit in with what you want to do.

    For more speed, you could also look into in-memory databases (though you'll have to do your own loading/saving manually, if you even want that).
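    For example, SQLite itself can run fully in memory. A minimal sketch with the C API (the schema is just a placeholder, and error handling is mostly omitted):
    Code:
    #include <stdio.h>
    #include <sqlite3.h>
    
    int main(void)
    {
        sqlite3 *db;
    
        /* ":memory:" creates a private, purely in-memory database. */
        if (sqlite3_open(":memory:", &db) != SQLITE_OK) {
            fprintf(stderr, "Cannot open database: %s\n", sqlite3_errmsg(db));
            return 1;
        }
    
        /* Placeholder schema; all statements now run at memory speed. */
        sqlite3_exec(db,
                     "CREATE TABLE t (id INTEGER PRIMARY KEY, name VARCHAR(20));",
                     NULL, NULL, NULL);
    
        sqlite3_close(db);
        return 0;
    }
    If you do want persistence after all, the sqlite3_backup_*() functions can periodically copy the in-memory database to a file.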
    Programming and other random guff: cat /dev/thoughts > blogspot.com (previously prognix.blogspot.com)

    ~~~

    "The largest-scale pattern in the history of Unix is this: when and where Unix has adhered most closely to open-source practices, it has prospered. Attempts to proprietarize it have invariably resulted in stagnation and decline."

    Eric Raymond, The Art of Unix Programming

  4. #4
    Infant of C
    Join Date
    May 2010
    Location
    Karachi, Pakistan
    Posts
    118
    Quote Originally Posted by JohnGraham View Post
    it's for a relatively small database (14 MB)
    Yes, the small size won't be a problem. In fact it would be good, as I don't want to increase the size of my whole system. My data will actually consist of a single table with 2 integer fields, 2 varchar(20) fields and one boolean field. And as I already said, the maximum number of rows would still be below 10,000. For the sake of speed I could do this using hash tables or some other similar data structure, but I don't think that idea would be good if I have a large number of rows, say 1,000. Would it be? Is there some other way to achieve this task using program memory, without raising the risk of program crashes and memory leaks?

  5. #5
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by C_programmer.C View Post
    My data will actually consist of a single table consisting of 2 integer fields, 2 varchar (20) fields and one boolean field [and] a maximum [of] 10,000 rows.
    What kinds of queries do you need to implement? Does a typical query return only one entry (or none), or a set of entries? What about sorting the results? What is the ratio of queries to additions and deletes? Will your program access the data from multiple threads simultaneously?

    Hash tables will work very well for exact string matches for the varchar fields, but not substring matches. Substring (partial) matches will be relatively slow in any case.

    For looking up integer ranges, a linear array per integer field should do, but it won't help with sorting. Forming the records into linked lists will help, although it makes insertions more complicated.

    Leaving out the locking code needed if multiple threads are used, here's what I'd probably use:
    Code:
    /* Each database record:
    */
    struct rec {
        long           i1;
        struct rec    *i1_prev;   /* previous record in i1-sorted order */
        struct rec    *i1_next;   /* next record in i1-sorted order */
    
        long           i2;
        struct rec    *i2_prev;   /* previous record in i2-sorted order */
        struct rec    *i2_next;   /* next record in i2-sorted order */
    
        char           v1[23];    /* v1[22] = '\0' */
        unsigned char  v1_len;
        unsigned long  v1_hash;   /* cached hash of v1 */
        struct rec    *v1_next;   /* next record in the same v1 hash bucket */
    
        char           v2[23];    /* v2[22] = '\0' */
        unsigned char  v2_len;
        unsigned long  v2_hash;   /* cached hash of v2 */
        struct rec    *v2_next;   /* next record in the same v2 hash bucket */
    
        _Bool          b;         /* Or perhaps an enum? */
    };
    
    /* Actual database (grown dynamically):
    */
    size_t       records_max = 0;
    size_t       records = 0;
    struct rec  *record = NULL;
    
    /* Index of field i1, sorted:
    */
    long        *i1_array = NULL;
    struct rec **i1_record = NULL;
    
    /* Index of field i2, sorted:
    */
    long        *i2_array = NULL;
    struct rec **i2_record = NULL;
    
    /* Hash table of v1:
    */
    size_t       v1_hash_size = 0;
    struct rec **v1_hash = NULL;
    
    /* Hash table of v2:
    */
    size_t       v2_hash_size = 0;
    struct rec **v2_hash = NULL;
    Addition requires insertion of the new i1 into i1_array and i2 into i2_array (meaning two memmove()s of up to records longs each). The previous and next entries in the arrays can be used to fix the i1_prev, i1_next, i2_prev, and i2_next links in the record structure, without traversing the lists. The new record can be prepended to the hash table list for each varchar.
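    For illustration, the i1 side of an insertion could look like this (a sketch only: i1_index_insert is my name for it, it assumes the arrays have already been grown to hold one more record, and the caller bumps records afterwards, after doing the same for i2):
    Code:
    #include <string.h>
    
    static void i1_index_insert(long value, struct rec *r)
    {
        /* Binary search for the insertion point in the sorted index. */
        size_t lo = 0, hi = records;
        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (i1_array[mid] < value)
                lo = mid + 1;
            else
                hi = mid;
        }
    
        /* The memmove()s: shift the tails up by one slot. */
        memmove(i1_array + lo + 1, i1_array + lo, (records - lo) * sizeof i1_array[0]);
        memmove(i1_record + lo + 1, i1_record + lo, (records - lo) * sizeof i1_record[0]);
        i1_array[lo] = value;
        i1_record[lo] = r;
    
        /* Fix the list links from the array neighbours; no traversal needed. */
        r->i1_prev = (lo > 0) ? i1_record[lo - 1] : NULL;
        r->i1_next = (lo < records) ? i1_record[lo + 1] : NULL;
        if (r->i1_prev)
            r->i1_prev->i1_next = r;
        if (r->i1_next)
            r->i1_next->i1_prev = r;
    }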

    For the hash function, I'd use DJB2, XOR variant. It's excellent for ASCII strings, fast and simple.
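    For reference, the XOR variant is just (the function name is mine):
    Code:
    #include <stddef.h>
    
    /* DJB2 hash, XOR variant: hash = hash * 33 XOR next byte, seeded with 5381. */
    unsigned long djb2_xor(const char *s, size_t len)
    {
        unsigned long hash = 5381UL;
        size_t i;
    
        for (i = 0; i < len; i++)
            hash = (hash * 33UL) ^ (unsigned char)s[i];
    
        return hash;
    }
    The bucket is then v1_hash[hash % v1_hash_size]; if you keep v1_hash_size a power of two, the modulo reduces to a mask.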

    Any single-field query (assuming thousands of records, but not tens of thousands) will be faster than constructing and then parsing the corresponding SQL, so no database that uses SQL strings can beat the above for single-field queries.

  6. #6
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    The average time per operation on the database would be 1/10,000th of a second? I have real doubts about a PC being able to read a record, change some of its data, and write the updated record back to the file, that fast.

    What is the program doing that requires that much speed, and can that task be changed to use slower database operations?

    I don't believe that high a speed is possible, currently.

  7. #7
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by C_programmer.C View Post
    Yes, the small size won't be a problem. In fact it would be good, as I don't want to increase the size of my whole system. My data will actually consist of a single table with 2 integer fields, 2 varchar(20) fields and one boolean field. And as I already said, the maximum number of rows would still be below 10,000. For the sake of speed I could do this using hash tables or some other similar data structure, but I don't think that idea would be good if I have a large number of rows, say 1,000. Would it be?
    Hash tables are generally the optimal solution for anything but a tiny number of rows.
    Is there some other way to achieve this task using program memory, without raising the risk of program crashes and memory leaks?
    Yes, absolutely.
    One could write a program that is solely responsible for storing and retrieving rows, but internally stores the data in memory in a hash table. Such a program can easily be small enough to be verified leak-free. In fact, you can avoid dynamic memory allocation completely. You can then communicate with this separate program using TCP, named pipes, shared memory, or Windows messages (assuming this is on Windows).
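    A minimal sketch of that idea (all names and sizes are mine; open addressing with linear probing, keyed on one varchar field, with every byte statically allocated):
    Code:
    #include <stddef.h>
    
    #define TABLE_SIZE 16384              /* power of two, comfortably above 10,000 */
    
    struct row {
        long i1, i2;
        char v1[21], v2[21];              /* varchar(20) plus '\0' */
        int  b;
        int  used;                        /* 0 = slot free */
    };
    
    static struct row table[TABLE_SIZE];  /* static storage: no malloc, no leaks */
    
    /* Insert a row keyed on v1; the hash is computed by the caller.
       Returns 0 on success, -1 if the table is full. */
    static int insert_row(const struct row *r, unsigned long hash)
    {
        size_t idx = hash & (TABLE_SIZE - 1);
        size_t probes;
    
        for (probes = 0; probes < TABLE_SIZE; probes++) {
            if (!table[idx].used) {
                table[idx] = *r;
                table[idx].used = 1;
                return 0;
            }
            idx = (idx + 1) & (TABLE_SIZE - 1);   /* linear probing */
        }
        return -1;
    }
    One caveat: deleting from an open-addressing table needs tombstone markers (or re-insertion of the rest of the probe chain), not just clearing the used flag.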
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  8. #8
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by Adak View Post
    The average time per operation on the database would be 1/10,000th of a second?
    Much less. Single-variable queries are O(log N) for the integers (binary search) and O(1) for the varchars (assuming exact match).
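    For the O(1) varchar case, the lookup is just a walk down one hash chain, using the structures from post #5 (a sketch; djb2_xor stands in for whichever hash function is used):
    Code:
    #include <string.h>
    
    /* Exact-match lookup of v1; returns the first matching record, or NULL. */
    struct rec *find_v1(const char *s, size_t len)
    {
        unsigned long h = djb2_xor(s, len);
        struct rec   *r = v1_hash[h % v1_hash_size];
    
        /* Compare the cached hash and length first; memcmp() only on a match. */
        while (r) {
            if (r->v1_hash == h && r->v1_len == len && memcmp(r->v1, s, len) == 0)
                return r;
            r = r->v1_next;
        }
        return NULL;
    }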

    Quote Originally Posted by Adak View Post
    and write the updated record back to the file, that fast.
    I was assuming that due to the high rate, the requirements for on-disk consistency (in case of failure) would be relaxed, just like they are on "real" databases.

    The kind of data rate we're talking about -- 10,000+ I/O requests per second -- requires a more efficient approach: battery-backed caches, replication, or something. It just does not make sense to try to update files at that rate.

    Quote Originally Posted by Adak View Post
    I don't believe that high a speed is possible, currently.
    Surely it is possible, at least if one relaxes the requirements in case of failure? That is, if the database does not need to hit disk after every update, but at say regular intervals? In case of a crash or hardware failure, at most one interval's worth of data is lost.

  9. #9
    Infant of C
    Join Date
    May 2010
    Location
    Karachi, Pakistan
    Posts
    118
    I am really very sorry; I forgot to discuss the most important and most critical issue: concurrency control. For each of the four major operations (INSERT, UPDATE, SELECT, DELETE) there exists a separate thread -- no thread will perform two different operations, except that an UPDATE may be followed by a DELETE in a single thread. I don't think SQLite can provide this on its own, so I will have to implement it at the application level; I am sure a language wrapper would have support for it!
    I am really sorry if I am vague in some places. The INSERT to DELETE ratio will be almost equal: for every INSERT, a DELETE and a SELECT will also execute. No substring matches. The worst case is one query per 1/10,000 of a second; on average it would be more like one per 1/2,000 of a second, i.e. about 2,000 I/Os per second.

  10. #10
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    I thought you wanted high performance?

    > For each of the four major operations (INSERT, UPDATE, SELECT, DELETE) there exists a separate thread --
    ...
    > The INSERT to DELETE ratio will be almost equal.
    What bonehead decided that a forced context switch for EVERY database operation was going to achieve anything?

    You're going to need record-level locking to get anywhere with this, and hope the usage profile doesn't contend over the same record too often.
    A single global lock on the whole database will just crucify performance.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  11. #11
    Infant of C
    Join Date
    May 2010
    Location
    Karachi, Pakistan
    Posts
    118
    I think I should explain the whole scenario here, and I may get some useful recommendations.

    My application consists mainly of two roles: a commander and a number of clients. There will be a main thread with two sub-threads running: one for the commander and the other for the clients. The commander is a small piece of code generating a number of command strings per second. The client thread has two sub-threads: one for registering new clients, the other for handling already registered clients. The latter is responsible for updating the status of clients when they log in to the system. It is also responsible for receiving the commands from the commander and sending them to the clients. There may be at most slightly under 10,000 clients logged in at a time.

    That is a summary of what I want to do. Please suggest a better solution to increase the performance and reliability of the system.
    My thought was that a data table could be used to temporarily store the commands from the commander. The client thread will keep searching the table for new commands; once a command is delivered to a client, the thread will check whether it is a command that should have a response from the client. If so, it will update the command's status to "read"; otherwise it will delete the command from the table.

  12. #12
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    So, if I understand you correctly, what you really want to do is to have a (logical) list of currently logged in clients, and a (logical) stream of commands to those clients?

    Does every client receive every command, or are commands distributed to only specific clients?

    The way I would do it, is to have a service that maintains the list of logged in clients, and the stream of commands. If the code generating the commands is simple enough, then it can be incorporated into this service.

    When a client logs in, it has to be the one that opens up the connection to get the commands from the service. If all clients receive every command, or there are at most a few distinct groups or levels, then I'd use multicast UDP messages. (The client has to join the multicast group, the service cannot do it on behalf of the client.) The client logs out by closing the multicast socket, and telling the service they're logging out.
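    Joining the group on the client side is one setsockopt() on an ordinary UDP socket, something like this (the group address 239.0.0.1 is only an example from the administratively scoped range):
    Code:
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    
    /* Join a multicast group on an already-created UDP socket (POSIX).
       Returns 0 on success, -1 on error. */
    int join_group(int sock)
    {
        struct ip_mreq mreq;
    
        memset(&mreq, 0, sizeof mreq);
        mreq.imr_multiaddr.s_addr = inet_addr("239.0.0.1"); /* the group */
        mreq.imr_interface.s_addr = htonl(INADDR_ANY);      /* any interface */
    
        return setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof mreq);
    }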

    There are security implications, though. I would not send any privileged or sensitive information. (Even if it is technically possible to encrypt the multicast messages, all the clients would share the same key. A single compromised client will compromise the entire command stream.)

    Note that this way, the message rate is much smaller; the service only sends each unique command message once, and the networking hardware will spread a copy to each client. (Using multicast UDP, it does mean that if the network is congested, routers may drop UDP packets, and some clients may not receive all commands. If the command stream is on a local network, where typically all (intact) packets are delivered, then there should be no dropped UDP packets in practice.)

    If each client receives a unique set of messages, or you need per-client confirmation of each command, or per-client encryption, then you need to think about how each client connects to the service. In Linux, 10,000 sockets is not over the top (although you'll probably have to change the default per-process limits), but to achieve good message rates (especially considering the command strings are so short) you might have to look at memory-mapping the socket tx buffer. (It gets a bit complicated, but allows you to send many packets per syscall, avoiding excessive context switching.)

    If each client is local, then I'd use a shared-memory ring buffer for the command messages, with atomic counters, and a mutex and a condition variable in case some client wishes to wait for a new message. (Most compilers provide atomic ops, either __sync_...() or __atomic_...(). The latter are becoming standardized, from C++, but the former are supported by most C compilers on most architectures.)
    Again, the security implications are very similar to multicast sockets; all clients see all messages.
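    The writer side of such a ring buffer is very little code. A sketch (names and sizes are mine; this omits the mutex/condition variable wakeup and any overrun handling):
    Code:
    #include <stddef.h>
    #include <string.h>
    
    #define RING_SLOTS 4096                  /* power of two */
    #define MSG_SIZE     64
    
    struct ring {
        unsigned long head;                  /* total messages published */
        char          slot[RING_SLOTS][MSG_SIZE];
    };
    
    /* Single writer: publish one message into the shared-memory ring. */
    void ring_put(struct ring *r, const char *msg, size_t len)
    {
        unsigned long h = __atomic_load_n(&r->head, __ATOMIC_RELAXED);
    
        if (len > MSG_SIZE)
            len = MSG_SIZE;
        memcpy(r->slot[h & (RING_SLOTS - 1)], msg, len);
    
        /* Release: the slot contents become visible before the new head. */
        __atomic_store_n(&r->head, h + 1, __ATOMIC_RELEASE);
    }
    Each reader keeps its own count of consumed messages and compares it against an acquire-load of head; a reader that falls more than RING_SLOTS behind has lost messages.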

    Given such short messages, it might also work to locally use a Unix domain stream socket per client, but send all pending messages in bursts. This would reduce the number of context switches, but the per-message latency might grow. This would be the practical local route if clients are only allowed to see specific messages.

    I personally would definitely not include any kind of SQLish database in this picture.

  13. #13
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Agreed. A database is meant for when you want persistence after your application is no longer running. That does not seem to be needed here.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  14. #14
    Infant of C
    Join Date
    May 2010
    Location
    Karachi, Pakistan
    Posts
    118
    Quote Originally Posted by Nominal Animal View Post
    Does every client receive every command, or are commands distributed to only specific clients?
    For each client there would be a unique command or set of commands at a given instance.
    Quote Originally Posted by Nominal Animal View Post
    In Linux, 10,000 sockets is not over the top (although you'll probably have to change the default per-process limits), but achieving good message rates (especially considering the command strings are so short), you might have to look at memory-mapping the socket tx buffer.
    I am not too strong at networking, and hence unable to clearly understand what you mean. What I get from this is that you want the service to put the command into the tx buffer of the system socket to which the client is attached, and the client will automatically pick up the command when it reaches the system. If this is the case, then I have a few questions:

    1. All the clients are going to connect on a single port -- port 8 of the system. In this case won't they be sharing the same tx buffer?
    2. The client is actually an Arduino circuit. Does an Arduino have enough capability to do what I have understood from your post?

    Quote Originally Posted by Nominal Animal View Post
    If each client is local, then I'd use a shared-memory ring buffer for the command messages, with atomic counters, and a mutex and a condition variable in case some client wishes to wait for a new message.
    All clients will connect over the internet, even if they are local. Again, is there a possibility that an Arduino can be programmed to take part in a token ring?

  15. #15
    Ticked and off
    Join Date
    Oct 2011
    Location
    La-la land
    Posts
    1,728
    Quote Originally Posted by C_programmer.C View Post
    For each client there would be a unique command or set of commands at a given instance.
    This means you need a separate connection between the service and each client.

    Quote Originally Posted by C_programmer.C View Post
    And the client will automatically pick up the command when it reaches the system.
    No. The client must continuously read/receive/await messages from the service. The service will only send them, it's up to the client to receive them. Or not.

    Will the clients send any responses other than login ("I'm here! Send me commands!") and logout ("I'm going away, don't send me stuff anymore! Bye bye!") messages?

    Quote Originally Posted by C_programmer.C View Post
    All the clients are going to connect on a single port -- port 8 of the system. In this case won't they be sharing the same tx buffer?
    Each client requires a separate connection to the service, so IP multicast is out. On the client end, the send and receive buffers are local to each client, of course, anyway.

    Your practical choices are TCP/IP, and UDP/IP. The former is the one you use for e.g. web pages, and the latter for e.g. streaming media. The latter is lighter weight, but routers (routers, not plain switches) may drop UDP packets if the network is congested. On a local area network, UDP is pretty reliable.

    If you use Linux to host the service, and UDP/IP to communicate with the clients, then you can use the same UDP socket (on port 8) to send and receive packets to/from any client. Then, there is only one transmission (tx) buffer on the service side (and only one receive buffer). Then, it is possible to use a memory-mapped transmission buffer to send multiple UDP messages to a number of clients using a single syscall, reducing the CPU time (context switches, actually) needed to send the messages to the clients.

    However, on a fast Linux machine, something like 20,000 packets per second (of ~32-byte payload) should be attainable without any transmission buffer tricks, by simply using one sendto() call per message per client.
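    That plain loop is about as simple as it gets. A sketch (the address and message arrays are assumed to be filled in elsewhere, and error handling is omitted):
    Code:
    #include <stddef.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    
    /* Fan one batch of per-client commands out over a single UDP socket. */
    void send_batch(int sock, const struct sockaddr_in *addr,
                    const char (*cmd)[32], const size_t *len, size_t clients)
    {
        size_t i;
    
        for (i = 0; i < clients; i++)
            (void)sendto(sock, cmd[i], len[i], 0,
                         (const struct sockaddr *)&addr[i], sizeof addr[i]);
    }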

    Note that you want your server to connect to a switch using gigabit ethernet, before fanning out to the Arduino Ethernet shields (at 10/100Mbit/s). Otherwise the network speed and latencies on the server end will slow it down too much.

    Quote Originally Posted by C_programmer.C View Post
    2. The client is actually an Arduino circuit. Does an Arduino have enough capability to do what I have understood from your post?
    Arduinos do not have the capabilities for running the service (as most run at 16MHz, and have at most 2048 bytes of SRAM), but they should work fine as a client, receiving commands from some server.

    In this case, I would use UDP packets to communicate with the service (that runs on something more beefy than an Arduino). See the Arduino UDP example.

    Note that your own client code running on the Arduino must assign a fixed IP address and MAC address as explained in the documentation. On the server side, you can then identify each client by their IP address alone.

    (In any case you need an identifier on each client, so the service can determine which commands to send to that client.)

    Since I assume this will be used on a local network of some kind, the 10.x.y.z range and the 192.168.x.y ranges can be used for up to 16 million and 65 thousand clients, respectively. On the internet, meaning a globally accessible network, the security issues are pretty complicated, although it may be possible to develop a secure solution using per-client keys known on the server, and some symmetric ciphers, that might work on an Arduino.

    Quote Originally Posted by C_programmer.C View Post
    Again, is there a possibility that an Arduino can be programmed to take part in a token ring?
    I don't understand. What do you mean? I thought you said you'd use Ethernet (10/100Mbit/s Arduino Ethernet shields, in particular).


    Can't you provide any specific details? It's bloody difficult trying to give you advice when only the smallest, scattered crumbs of relevant information are available.

    If I had to guess, I'd say you are doing something analogous to Project Blinkenlights, perhaps a control/sensor network or grid. There are more similar ideas and implementations than you can shake a stick at, so there is no reason to worry about someone "stealing your idea". To be honest, your idea is most likely not worth stealing, to anyone.
