Thread: Speed/size advantage to SQL?

  1. #1
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300

    Speed/size advantage to SQL?

    I suppose this *might* be a better question on an actual web development board, but I know some of the regulars here actually feed themselves doing web programming, and just hang around cboard to "keep their chops up" or something, so your thoughts are appreciated.

    As part of my experiment comparing perl/Mason to ruby on rails, I'm taking a small "document search" app I wrote in RonR and redoing it more or less identically using Mason (but not catalyst...). So, now my hands are less tied with regard to the "model" side of MVC, I am wondering: what is the advantage to using SQL as opposed to text files or some other form of data storage?

    With the "docsearch" app, the documents are there already in HTML and I did not put them into an SQL db; the only thing I used the db for was as a subject-index and list of documents available to search, with paths, etc. So now I'm doing the Mason version and thinking that I could just (or almost) as easily keep these details in a text file. (That whole thing could be done in plain ol' CGI, but anyway...)

    Which leads me to my question: is SQL used because it is portable, an industry standard, easy and efficient to use, or does it have real performance advantages? When I get to the point where I want to do something more complex (like forum software), is there some magic to SQL which will mean it is the only real choice, performance and storage space* wise? Or could a system that used directories of text files and small serialized data structure stores be roughly equivalent?

    * I would assume the answer to the space question has to be no, since AFAICT there is no data compression used in SQL.
    Last edited by MK27; 06-09-2009 at 07:41 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  2. #2
    and the hat of copycat stevesmithx's Avatar
    Join Date
    Sep 2007
    Posts
    587
    Flat File Database Design vs. Relational Database Design | Database Solutions for Microsoft Access | databasedev.co.uk and
    https://www.verio.com/support/docume...fm?doc_id=4032
    explains why flat files should not be used(or should be used in some contexts alone).

    Plus, things like ACID - Wikipedia, the free encyclopedia are very difficult to implement for flat files.
    Not everything that can be counted counts, and not everything that counts can be counted
    - Albert Einstein.


    No programming language is perfect. There is not even a single best language; there are only languages well suited or perhaps poorly suited for particular purposes.
    - Herbert Mayer

  3. #3
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by stevesmithx View Post
    explains why flat files should not be used(or should be used in some contexts alone).
    Thanks for that Steve, but that is not exactly what I'm asking, since both an SQL database and a text-file *could* be implemented in a "flat-file" way.

    By "set of directories and text-files" I meant something that would be accessed in a "relational" way by the app, not just one mega file that I would parse line by line and turn into a data structure.

    So I guess this would come down to the details. I was just trying to fish around and see if there is not some "beyond the obvious" aspect to SQL. There does not have to be; I can understand it's popularity purely on the basis of its ease-of-use, conformance to a certain standard, and universality. In fact, I probably will stick to it because of that...
    Last edited by MK27; 06-09-2009 at 08:59 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  4. #4
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    is SQL used because it is portable, an industry standard, easy and efficient to use, or does it have real performance advantages?
    Are you talking about SQL, or RDBMS? SQL is just the language used to manipulate and retrieve data from most relational databases.

    A good relational database has several advantages over a standard text file. It will have built in journaling so that your data doesn't get corrupted if the application crashes, or the system loses power. It abstracts the data access away so that you don't need to handle situations where your dataset would take more memory that your application has available to it. It can index your data any way imaginable so that queries are faster.

    If you are just talking about SQL syntax, then it's mainly just a standard way to access data. I don't believe the language itself has any particular properties that make it especially efficient. It fact it is probably less efficient than a proprietary query mechanism that is customized to your particular data.

  5. #5
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by bithub View Post
    Are you talking about SQL, or RDBMS? SQL is just the language used to manipulate and retrieve data from most relational databases.
    I suppose I am talking about using "an SQL based RDBMS", if that makes sense (so, SQlite == "an SQL based RDBMS"). Or (teehee) did you mean ROUS's?

    A good relational database has several advantages over a standard text file. It will have built in journaling so that your data doesn't get corrupted if the application crashes, or the system loses power. It abstracts the data access away so that you don't need to handle situations where your dataset would take more memory that your application has available to it. It can index your data any way imaginable so that queries are faster.
    Now that's info. I guess these are questions to consider...are these things that would be (for example) common to "all systems accessing an SQlite database" or would they be dependent on the language library used?

    If you are just talking about SQL syntax, then it's mainly just a standard way to access data. I don't believe the language itself has any particular properties that make it especially efficient. It fact it is probably less efficient than a proprietary query mechanism that is customized to your particular data.
    Yeah, that's kind of what I am thinking. If I can organize text files in an effective way (so we minimize real memory usage), I can do whatever I want in the mason code vis, accessing and searching these text files. The disadvantage to that is it means the implementation will be completely custom, not standardized at all. Which, as you imply, there could be advantages to using such a customized mechanism if standardization is not a priority*. However, it could be that this "advantage" is chimeric because of some underlying algorithm enforced by 3NF SQL-based RDBM's...there's a further distillation of the question. Or I am now off my rocker.

    * because to the concern: "What if some dumb ape has to maintain your code later?" I say: please don't let them

    ps. that ROUS parable may be a good one, I'm not sure yet...
    Last edited by MK27; 06-09-2009 at 10:44 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  6. #6
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Quote Originally Posted by MK27
    SQlite == "an SQL based RDBMS"
    Might be better to use proper subset notation.

    Quote Originally Posted by MK27
    I guess these are questions to consider...are these things that would be (for example) common to "all systems accessing an SQlite database" or would they be dependent on the language library used?
    Generally, the features cited are not dependent on the programming interface.

    Quote Originally Posted by MK27
    If I can organize text files in an effective way (so we minimize real memory usage), I can do whatever I want in the mason code vis, accessing and searching these text files. The disadvantage to that is it means the implementation will be completely custom, not standardized at all.
    Another disadvantage is that you have to spend time writing the code to do all that, along with writing tests, debugging, etc. Furthermore, if you are not an expert at this, in the end you might not actually beat well tested and optimised database software that is used appropriately.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  7. #7
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    Might be better to use proper subset notation.
    ???


    Quote Originally Posted by laserlight View Post
    Generally, the features cited are not dependent on the programming interface.
    So -- just to make sure I understand -- those *would be* features of SQlite, and therefore any lang. lib. interface which makes use of it?

    Another disadvantage is that you have to spend time writing the code to do all that, along with writing tests, debugging, etc. Furthermore, if you are not an expert at this, in the end you might not actually beat well tested and optimised database software that is used appropriately.
    I'm not sure if the amount of coding has to be that much more. And: I am unlikely to become an expert unless I am a neophyte first. I don't need to "beat the standard"; I am just afraid of ending up in a situation where I cannot possibly even compare to it (meaning the implementation is junk).
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  8. #8
    Registered User
    Join Date
    Sep 2004
    Location
    California
    Posts
    3,268
    sqlite has all of the features I mentioned. sqlite also has the added benefit that it can be embedded in applications instead of just running as a client/server architecture. In fact, I think that sqlite is usually the best database solution for client applications in general.

  9. #9
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Quote Originally Posted by MK27
    ???
    As in SQLite is an example of "an SQL based RDBMS" rather than SQLite is equal to "an SQL based RDBMS".

    Quote Originally Posted by MK27
    So -- just to make sure I understand -- those *would be* features of SQlite, and therefore any lang. lib. interface which makes use of it?
    I am reluctant to say an outright "yes" because it is conceivable that a horribly broken library can prevent you from using those features through it, but consider that those features are available with the use of SQLite's command line tool.

    Quote Originally Posted by MK27
    I am unlikely to become an expert unless I am a neophyte first.
    Of course, but here I am assuming that you are actually going to implement commercial software with concerns such as a deadline to complete the project.
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  10. #10
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    Of course, but here I am assuming that you are actually going to implement commercial software with concerns such as a deadline to complete the project.
    I'm gonna turn vbulletin,etc, into quaint anachronisms...

    Last edited by MK27; 06-09-2009 at 11:47 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  11. #11
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,412
    Quote Originally Posted by MK27
    I'm gonna turn vbulletin,etc, into an quaint anachronisms...
    Well... after you complete the project, you may still have to maintain it, add new features not originally in the specification, etc
    Quote Originally Posted by Bjarne Stroustrup (2000-10-14)
    I get maybe two dozen requests for help with some sort of programming or design problem every day. Most have more sense than to send me hundreds of lines of code. If they do, I ask them to find the smallest example that exhibits the problem and send me that. Mostly, they then find the error themselves. "Finding the smallest program that demonstrates the error" is a powerful debugging tool.
    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  12. #12
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by laserlight View Post
    Well... after you complete the project, you may still have to maintain it, add new features not originally in the specification, etc
    Not everyone will like the available color schemes, but they will just have to succumb in light of the astonishing functionality
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    If you implement your own database, you will have to work much harder to combine data from several tables. This is one of the main features of a relational database (whether it use SQL or another query language): You can form new (temporary) tables from a set of existing tables.

    A rule for databases (and many other things in programming) is that "data should only be in one place" - so you should for example not put user-names in the forum list (that way, if the moderators decide I should be called StamP instead, you don't have to find 15000+ posts and modify each one of them - instead, we store a numeric user-id that won't ever change).

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by matsp View Post
    A rule for databases (and many other things in programming) is that "data should only be in one place" - so you should for example not put user-names in the forum list (that way, if the moderators decide I should be called StamP instead, you don't have to find 15000+ posts and modify each one of them - instead, we store a numeric user-id that won't ever change).
    Yeah, that is a Very Significant Point. Thanks for bringing it, matsp. "MKbulletin" is gonna be the tish in 2012...
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  15. #15
    (?<!re)tired Mario F.'s Avatar
    Join Date
    May 2006
    Location
    Ireland
    Posts
    8,446
    Quote Originally Posted by MK27 View Post
    Yeah, that's kind of what I am thinking. If I can organize text files in an effective way (so we minimize real memory usage), I can do whatever I want in the mason code vis, accessing and searching these text files.
    Assuming you are successful in achieving this, along with all the necessary strategies and routines that you will no doubt soon find yourself being forced to develop in order to speed data searching, manipulation and retrieval, I hope you realize you developed a database engine

    So you will in fact be "reinventing the wheel" in the sense that your objective was not that, but you ended up resorting to the exact same core features that define a relational database (or assuming you don't go with a relational model, an hierarchical database).

    Will it help if you look at databases as simple data repositories (pretty much like your flat files) that offer indexing mechanisms and a full engine to help handle data definition and manipulation?
    Originally Posted by brewbuck:
    Reimplementing a large system in another language to get a 25% performance boost is nonsense. It would be cheaper to just get a computer which is 25% faster.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. OOP Question DB Access Wrapper Classes
    By digioz in forum C# Programming
    Replies: 2
    Last Post: 09-07-2008, 04:30 PM
  2. Please help create a loop for embedded SQL
    By cjohnman in forum C Programming
    Replies: 4
    Last Post: 04-24-2008, 06:46 AM
  3. Embedded SQL
    By sarac in forum C Programming
    Replies: 1
    Last Post: 05-04-2006, 09:09 AM
  4. Replies: 1
    Last Post: 03-21-2006, 07:52 AM
  5. Problem with embedded SQL in C/C++ (ECPG)
    By NeuralClone in forum C Programming
    Replies: 4
    Last Post: 10-21-2005, 05:16 PM

Tags for this Thread