Thread: some suggestions to speed up code

  1. #1
    Registered User
    Join Date
    Jan 2008
    Posts
    569

    some suggestions to speed up code

    So I am asked to speed up some code, I am storing a file which I am going to compile into a big dictionary. Right now I am using a linked list to store the words in order.. but what data structures should I use to make it faster? A binary tree or a hash table? opinions?

  2. #2
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Depends on access patterns.

    If no one ever accesses the data, it doesn't matter how you store it.

    If it is accessed in the order you are storing it, changing that order will only hurt you.

    If it is accessed in a random order, and you have to search through it to find anything, then you need a better indexing, storage or access mechanism.
    Mainframe assembler programmer by trade. C coder when I can.

  3. #3
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    it is accessed after all is stored

  4. #4
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    And the sky is blue!

    Most data won't be accessed until it is "stored" into its structures.
    Mainframe assembler programmer by trade. C coder when I can.

  5. #5
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    well here's how it goes.. I have a file of words the file has words in it and there are two types of it.. one is sorted in alphabetical and the other one is not.. our job is to construct a dictionary of words and when we create it we can request the dictionary to keep the dictionary in sorted order or not

  6. #6
    Technical Lead QuantumPete's Avatar
    Join Date
    Aug 2007
    Location
    London, UK
    Posts
    894
    Quote Originally Posted by -EquinoX- View Post
    we can request the dictionary to keep the dictionary in sorted order or not
    What's the point of having an unsorted dictionary?

    The point here is, what do you want to do with the dictionary once it's created? A binary tree is probably your best bet here, but it does very much depend on what you want to do.

    QuantumPete
    "No-one else has reported this problem, you're either crazy or a liar" - Dogbert Technical Support
    "Have you tried turning it off and on again?" - The IT Crowd

  7. #7
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > What's the point of having an unsorted dictionary?
    If you're just looking for a simple yes/no "is this word in the dictionary", then perhaps a hash would work well.

    But any kind of searching would be better using some kind of tree.

    > A binary tree or a hash table? opinions?
    Consider a tree where each node say has pointers for 'a' to 'e', 'f' to 'j', 'k' to 'o' etc.
    Somewhere between a strict binary tree, and having 26 pointers for every possible next letter, where many such combinations are in fact useless.
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  8. #8
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    It's not a dictionary.. so first I read a I read a list of words and put it into some data structure
    secondly I read a different file, which is also a list of words and then compare that words to see if any matches the word in the first file if it does then I copy that word to a data structure.. thirdly I would iterate through previous data structure and put the list of the words in a new data structure so that the list of the words would be sorted.. that's all I need to do basically

    so what is the most efficient and fast way to do this? what data structure should I use the first time, second time, and third time?

    What I have in mind right now is first just to store the first file in a data structure, the second one in a binary tree, and the third one in a binary tree too.. I don't know if it's fast enough

  9. #9
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    1. What is your time limit for the execution.
    2. If you don't know which method to use, make sure you isolate the storage from the code that fetches/stores data in the tree/list/etc.
    3. Start with a very simple method of storing your data, such as a straight, unsorted array, or a simple linked list.
    4. If the time is "too long", then adjust your storage method.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    What about just indexing your words? Then keep them in sorted order, and use a binary search. The index would be an array of structs, something like this:

    0) 0, 24
    1) 25, 44
    2) 45, 53

    etc.
    Where Index[0] = low and high index numbers for the letter 'a' words in your dictionary.
    Index[1] = low and high index numbers for the letter b words in your dictionary.

    etc.

    Now you get a request for "beach", is it in your dictionary? Your binary search will begin with the 25th element of your dictionary array, and have an initial ceiling value of 44. That changes a binary search (already fast), to a supercharged binary search.

  11. #11
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    I still think this smells of "premature optimization". The basic principle should be: make the code work, using the simplest reasonable methods. If you need to make it run faster when you have it working, then make changes to the working code.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  12. #12
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    I have a code that already work and right now it's using a linked list to store the words sorted, which is bad , bad , and bad! so I am thinking using hash table for the first one and use a binary tree for the second one as the file will be smaller after we find a commonly words

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by -EquinoX- View Post
    I have a code that already work and right now it's using a linked list to store the words sorted, which is bad , bad , and bad! so I am thinking using hash table for the first one and use a binary tree for the second one as the file will be smaller after we find a commonly words
    It may be bad from a performance perspective, but the performance aspect is only really important if you have code that runs sufficiently slow that there's any point in making it faster.

    Also, linked lists are great for inserting/removing stuff from.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.

  14. #14
    Registered User
    Join Date
    Jan 2008
    Posts
    569
    yes but it's bad and I need to improve it somehow

  15. #15
    Jack of many languages Dino's Avatar
    Join Date
    Nov 2007
    Location
    Chappell Hill, Texas
    Posts
    2,332
    Define "bad"
    Mainframe assembler programmer by trade. C coder when I can.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 1
    Last Post: 03-21-2006, 07:52 AM
  2. Flight Simulator Wind Speed!!
    By Dilmerv in forum C++ Programming
    Replies: 6
    Last Post: 03-20-2006, 12:40 AM
  3. Binary Search Trees Part III
    By Prelude in forum A Brief History of Cprogramming.com
    Replies: 16
    Last Post: 10-02-2004, 03:00 PM
  4. Seems like correct code, but results are not right...
    By OmniMirror in forum C Programming
    Replies: 4
    Last Post: 02-13-2003, 01:33 PM
  5. Replies: 4
    Last Post: 01-16-2002, 12:04 AM