Thread: A Binary Search Tree of... Binary Search Trees...

  1. #1
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079

    A Binary Search Tree of... Binary Search Trees...

    Ok, to start off let me say I don't know what I'm talking about and I may or may not have just ingested an illegal substance that impaired my ability to fully understand what the penguin was telling me about Binary Search Trees. Alright, none of that's true, but regardless, I don't know much about Binary Search Trees and everything I do know is in theory, not in code, so if you post some code, my head might explode trying to read it. Just a warning. ... by the way, to the moderators, if you feel this is more appropriate in another forum, sorry and please switch it to where you like.

    Ok, anyway... I was wondering if this would be a believable way of sorting or if it's just so ridiculously unnecessary that I should just give up on the idea of it.

    Let's say you have data such as email addresses which could be broken down into logical sub-strings (the web address and the user). Could you make a Tree that basically searches the web addresses and when it finds the appropriate one, points to another tree of only the users in that address. Would this be more efficient or less efficient? Is it even possible (it seems possible, it's really just a struct with a string, a left pointer, right pointer, and pointer to another root)?
    Sent from my iPadŽ

  2. #2
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    It seems less efficient to me. Which would you rather do: one large O(log n) search or two smaller O(log n) searches? It seems to me like you get the most benefit from the tree when searching large amounts of data.

    However, if you wanted to be able to search for an email address by only the domain name (and return a list of the users with that domain name), then that sounds like one alternative. It seems somewhat unfeasible because you'd need a tree containing all the domain names and a tree for each domain name containing the users. If you wanted to also allow searching by user name, then you'd need a tree containing all the user names and a tree for each user name containing the domain names. That's a lot of duplicate information floating around in memory.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  3. #3
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Alright, in a chatroom, Piano and I just figured that the seperate trees would actually be slightly worse efficiency because of it's worst case.

    Basically at worst case the single tree would take 10 passes through the tree at worst case, while the seperate tree would take 11 passes (assuming 10 domains with 100 users all evenly distributed) and in both models it seems exactly 512 users would sit 9 passes away.

    We're now trying to figure out a different model though in which you would somehow sort the domain tree (no longer a binary search tree) in a method that you would have an uneven distribution of users (more likely) and the highest user lists would sit on the top of the tree. Can't quite figure how you'd transverse this list though.
    Last edited by SlyMaelstrom; 12-10-2005 at 11:45 AM.
    Sent from my iPadŽ

  4. #4
    Anti-Poster
    Join Date
    Feb 2002
    Posts
    1,401
    Quote Originally Posted by SlyMaelstrom
    We're not trying to figure out...
    He means that we are trying.
    If I did your homework for you, then you might pass your class without learning how to write a program like this. Then you might graduate and get your degree without learning how to write a program like this. You might become a professional programmer without knowing how to write a program like this. Someday you might work on a project with me without knowing how to write a program like this. Then I would have to do you serious bodily harm. - Jack Klein

  5. #5
    Devil's Advocate SlyMaelstrom's Avatar
    Join Date
    May 2004
    Location
    Out of scope
    Posts
    4,079
    Quote Originally Posted by pianorain
    He means that we are trying.
    We're now trying to figure out.
    Sent from my iPadŽ

  6. #6
    aoeuhtns
    Join Date
    Jul 2005
    Posts
    581
    We're now trying to figure out a different model though in which you would somehow sort the domain tree (no longer a binary search tree) in a method that you would have an uneven distribution of users (more likely) and the highest user lists would sit on the top of the tree. Can't quite figure how you'd transverse this list though.
    This seems hard, unnecessarily complicated, and not necessarily an improvement to efficiency.

    I think your best solution is to just change your sorting algorithm. Sort by the domain name first. For example, you might represent your email addresses as:
    Code:
    struct email_addr {
        std::string domain;
        std::string name;
        bool operator<(const email_addr& foo) {
            return (domain < foo.domain) || (domain == foo.domain  &&  name < foo.name);
        }
    };
    Then you can efficiently find the first email address with that domain name. To find all email addresses with the domain name, just find the first and then iterate forward until you get a different domain name.

    This gives the same order of efficiency as the datastructure you're proposing, assuming that iterating is on average O(1) operation.

    One caveat with this approach is that you're storing a lot of redundant domain names. If this is really important to you, you could always use a pointer to a string, and then, on insertion, check neighboring nodes to see if the domain name is redundant or not, and on deletion, check neighboring nodes to see if you need to free the pointer. Not fun. And depending on the implementation of std::string, that's probably not the best solution.

    In fact, ignore everything I've said up to this point.

    It's probably simplest to just use an algorithm that rotates a string about the at sign. Pardon any incorrect C++ (with templates); I'm kind of rusty.
    Code:
    template <class iter, class value_type>
    bool rotate_about(iter beg, iter end, value_type ch) {
        iter pivot
            = std::find(beg, end, ch);
    
        if (pivot == end) {
            return false;
        }
    
        std::reverse(beg, pivot);
        std::reverse(pivot + 1, end);
        std::reverse(beg, end);
        return true;
    }
    Call rotate_about(s.beg(), s.end(), '@') before inserting email addresses (or call some equivalent procedure), and then use ordinary string comparison, with an ordinary std::set. You can use myset.lower_bound(domain) to get the first domain name, and then iterate forward to get further email addresses with that domain name. If you want to search just by user names of a particular domain name, you'll have to do it by constructing a string with the domain name, first.
    Last edited by Rashakil Fol; 12-10-2005 at 02:30 PM.

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Binary Search Trees
    By lorannex in forum C Programming
    Replies: 3
    Last Post: 04-25-2009, 06:24 AM
  2. Converting From Binary Tree to Threaded Binary Trees
    By elton_fan in forum C Programming
    Replies: 15
    Last Post: 11-08-2007, 11:41 PM
  3. Binary Search Tree - one last question
    By tms43 in forum C++ Programming
    Replies: 2
    Last Post: 11-14-2006, 03:58 AM
  4. binary search tree
    By unregistered in forum C++ Programming
    Replies: 4
    Last Post: 12-11-2001, 11:42 AM
  5. BST/Red and Black Tree
    By ghettoman in forum C++ Programming
    Replies: 0
    Last Post: 10-24-2001, 10:45 PM