Thread: Newbie - Extend Python with C GeoSearch

  1. #1
    Registered User
    Join Date
    Feb 2009
    Posts
    93

    Newbie - Extend Python with C GeoSearch

    Hi,

    I am a newbie but I think my question is quite advanced. I am trying to create my own geocoder complete with autosuggest using open source data in a custom format.

    If you go to Monster.co.uk and start typing in the location box, it not only autosuggests but also geocodes the result. There are plenty of free services, but they all have problems. Google, for example, requires you to show a map and brand the search box.

    My site is in Python, but for speed I was thinking of having a C module written to search and return results to the user. The format would be along the lines of:

    City Town Country Long/Lat

    How much work is this? Is it worth it? I really want to run my own geocoding service and this is the best way I can think of doing it.

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Offhand, it seems like long and lat would be just a couple more entries in the struct that also holds the city, town, and country. So, not much work to it.
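
    A minimal sketch of that record, written in Python (the site's language) rather than as a C struct. The field names, the tab-separated layout, and the `parse_line` helper are assumptions for illustration; the coordinates are the placeholder values used later in this thread.

```python
from typing import NamedTuple


class Place(NamedTuple):
    """One gazetteer record; field names are illustrative, not from the thread."""
    city: str
    town: str
    country: str
    lon: float
    lat: float


def parse_line(line: str) -> Place:
    # Assumes tab-separated fields in the order city, town, country, lon, lat.
    city, town, country, lon, lat = line.rstrip("\n").split("\t")
    return Place(city, town, country, float(lon), float(lat))


place = parse_line("Kent\tOrpington\tEngland\t0.1212\t-0.12121\n")
```

    Longitude and latitude are simply two more fields alongside the names, which is the point being made above.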

    For a simple tree search or a binary search of an array, I wouldn't bother -- file I/O won't be much faster in C. But these things always seem to grow larger, and speed is a beautiful thing.

    And for that, you can't beat C.

  3. #3
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    28,413
    I'm not familiar with this "GeoSearch" thing, but I have heard of GeoDjango, so maybe you could see if it can be adapted to your needs and works fast enough. It seems to me that the speed will be a matter of database optimisation, and the database engine would likely be written in C (or C++) anyway.

  4. #4
    Registered User
    Join Date
    Feb 2009
    Posts
    93
    So I basically want a screaming fast system which works as follows:

    The user enters a location, say "Lond", in the browser. This query is sent to my geocoding software, possibly running on another server, which returns the result "London, England" as a suggestion via JSON.

    When the user selects this option by clicking on the field, the geocoder then returns the Long/Lat co-ordinates, which are sent to my search server.
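
    The two-step exchange described here could be sketched roughly as follows. The function names, the JSON field names, and the `PLACES` table are hypothetical, and the coordinates are placeholders rather than real data.

```python
import json

# Hypothetical in-memory table: label -> (longitude, latitude) placeholders.
PLACES = {"London, England": (0.1212, -0.12121)}


def suggest(prefix: str) -> str:
    """Step 1: return a JSON list of labels that start with the typed text."""
    p = prefix.casefold()
    matches = sorted(label for label in PLACES if label.casefold().startswith(p))
    return json.dumps({"suggestions": matches})


def geocode(label: str) -> str:
    """Step 2: return JSON coordinates for the label the user clicked."""
    lon, lat = PLACES[label]
    return json.dumps({"label": label, "lon": lon, "lat": lat})
```

    Here `suggest("Lond")` would produce the suggestion list, and `geocode("London, England")` would produce the co-ordinates to pass on to the search server.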

    My data is currently in txt format. For performance reasons, am I better off importing this into an SQL or NoSQL database, or perhaps putting it in a simple CSV file?

  5. #5
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    For speed, keep it as local as you can. For 50,000 cities or fewer, I'd be very tempted to code my own, using a simple data file and binary search, but I can't advise what's good for your server. SQLite is pretty amazing: small, quick, and open source. I doubt you could go wrong running it locally, with any number of cities.
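
    As a rough illustration of the SQLite option, using Python's standard `sqlite3` module. The table name, column names, and sample rows are assumptions, and the coordinates are placeholders.

```python
import sqlite3

# An in-memory database for the sketch; a local file path would be used in practice.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (label TEXT PRIMARY KEY, lon REAL, lat REAL)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("England", 0.1212, -0.12121),
        ("Kent, England", 0.1212, -0.12121),
        ("Edenbridge, Kent, England", 0.1212, -0.12121),
    ],
)


def suggest(prefix, limit=10):
    """Return labels starting with prefix. LIKE ignores ASCII case by default."""
    # Note: '%' and '_' typed by the user act as wildcards in this simple form.
    rows = conn.execute(
        "SELECT label FROM places WHERE label LIKE ? ORDER BY label LIMIT ?",
        (prefix + "%", limit),
    )
    return [row[0] for row in rows]
```

    Whether a prefix `LIKE` can use the index depends on SQLite's collation and case-sensitivity settings, so it would be worth checking the query plan on real data.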

    If the cities and their long/lat data could be put into an array and left in memory, the response could be nearly instantaneous with a binary search. Network lag and disk I/O would be the two bottlenecks you want to avoid.
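
    A small sketch of the in-memory approach: sort the records once at startup, then binary-search for the first key at or after the typed prefix and walk forward while keys still match. The sample rows and placeholder coordinates are illustrative only.

```python
from bisect import bisect_left

# (label, lon, lat) rows; coordinates are placeholders, not real data.
ROWS = [
    ("Swindon, Wiltshire, England", 0.1212, -0.12121),
    ("England", 0.1212, -0.12121),
    ("Kent, England", 0.1212, -0.12121),
    ("Dover, Kent, England", 0.1212, -0.12121),
]

# Sorted once at startup by a case-folded search key.
RECORDS = sorted((label.casefold(), label, lon, lat) for label, lon, lat in ROWS)
KEYS = [record[0] for record in RECORDS]


def prefix_search(prefix):
    """Return (label, lon, lat) for every record whose key starts with prefix."""
    p = prefix.casefold()
    i = bisect_left(KEYS, p)  # first key >= prefix, found in O(log n)
    found = []
    while i < len(KEYS) and KEYS[i].startswith(p):
        found.append(RECORDS[i][1:])
        i += 1
    return found
```

    Because matching keys are contiguous in a sorted array, the cost is one binary search plus one step per result returned.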

  6. #6
    Registered User
    Join Date
    Feb 2009
    Posts
    93
    Hi.

    I have two main questions going forward. The first is concurrency. If I am running this on a server and serving requests via JSON, how does the program handle concurrent requests? For example, if the program takes 1s to execute, and three people query within 1s, what happens? Do I need to build in a queuing system?
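
    On the concurrency question, a hedged illustration: if the lookup data is loaded once and never modified, several requests can read it at the same time without a hand-written queue, and a web server would typically hand each request to its own thread or process. The sketch below simulates three simultaneous queries with a thread pool; `INDEX` and `lookup` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor

# Read-only after startup, so concurrent readers need no locking.
INDEX = {
    "london": "London, England",
    "kent": "Kent, England",
    "dover": "Dover, Kent, England",
}


def lookup(query):
    """Handle one request: a pure read against the shared index."""
    return INDEX.get(query.casefold())


# Three "users" querying at once; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(lookup, ["London", "Kent", "Dover"]))
```

    If the data had to be updated while the service was running, some form of locking or swapping in a freshly built index would need to be considered.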

    Secondly, if I want it to work on Town, City and Country level and have it return the co-ordinates, I was thinking it makes sense to have a tree structure like this:

    England (0.1212 / -0.12121)
    --- Kent (0.1212 / -0.12121)
    --------- Orpington (0.1212 / -0.12121)
    --------- Chatham (0.1212 / -0.12121)
    --------- Rochester (0.1212 / -0.12121)
    --------- Dover (0.1212 / -0.12121)
    --------- Edenbridge (0.1212 / -0.12121)
    --- Wiltshire (0.1212 / -0.12121)
    --------- Swindon (0.1212 / -0.12121)
    --------- Malmesbury (0.1212 / -0.12121)
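
    One way such a tree might be represented and flattened into full labels, shown as a sketch only. The nested-dict layout and the `flatten` helper are assumptions, only part of the tree is included, and the coordinates are the same placeholders as above.

```python
# name -> (lon, lat, children); placeholder coordinates throughout.
TREE = {
    "England": (0.1212, -0.12121, {
        "Kent": (0.1212, -0.12121, {
            "Orpington": (0.1212, -0.12121, {}),
            "Edenbridge": (0.1212, -0.12121, {}),
        }),
        "Wiltshire": (0.1212, -0.12121, {
            "Swindon": (0.1212, -0.12121, {}),
        }),
    }),
}


def flatten(tree, parents=()):
    """Yield (label, lon, lat) for every node, most specific name first."""
    for name, (lon, lat, children) in tree.items():
        path = (name,) + parents
        yield ", ".join(path), lon, lat
        yield from flatten(children, path)


labels = [label for label, _, _ in flatten(TREE)]
```

    Flattening once at load time gives each node a label such as "Edenbridge, Kent, England", which can then be searched as a plain list.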

    Here are the kind of autosuggest conditions I am hoping to achieve:

    When someone types "E" I would like it to return: "England" and "Edenbridge, Kent, England"

    When someone types "K" I would like it to return "Kent, England"

    If someone types "Kent", it should assume "Kent, England".

    Now, I am in over my head here. But would anyone mind confirming that this is logical and an optimal way of doing it?
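
    The autosuggest conditions listed above can be sketched as a prefix match over full labels, with broader areas ranked first. The ranking rule, the `resolve` helper, and the label list are assumptions for illustration, and the linear scan is used only to keep the example short.

```python
LABELS = [
    "England",
    "Kent, England",
    "Orpington, Kent, England",
    "Dover, Kent, England",
    "Edenbridge, Kent, England",
    "Wiltshire, England",
    "Swindon, Wiltshire, England",
]


def suggest(typed):
    """Labels starting with the typed text; fewer components rank first."""
    p = typed.casefold()
    hits = [label for label in LABELS if label.casefold().startswith(p)]
    return sorted(hits, key=lambda label: (label.count(","), label))


def resolve(typed):
    """Map a fully typed name such as "Kent" to its label, if unambiguous."""
    p = typed.casefold()
    exact = [label for label in LABELS if label.split(",")[0].casefold() == p]
    return exact[0] if len(exact) == 1 else None
```

    With this data, typing "E" yields "England" then "Edenbridge, Kent, England", typing "K" yields "Kent, England", and `resolve("Kent")` gives "Kent, England".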

    Should I be looking at binary tree searching for this?
    http://en.literateprograms.org/Binary_search_tree_(C)
    Last edited by spadez; 07-30-2012 at 05:16 AM.

    Last Post: 01-20-2002, 09:08 AM