Thread: Hashtables & Unicode

  1. #1
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902

    Hashtables & Unicode

    My current great work is in need of two things, hashtables and unicode strings. I was wondering if the forum had any good ideas on these, or perhaps knowledge of some libraries to that effect. (For C++)

    Currently I'm getting hashtables from STLPort, but STLPort is a fairly "large" dependency. But it works.

    For unicode, I'm looking for something akin to std::string with support for UTF-8 (UTF-16 might work too... something that can output to both would be great.) So far I've found ICU (which I've yet to get to compile for MinGW... seems more than './configure; make' is needed, but I might try again from scratch) and glib (/glibmm), which also falls in my 'large dependency' category.

    The project also uses boost, so... boost+STLPort+glib = wow, lots of stuff. But if that's what it takes. (thinking of it, I wonder if boost has unicode strings...)
    I could write hashtables/unicode strings myself, but anything not written by me would probably be a lot more stable and (even more so) feature-rich.
    long time; /* know C? */
    Unprecedented performance: Nothing ever ran this slow before.
    Any sufficiently advanced bug is indistinguishable from a feature.
    Real Programmers confuse Halloween and Christmas, because dec 25 == oct 31.
    The best way to accelerate an IBM is at 9.8 m/s/s.
    recursion (re - cur' - zhun) n. 1. (see recursion)

  2. #2
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,897
    >but STLPort is a fairly "large" dependency
    Google's sparse hash might be a good alternative.

    >I'd so far found the ICU
    That's my recommendation.

    >I wonder if boost has unicode strings...
    Not in the way that you're thinking, I suspect.

    >I could write hashtables
    Yep, and it's not terribly difficult.

    >unicode strings myself
    I don't recommend this. While this may seem relatively simple on the surface, if you want something remotely interoperable you'll be getting into normalization, which is tricky at best.
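
    To give a sense of scale for the "write it yourself" hashtable option, here is a minimal sketch of a separate-chaining table. It is an illustration only: the class name StringIntTable, the fixed string-to-int types, and the default bucket count are all made up for this example, and the hash is FNV-1a-style mixing with the 32-bit constants. It never resizes, so it assumes a non-zero bucket count chosen up front.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration class; not taken from STLPort, boost, or sparse hash.
class StringIntTable {
    typedef std::list<std::pair<std::string, int> > Bucket;

public:
    // Assumption: bucket_count is non-zero and never changes (no rehashing).
    explicit StringIntTable(std::size_t bucket_count = 64)
        : buckets_(bucket_count), size_(0) {}

    // Insert a new key, or overwrite the value of an existing one.
    void put(const std::string& key, int value) {
        Bucket& b = buckets_[index(key)];
        for (Bucket::iterator it = b.begin(); it != b.end(); ++it) {
            if (it->first == key) {
                it->second = value;
                return;
            }
        }
        b.push_back(std::make_pair(key, value));
        ++size_;
    }

    // Return a pointer to the stored value, or a null pointer if absent.
    const int* find(const std::string& key) const {
        const Bucket& b = buckets_[index(key)];
        for (Bucket::const_iterator it = b.begin(); it != b.end(); ++it) {
            if (it->first == key) {
                return &it->second;
            }
        }
        return 0;
    }

    std::size_t size() const { return size_; }

private:
    // FNV-1a-style byte mixing, reduced to a bucket index.
    std::size_t index(const std::string& key) const {
        unsigned long h = 2166136261UL;
        for (std::size_t i = 0; i < key.size(); ++i) {
            h ^= static_cast<unsigned char>(key[i]);
            h *= 16777619UL;
        }
        return static_cast<std::size_t>(h % buckets_.size());
    }

    std::vector<Bucket> buckets_;
    std::size_t size_;
};
```

    Usage is just put() and find(); a real table would also want erase, iteration, and growth once the chains get long, which is roughly where a library starts to pay off.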
    My best code is written with the delete key.

  3. #3
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    icu... (which I've yet to get to compile for MinGW... seems more than './configure; make' is needed, but I might try again from scratch)
    Ok, thought I'd add this note here, just to make another place for someone looking for this solution... (I had to find it in multiple places...)

    If you are using Mingw/Msys/Vista, ICU may not initially build. (It will fail during 'make' with an error from '/usr/bin/install' about 'Permission denied.')
    This is Vista's fault: it requires install to have admin privileges to run because of its filename. You can work around this "feature" by adding "manifest" files to your Msys bin directory; they are available here. You may also need to "touch" the executables, i.e., run "touch install.exe" (Vista caches the manifests and doesn't reliably check whether they've changed).

    Even after all this, I'm still getting stuck. The ICU makefile enters the makefile in the data/ dir, and gets stuck. This is what seems to be hanging it up:
    Code:
    # The #M# is used to delete lines for icu-config
    # Current full path directory.
    #CURR_FULL_DIR=$(shell pwd -W)#M# for MSYS
    CURR_FULL_DIR=$(subst \,/,$(shell cmd /c cd | tail --bytes=+3))#M# for Cygwin shell
    # Current full path directory for use in source code in a -D compiler option.
    #CURR_SRCCODE_FULL_DIR=$(subst /,\\\\,$(shell pwd -W))#M# for MSYS
    CURR_SRCCODE_FULL_DIR=$(subst \,/,$(shell cmd /c cd | tail --bytes=+3))#M# for Cygwin shell
    ...these lines also seem to get included in other makefiles, yet only the one in data locks up here. It never starts executing commands; running "make -nd" gives:
    Code:
    GNU Make version 3.79.1, by Richard Stallman and Roland McGrath.
    Built for i686-pc-msys
    Copyright (C) 1988, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 2000
            Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.
    There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
    PARTICULAR PURPOSE.
    
    Report bugs to <bug-make@gnu.org>.
    
    Reading makefiles...
    Reading makefile `Makefile'...
    Reading makefile `../icudefs.mk' (search path) (no ~ expansion)...
    Reading makefile `../config/mh-mingw' (search path) (no ~ expansion)...
    And it freezes there. I commented out the two trouble lines, replaced the one usage of them in the main ./data/Makefile with what I thought its value should be (minus shell magic) and ./data built. So, now I've just got to see if I can get the rest of this to build...

    Edit: I didn't even notice the Msys lines in there. Why are those commented out... I traded the Cygwin lines for the Msys ones, and my build works. No idea why the Cygwin lines are there. (There is a separate cygwin file...)
    The only problem left is that icu seems to compile with "-g -O2", despite being set to release and not debug.
    Edit: The build also doesn't seem to produce import libraries, only dll files... >.>
    Last edited by Cactus_Hugger; 07-25-2008 at 07:43 PM.

  4. #4
    Registered User Codeplug's Avatar
    Join Date
    Mar 2003
    Posts
    4,981
    What are you doing with these strings that require ICU?

    gg

  5. #5
    int x = *((int *) NULL); Cactus_Hugger's Avatar
    Join Date
    Jul 2003
    Location
    Banks of the River Styx
    Posts
    902
    Quote Originally Posted by Codeplug View Post
    What are you doing with these strings that require ICU?
    1) Hashtables
    2) ICU Unicode strings
    3) ???
    4) PROFIT!!!

    (aside: the hashtables & strings are unrelated, the above is a joke...) They're being used in a sort of sand-box-ish project I have. It's more of a library of interesting helpful things for SDL... (random and unrelated but reusable code).
    SDL delivers (or, it can, if you tell it to) keyboard input as UTF-16 values. Normally, this is irrelevant, as you're just listening for up/down/left/right/etc. Occasionally, textual input is required, which led to me first writing a method to translate SDLK_* defines to text. Thinking this was a bit hackish, I started looking into Unicode and wondered if I should/could use that. That led to a tangent of "is there a unicode library?" I'd heard of ICU from somewhere... and libxml++ uses glibmm.
    I think I'm going to play around with ICU, see what it can do.
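
    For the narrow job of turning UTF-16 key values into UTF-8 text, a rough sketch could look like the following. The function names append_utf8 and utf16_to_utf8 are invented for this example, and it assumes the input arrives as an array of 16-bit code units held in unsigned short. It only re-encodes code points (unpaired surrogates become U+FFFD); it does no normalization, collation, or anything else ICU is actually good for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one Unicode code point to `out`.
// Hypothetical helper; assumes cp <= 0x10FFFF.
void append_utf8(unsigned long cp, std::string& out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert `count` UTF-16 code units to a UTF-8 string.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16_to_utf8(const unsigned short* units, std::size_t count) {
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        unsigned long u = units[i];
        unsigned long cp = u;
        if (u >= 0xD800 && u <= 0xDBFF) {
            // High surrogate: needs a low surrogate right after it.
            if (i + 1 < count && units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
                unsigned long lo = units[i + 1];
                cp = 0x10000 + ((u - 0xD800) << 10) + (lo - 0xDC00);
                ++i;  // consume the low surrogate as well
            } else {
                cp = 0xFFFD;
            }
        } else if (u >= 0xDC00 && u <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate with no preceding high surrogate
        }
        append_utf8(cp, out);
    }
    return out;
}
```

    Feeding it one key value at a time works for characters in the Basic Multilingual Plane; anything beyond that needs both halves of the surrogate pair in the same call.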
