Thread: (VC++ related) Unicode or Multi-byte environment?

  1. #1
    Registered User
    Join Date
    Mar 2009
    Posts
    46

    (VC++ related) Unicode or Multi-byte environment?

    Background - I want to support Japanese and English in my program, and I don't want to have to use a Japanese-only character encoding for any files (source files or external text files). I would also like the program to be able to run on computers without Japanese language support installed (obviously not displaying Japanese for those users ;-).

    Visual Studio allows you to set the character set used by the compiler (I guess similar options exist in other development suites?). The options are "Not Set", "Multi-byte Character Set" and "Unicode Character Set".

    For example:
    Code:
    GetModuleFileName(hInstance, path, sizeof(path) / sizeof(path[0]));  // size is in characters (TCHARs), not bytes
    If the environment is "Unicode" then 'path' needs to be declared as wchar_t; if you use char, every other byte of the returned string comes back as '\0'. If the environment is "Multi-byte" then it seems to work just fine with char (but my computer has the Japanese charset Shift-JIS as its default).
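
    For reference, here's a minimal TCHAR-based sketch of that call (ShowModulePath and the MAX_PATH buffer are just for illustration); TCHAR maps to wchar_t under "Unicode" and to char under "Multi-byte", so the same code compiles under both settings:

    Code:
    #include <windows.h>
    #include <tchar.h>

    // Minimal sketch: with TCHAR the same call compiles under both settings
    // ("Unicode" -> TCHAR is wchar_t, "Multi-byte" -> TCHAR is char).
    void ShowModulePath(HINSTANCE hInstance)
    {
        TCHAR path[MAX_PATH];

        // The size argument is in characters (TCHARs), not bytes.
        DWORD len = GetModuleFileName(hInstance, path, sizeof(path) / sizeof(path[0]));
        if (len > 0 && len < MAX_PATH)
        {
            // _tprintf resolves to wprintf or printf to match TCHAR.
            _tprintf(_T("Module path: %s\n"), path);
        }
    }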

    The character encoding used for the source files themselves does not seem to have any bearing on how this environment setting works (UTF-8 source files work just fine with "Multi-byte").

    Does this 'environment' setting only affect Windows API related functions?

    Bearing in mind that I want to be able to compile the program on Linux with at least one compiler, which would I be better off using: multi-byte or Unicode?

  2. #2
    Registered User
    Join Date
    Mar 2009
    Posts
    46
    OK, I'm going to change/simplify my question in the hope of getting a response more quickly.

    I've got a big program that I'd like to work smoothly for both English and Japanese users. If I change the environment setting in VC++ to "Unicode" I will have to make a lot of code changes (_TCHARs everywhere for a start, I guess ;-). So what advantages would doing that bring me?

  3. #3
    Registered User
    Join Date
    May 2007
    Posts
    147
    While I have no experience with Kanji, I can say you'll need to study language-independent development more to get this right.

    Search here or Google for Linux resources on Japanese language support in C to get an idea of what's required. Generally, as I understand it, UTF-8 is the norm there, while Windows's notion of MBCS may not be as portable.
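
    To give a rough idea of the Linux side, here is a minimal sketch of the usual locale-based approach (the literal and buffer size are just examples, and it assumes the user's locale is UTF-8):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <locale.h>
    #include <wchar.h>

    int main(void)
    {
        // Pick up the user's locale, which on modern Linux is normally UTF-8.
        setlocale(LC_ALL, "");

        const char *utf8 = "日本語";   // UTF-8 bytes straight from the source file
        wchar_t wide[32];

        // Convert the multibyte (UTF-8) string to wide characters for
        // character-level processing; byte-level I/O can stay in UTF-8.
        size_t chars = mbstowcs(wide, utf8, sizeof(wide) / sizeof(wide[0]));
        if (chars != (size_t)-1)
            printf("%zu characters in %zu bytes\n", chars, strlen(utf8));

        return 0;
    }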

    In all language-independent development you will need to consider the standard issues. TCHAR is a common solution on Windows, but you'll need to understand the L"STRING" form vs. the _T("STRING") form for string literals.
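
    A quick sketch of how the two literal forms (and the matching string functions) line up; the Length function here is only for illustration:

    Code:
    #include <tchar.h>
    #include <string.h>
    #include <wchar.h>

    // L"..." is always a wide (wchar_t) literal.
    const wchar_t *alwaysWide = L"always wchar_t";

    // _T("...") expands to L"..." only when _UNICODE is defined (the
    // "Unicode" setting); otherwise it is a plain narrow literal.
    const TCHAR *follows = _T("wide under Unicode, narrow under Multi-byte");

    // String functions come in three matching flavours:
    // strlen() for char, wcslen() for wchar_t, _tcslen() for TCHAR.
    size_t Length(const TCHAR *s)
    {
        return _tcslen(s);   // resolves to strlen or wcslen at compile time
    }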

    Depending on what your application is doing, it's not just the OS API that's involved: there are implications for SQL data I/O, and a whole host of other issues that could fill a book on the subject.

    My point is that, yes, you should consider portability (and therefore your choices) now, and Unicode is the most widely supported approach. You will have to 'bite the bullet' with respect to literals and string I/O. Though you may occasionally choose to mix ANSI and wide-character OS calls, you'll be better served by keeping that to a minimum.
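
    To illustrate that last point, a small sketch of what the mixing looks like (Greet is just an example name); the generic MessageBox name follows the project setting, while the -A/-W suffixed names pin one version explicitly:

    Code:
    #include <windows.h>
    #include <tchar.h>

    void Greet(HWND hwnd)
    {
        // Generic name: resolves to MessageBoxW or MessageBoxA with the setting.
        MessageBox(hwnd, _T("Follows the project character set"), _T("Generic"), MB_OK);

        // Explicit wide (Unicode) call, regardless of the setting.
        MessageBoxW(hwnd, L"Always the wide version", L"Explicit W", MB_OK);

        // Explicit ANSI/multi-byte call, regardless of the setting.
        MessageBoxA(hwnd, "Always the ANSI version", "Explicit A", MB_OK);
    }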

  4. #4
    Registered User
    Join Date
    Mar 2009
    Posts
    46
    Right. Well I think I've got a starting point now, so thank you. (And wish me luck).
