Thread: (VC++ related) Unicode or Multi-byte environment?

  1. #1
    Registered User
    Join Date
    Mar 2009
    Posts
    46

    (VC++ related) Unicode or Multi-byte environment?

    Background - I want to support Japanese and English in my program, and I don't want to have to use a Japanese-only character encoding for any files (source files or external text files). I would also like the program to be able to run on computers without Japanese language support installed (obviously not displaying Japanese for those users ;-).

    Visual Studio allows you to set the character set used by the compiler (I guess similar options exist in other development suites?). The options are "Not Set", "Multi-byte Character Set" and "Unicode Character Set".

    For example:
    Code:
    GetModuleFileName(hInstance, path, sizeof(path) / sizeof(path[0]));  // size is in characters (TCHARs), not bytes
    If the environment is "Unicode" then 'path' needs to be declared as wchar_t; if you use char, every other byte of the returned string comes back as '\0'. If the environment is "Multi-byte" then it seems to work just fine with char (but my computer has the Japanese charset Shift-JIS as its default).
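
    For reference, here's a minimal TCHAR-based sketch of that call (ShowModulePath and the MAX_PATH buffer are just for illustration); TCHAR maps to wchar_t under "Unicode" and to char under "Multi-byte", so the same code compiles under both settings:

    Code:
    #include <windows.h>
    #include <tchar.h>

    // Minimal sketch: with TCHAR the same call compiles under both settings
    // ("Unicode" -> TCHAR is wchar_t, "Multi-byte" -> TCHAR is char).
    void ShowModulePath(HINSTANCE hInstance)
    {
        TCHAR path[MAX_PATH];

        // The size argument is in characters (TCHARs), not bytes.
        DWORD len = GetModuleFileName(hInstance, path, sizeof(path) / sizeof(path[0]));
        if (len > 0 && len < MAX_PATH)
        {
            // _tprintf resolves to wprintf or printf to match TCHAR.
            _tprintf(_T("Module path: %s\n"), path);
        }
    }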

    The character encoding used for the source files themselves does not seem to have any bearing on how this environment setting works (UTF-8 source files work just fine with "Multi-byte").

    Does this 'environment' setting only affect Windows API related functions?

    Bearing in mind that I want to be able to compile the program on Linux with at least one compiler, which would I be better off using: multi-byte or Unicode?

  2. #2
    Registered User
    Join Date
    Mar 2009
    Posts
    46
    OK, I'm going to change/simplify my question in the hope of getting a response more quickly.

    I've got a big program that I'd like to work smoothly for both English and Japanese users. If I change the environment setting in VC++ to "Unicode" I will have to make a lot of code changes (_TCHARs everywhere for a start, I guess ;-). So what advantages would doing that bring me?

  3. #3
    Registered User
    Join Date
    May 2007
    Posts
    147
    While I have no experience with Kanji, I can say you'll need to study language-independent development more to get this right.

    Search here or Google for Linux resources on Japanese language support in C to get an idea of what's required. Generally, as I understand it, UTF-8 is the norm there, while Windows's notion of MBCS may not be as portable.
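
    To give a rough idea of the Linux side, here is a minimal sketch of the usual locale-based approach (the literal and buffer size are just examples, and it assumes the user's locale is UTF-8):

    Code:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <locale.h>
    #include <wchar.h>

    int main(void)
    {
        // Pick up the user's locale, which on modern Linux is normally UTF-8.
        setlocale(LC_ALL, "");

        const char *utf8 = "日本語";   // UTF-8 bytes straight from the source file
        wchar_t wide[32];

        // Convert the multibyte (UTF-8) string to wide characters for
        // character-level processing; byte-level I/O can stay in UTF-8.
        size_t chars = mbstowcs(wide, utf8, sizeof(wide) / sizeof(wide[0]));
        if (chars != (size_t)-1)
            printf("%zu characters in %zu bytes\n", chars, strlen(utf8));

        return 0;
    }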

    In all language-independent development you will need to consider the standard issues. TCHAR is a common solution on Windows, but you'll need to understand the L"STRING" form vs. the _T("STRING") form for string literals.
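
    A quick sketch of how the two literal forms (and the matching string functions) line up; the Length function here is only for illustration:

    Code:
    #include <tchar.h>
    #include <string.h>
    #include <wchar.h>

    // L"..." is always a wide (wchar_t) literal.
    const wchar_t *alwaysWide = L"always wchar_t";

    // _T("...") expands to L"..." only when _UNICODE is defined (the
    // "Unicode" setting); otherwise it is a plain narrow literal.
    const TCHAR *follows = _T("wide under Unicode, narrow under Multi-byte");

    // String functions come in three matching flavours:
    // strlen() for char, wcslen() for wchar_t, _tcslen() for TCHAR.
    size_t Length(const TCHAR *s)
    {
        return _tcslen(s);   // resolves to strlen or wcslen at compile time
    }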

    Depending on what your application is doing, it's not just the OS API that's involved: there are implications for SQL data I/O, and a whole host of other issues that could fill a book on the subject.

    My point is that, yes, you should consider portability (and therefore your choices) now, and Unicode is the most widely supported approach. You will have to 'bite the bullet' with respect to literals and string I/O. Though you may occasionally choose to mix ANSI and wide-character OS calls, you'll be better served by keeping that to a minimum.
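
    To illustrate that last point, a small sketch of what the mixing looks like (Greet is just an example name); the generic MessageBox name follows the project setting, while the -A/-W suffixed names pin one version explicitly:

    Code:
    #include <windows.h>
    #include <tchar.h>

    void Greet(HWND hwnd)
    {
        // Generic name: resolves to MessageBoxW or MessageBoxA with the setting.
        MessageBox(hwnd, _T("Follows the project character set"), _T("Generic"), MB_OK);

        // Explicit wide (Unicode) call, regardless of the setting.
        MessageBoxW(hwnd, L"Always the wide version", L"Explicit W", MB_OK);

        // Explicit ANSI/multi-byte call, regardless of the setting.
        MessageBoxA(hwnd, "Always the ANSI version", "Explicit A", MB_OK);
    }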

  4. #4
    Registered User
    Join Date
    Mar 2009
    Posts
    46
    Right. Well I think I've got a starting point now, so thank you. (And wish me luck).
