How to concatenate, then widen string literals

This is a discussion on How to concatenate, then widen string literals within the C Programming forums, part of the General Programming Boards category.

  1. #1
    Registered User
    Join Date
    Aug 2004
    Posts
    7

    How to concatenate, then widen string literals

    I have a useful macro:
    #ifdef _UNICODE
    #define TEXT(x) L##x
    #else
    #define TEXT(x) x
    #endif
    I.e. it uses the L prefix to 'widen' string literals. This part alone works great.

    Now, I want to do this:
    TEXT("foo" "bar" "baz")
    which ideally, #ifdef _UNICODE, would yield:
    L"foobarbaz"
    which is what I want out of all this - a concatenated, wide string.

    The problem is that C preprocessors expand the TEXT macro and do the token-pasting first, yielding this:
    L"foo" "bar" "baz"
    and only afterwards does the C compiler do concatenation of the adjacent string literals. At this stage, one C compiler emits the error "concatenating mismatched wide strings"; other C compilers seem to do what I want: if any string is wide, they are all 'widened', then they are concatenated.

    What I really want is for the string-literal concatenation pass to be done first, yielding:
    TEXT("foobarbaz")
    and only afterwards, the token-pasting operator will prepend L, yielding:
    L"foobarbaz"
    At the moment, I am forcing things to happen the way I want by doing:
    #define TEXT2(a, b) TEXT(a) TEXT(b)
    #define TEXT3(a, b, c) TEXT(a) TEXT(b) TEXT(c)
    ... and so on. This works, forcing the L to be prepended first (but to each string separately), so that all of the strings are wide by the time the compiler sees them, and they can always be legally concatenated by any compiler.

    However I will soon reach TEXT<INT_MAX>, and it is a kluge that hurts my eyeballs each time I scroll past it.
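For readers following along, here is a minimal sketch of the workaround described above. It assumes a wide build (`_UNICODE` is defined by hand purely for the demonstration; normally the build system would do that), and the names `tchar` and `text3_demo` are hypothetical, not part of the original post.

```c
#include <assert.h>
#include <wchar.h>

/* Assumption: a wide build. Defined here only so the sketch is self-contained. */
#ifndef _UNICODE
#define _UNICODE
#endif

#ifdef _UNICODE
#define TEXT(x) L##x
typedef wchar_t tchar;   /* hypothetical helper type */
#else
#define TEXT(x) x
typedef char tchar;
#endif

/* Widen each literal separately; the compiler then joins adjacent
   literals that already have the same width. */
#define TEXT2(a, b) TEXT(a) TEXT(b)
#define TEXT3(a, b, c) TEXT(a) TEXT(b) TEXT(c)

/* Returns the joined literal produced by TEXT3. */
const tchar *text3_demo(void)
{
    static const tchar joined[] = TEXT3("foo", "bar", "baz");

    /* 9 characters plus the terminator */
    assert(sizeof joined / sizeof joined[0] == 10);
    return joined;
}
```

In a wide build, `text3_demo()` should compare equal to `L"foobarbaz"`, because every piece carries the `L` prefix before the compiler concatenates them.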

    Apparently, under C89, the value of "foo" L"bar" is undefined, whereas later C flavours define it to be identical to L"foo" L"bar". I suppose this is why I only encounter the problem with one specific compiler. But I can't upgrade the compiler in question (because it's a closed source POS that I am forced to use and cannot change).

    Does anybody know of another way to accomplish what I want?
    Last edited by lonehacker; 08-26-2004 at 01:45 AM.

  2. #2
    and the hat of wrongness Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    32,500
    > TEXT("foo" "bar" "baz")
    So what's wrong with writing
    TEXT("foo") TEXT("bar") TEXT("baz")
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
    I support http://www.ukip.org/ as the first necessary step to a free Europe.

  3. #3
    moi
    Registered User moi's Avatar
    Join Date
    Jul 2002
    Posts
    946
    why do you have (or are approaching) INT_MAX string literals in your program code? that in itself seems quite kludgey.
    hello, internet!

  4. #4
    Registered User
    Join Date
    Aug 2004
    Posts
    7
    Sorry, I should have given more background info.
    The reason that I have such a large number of such string literals is I am doing things like this (except much worse):
    #ifdef _WIN32
    #define SWITCH_CHAR "/"
    /* under Win32, this means narrow or wide depending on which function (printf/wprintf) it is passed to. This is a Microsoft extension. */
    #define T_FMT "%s"
    #else /* UNIX */
    #define SWITCH_CHAR "-"
    #ifdef UNICODE_SUPPORT
    /* under UNIX, this always means 'wide string' no matter which function it's passed to. This is ISO C99. */
    #define T_FMT "%ls"
    #else /* UNIX, !UNICODE_SUPPORT */
    /* under UNIX, this always means 'narrow string' no matter which function it's passed to. This too is ISO C99. */
    #define T_FMT "%s"
    #endif /* UNICODE_SUPPORT */
    #endif /* _WIN32/UNIX */

    #define HELP_MSG TEXT5("Usage: ", T_FMT, " ", T_FMT, "foo: set the foo option")
    fprintf(stderr, HELP_MSG, argv[0], SWITCH_CHAR);

    I.e. T_FMT is 'the format string that means the passed pointer points to a foo string', where foo is wide #ifdef UNICODE_SUPPORT, and narrow if not.

    Hope you understand what I'm trying to achieve... I want to use one format string to mean 'here comes a string', no matter whether (given the current build settings/OS/phase of the moon) these strings are wide, narrow, or whatever. But this requires different values of T_FMT depending on build settings, OS, phase of the moon, etc. So I am stuck with trying to concatenate a biggish number of separate constant strings. (I think - am I?).
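One detail worth illustrating: a one-level `TEXT(x) L##x` cannot be applied directly to a macro such as `T_FMT`, because operands of `##` are not macro-expanded, so `TEXT(T_FMT)` would paste to the single token `LT_FMT`. The sketch below uses a two-level macro to avoid that. It assumes a wide-character UNIX build; `TEXT_PASTE`, `TEXT5` and `format_help` are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Assumption: wide UNIX build, so literals are widened and T_FMT is "%ls". */
#define TEXT_PASTE(x) L##x
#define TEXT(x) TEXT_PASTE(x)   /* x is macro-expanded before the paste */

#define T_FMT "%ls"
#define SWITCH_CHAR "-"

/* Hypothetical five-argument member of the TEXTn family. */
#define TEXT5(a, b, c, d, e) TEXT(a) TEXT(b) TEXT(c) TEXT(d) TEXT(e)

#define HELP_MSG TEXT5("Usage: ", T_FMT, " ", T_FMT, "foo: set the foo option")

/* Formats the help line into buf, which holds n wide characters.
   Returns the number of wide characters written, or a negative value on error. */
int format_help(wchar_t *buf, size_t n, const wchar_t *prog)
{
    assert(buf != NULL && n > 0 && prog != NULL);
    return swprintf(buf, n, HELP_MSG, prog, TEXT(SWITCH_CHAR));
}
```

Since the format string is wide here, it has to go to a wide function such as `swprintf` or `fwprintf`; passing it to `fprintf` would not work in a wide build.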

    BTW, you would think that this stuff would be standardised - how to say "this is a string" across platforms. But Microsoft had to go and add their own proprietary non-standard extensions... <sigh>

  5. #5
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >you would think that this stuff would be standardised
    That's convenient, because it is. Now we won't have confused programmers wandering around.

    >how to say "this is a string" across platforms
    "this is a string" works just fine for me and everyone else using standard C. L"this is a string" is the wide version of that, and oddly enough there are variants of the standard library that work with wide strings.

    >But Microsoft had to go and add their own proprietary non-standard extensions... <sigh>
    The nice thing about proprietary non-standard extensions is that you aren't forced to use them.

    >(I think - am I?)
    It really looks like you're trying too hard. Sit back and rethink the problem and I'm sure you'll see something a little more elegant.
    My best code is written with the delete key.

  6. #6
    Registered User
    Join Date
    Aug 2004
    Posts
    7

    Elegant solutions...

    There're only three elegant solutions I can see at the moment:

    1. Preprocess format strings in a wrapper for *printf*, converting %s to %ls or whatever as needed. But this makes every call to *printf* into a malloc, copy/munge, call *printf*, then free. Elegant because it completely hides the problem beneath an abstraction layer; but inefficient and therefore offensive to my C programmer sensibilities :-)

    2. Upgrade Visual Crud++ to a newer version that groks newer ISO C dialects, and therefore interprets L"foo" "bar" as being identical to L"foo" L"bar" instead of printing silly error messages. This is the real solution - fix the compiler limitation that's biting me.

    3. The other 'elegant' solution would be to switch to using GCC under Cygwin or Mingw32 to do the Win32 port of my program. That looks more appealing to me each day I fight^H^H^H^H^Huse Visual C++.
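As a rough illustration of option 1, the format-munging step might look like the sketch below. It only rewrites the format string; a real wrapper would still have to pass the result on to the appropriate `v*printf` function and free it afterwards. The function name `widen_format` is hypothetical, and the parser is deliberately simple: it recognises flags, width and precision, and leaves conversions that already carry a length modifier alone.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of fmt in which every plain %s conversion
   becomes %ls. The caller frees the result. Returns NULL if malloc fails. */
char *widen_format(const char *fmt)
{
    size_t len, i = 0, o = 0;
    char *out;

    assert(fmt != NULL);
    len = strlen(fmt);
    out = malloc(2 * len + 1);   /* at most one 'l' added per input character */
    if (out == NULL)
        return NULL;

    while (i < len) {
        if (fmt[i] != '%') {
            out[o++] = fmt[i++];             /* ordinary character */
            continue;
        }
        out[o++] = fmt[i++];                 /* copy the '%' */
        if (fmt[i] == '%') {                 /* "%%" is a literal percent sign */
            out[o++] = fmt[i++];
            continue;
        }
        /* copy flags, width and precision unchanged */
        while (fmt[i] != '\0' && strchr("-+ #0123456789.*", fmt[i]) != NULL)
            out[o++] = fmt[i++];
        if (fmt[i] == 's')                   /* no length modifier: insert 'l' */
            out[o++] = 'l';
        /* the conversion character itself is copied on the next pass */
    }
    out[o] = '\0';
    return out;
}
```

For example, `widen_format("%-10s")` should yield `"%-10ls"`, while `"%%s"` and `"%ls"` pass through unchanged. The malloc/copy/free cost per call is exactly the inefficiency complained about above.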

    <rant>

    It would be much easier to convince the boss to shell out several thousand dollars on a new version of VC++ if I could be sure that the new version does contain these newer ISO C string concatenation semantics. But do you think I can find that information anywhere on Microsoft's site? Grrr.

    The only stuff they have is all marketing-type crap ('increase developer productivity', blah blah), or technical information regarding web services, C#, managed code, and other 'fluff', which I couldn't care less about. Where is a Changelog for CL.EXE? Cut the crap about 'managed code' - where is information on the friggin' ANSI C compiler? It's the most important part since almost everything uses C eventually. In fact to me it's the only important part, since I couldn't give a rat's arse about 'Web Services', c sharp, c flat, e sharp minor, whatever.

    </rant>

    This is all rather frustrating...

    Anyway, thanks for help guys.

  7. #7
    Yes, my avatar is stolen anonytmouse's Avatar
    Join Date
    Dec 2002
    Posts
    2,544
    The Visual C++ Toolkit 2003 is a free command line version of VC.NET 2003.

    And no, L"Hello" "World" will not compile.

  8. #8
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by anonytmouse
    And no, L"Hello" "World" will not compile.
    Were you talking about under VC++? Or gcc?
    Code:
    #include <stdio.h>
    #include <wchar.h>
    int main( void )
    {
            fwprintf( stdout, L"Hello World\n" );
            return 0;
    }
    
    gcc -o widec widec.c -Wall -std=c99
    ./widec
    Hello World
    Or did you mean just the string by itself?

    Quzah.
    Hope is the first step on the road to disappointment.

  9. #9
    Code Goddess Prelude's Avatar
    Join Date
    Sep 2001
    Posts
    9,796
    >2. Upgrade Visual Crud++ to a newer version that groks newer ISO C dialects
    The latest version of Visual Studio does not parse C99.

    >and therefore interprets L"foo" "bar" as being identical to L"foo" L"bar" instead of printing silly error messages
    And what makes you think that this is the wrong behavior? Can you quote me line and verse from the standard that says L"foo" "bar" must be parsed as L"foo" L"bar"? Don't claim that a compiler does it wrong unless you can prove it, and if you can I highly suggest doing so to avoid being flamed.

    >This is all rather frustrating...
    I can empathize with you, but you could hold back the scathing insults about your tools and spend some of that frustration energy doing more productive things. Reading paragraph after paragraph of Microsoft bashing gets old after a few years.
    My best code is written with the delete key.

  10. #10
    Registered User
    Join Date
    Aug 2004
    Posts
    7
    Prelude wrote:
    And what makes you think that this is the wrong behavior? Can you quote me line and verse from the standard that says L"foo" "bar" must be parsed as L"foo" L"bar"? Don't claim that a compiler does it wrong unless you can prove it, and if you can I highly suggest doing so to avoid being flamed.
    IMO printing an error message, instead of compiling L"foo" "bar" as if it had been written L"foo" L"bar", is 'wrong', in the sense that the former behaviour is more surprising and less useful to me than the latter. In this subjective sense of the word 'wrong', then as seen by me, this behaviour is wrong, yes.

    I have never claimed that this behaviour is 'wrong' in the sense of not adhering to C standards to which the product claims to adhere. You've misinterpreted me as saying something that I in fact didn't say (deliberately perhaps?).

    I do claim that this behaviour does not conform to C99. But VC++ does not claim to implement C99. VC++ does in this matter adhere to C90, and that's all its docs claim to do. So in summary:
    - VC++ is not 'wrong' in this objective sense of the word, meaning adherence to specifications as advertised; and
    - I never said that it was; and
    - I am not saying so now.

    According to some web pages I read (no links, sorry), ISO C90 leaves this behaviour undefined (so VC++ complies with C90 as advertised when it rejects my program) but C99 mandates the behaviour I prefer - make all strings being concatenated wide, then concatenate them. Google for 'wide string concatenation iso c' or something similar, and you'll doubtless find the pages that I read that cause me to say this. If you want ISO C spec chapter and verse, I encourage you to obtain the spec and find the appropriate section(s). I don't have time to do this for you, I'm sorry.
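A small check of the C99 behaviour described above, as a sketch that assumes a compiler implementing C99 or later semantics for string literals (the rule lives in section 6.4.5 of C99). The function name `mixed_concat_length` is hypothetical. A C90 compiler is free to reject this code, which is the whole problem.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Under C99, when any adjacent string literal is wide, the whole
   concatenation is treated as one wide string literal. */
size_t mixed_concat_length(void)
{
    static const wchar_t mixed[] = L"foo" "bar";

    /* 6 wide characters plus the terminator */
    assert(sizeof mixed / sizeof mixed[0] == 7);
    return wcslen(mixed);
}
```

On such a compiler the array holds the same contents as `L"foobar"`, so the function returns 6.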

    I apologise for offending you with my 'microsoft-bashing'. As I said, I was frustrated. That being so, I have to admit that I enjoy Microsoft bashing, and I often find it to be justified. I am not sorry that I engage in Microsoft-bashing; I *am* sorry I offended you with it. I will avoid it here in future for your benefit.

    quzah: My problem here (ok, well, one of them) is the different behaviour of GCC 3.3.4 and VC++ 7.something (or ISO C99 and C90 compilers respectively) when attempting to concatenate a wide and a narrow string, thus: L"foo" "bar". Or written more clearly: "foo" L"bar" - only one 'L', but two separate strings in quotes.
    Your program doesn't do this - I don't see any string concatenation in there. Your program just shows that your UNIX box is capable of writing a wide string to stdout, which is a cool feature, but it's nothing to do with string concat behaviour exactly.

    anonytmouse: Thanks for the link - I'll go download that and have a play with it - even though it doesn't solve this particular problem it'll be useful in the future, I bet.

    Cheers!

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by lonehacker
    quzah: My problem here (ok, well, one of them) is the different behaviour of GCC 3.3.4 and VC++ 7.something (or ISO C99 and C90 compilers respectively) when attempting to concatenate a wide and a narrow string, thus: L"foo" "bar". Or written more clearly: "foo" L"bar" - only one 'L', but two separate strings in quotes.
    Your program doesn't do this - I don't see any string concatenation in there. Your program just shows that your UNIX box is capable of writing a wide string to stdout, which is a cool feature, but it's nothing to do with string concat behaviour exactly.
    I misread their quote. I read it as "Hello World", not "Hello" "World". But if it really matters:
    Code:
    #include <stdio.h>
    #include <wchar.h>
    int main( void )
    {
            fwprintf( stdout, L"Hello" "World\n" );
            return 0;
    }
    It DOES compile, without warnings or errors, and gives the expected output:

    HelloWorld (Because after all, they left out the space, so to be "accurate", I did too.)

    So yes, it does compile, as stated.

    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Registered User
    Join Date
    Aug 2004
    Posts
    7
    Quzah: Great, now try getting this to compile using Visual C++, which is what I am trying to do and which is my problem.

    I think you think that I am saying that this doesn't ever compile or run anywhere. If you read more closely you'll see I am not saying this.

    Thanks for the research anyway.

  13. #13
    Sweet
    Join Date
    Aug 2002
    Location
    Tucson, Arizona
    Posts
    1,802
    nm i didn't read your post right
    Last edited by prog-bman; 09-15-2004 at 01:44 AM.
    Woop?

  14. #14
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Quote Originally Posted by lonehacker
    Quzah: Great, now try getting this to compile using Visual C++, which is what I am trying to do and which is my problem.

    I think you think that I am saying that this doesn't ever compile or run anywhere. If you read more closely you'll see I am not saying this.

    Thanks for the research anyway.
    I don't use Windows.

    Quzah.
    Hope is the first step on the road to disappointment.
