Skipws

  1. #1
    Registered User
    Join Date
    May 2006
    Posts
    1,579

    Skipws

    Hello everyone,


    Does skipws work for both char-based and wchar_t-based string streams? I have not found a formal clarification on MSDN.

    http://msdn2.microsoft.com/en-us/library/98bsd5x4.aspx


    thanks in advance,
    George

  2. #2
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,893
    Yes.
    All the buzzt!
    CornedBee

    "There is not now, nor has there ever been, nor will there ever be, any programming language in which it is the least bit difficult to write bad code."
    - Flon's Law
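A minimal sketch of that answer, assuming a C++11 compiler: `std::skipws` and `std::noskipws` act on `std::ios_base`, the common base of every `basic_istream`, so they apply to `std::istringstream` and `std::wistringstream` alike. The helper `first_char` is a hypothetical name written only for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reads one character from a string stream of any
// character type, with whitespace skipping switched on or off explicitly.
template <typename CharT>
CharT first_char(const std::basic_string<CharT>& text, bool skip) {
    std::basic_istringstream<CharT> in(text);
    CharT c = CharT('?');
    if (skip)
        in >> std::skipws >> c;    // leading blanks are skipped (the default)
    else
        in >> std::noskipws >> c;  // the first character is read as-is, even a blank
    return c;
}
```

For example, `first_char(std::wstring(L"   y"), true)` returns `L'y'`, while passing `false` returns `L' '`; the same calls with a `std::string` behave identically.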

  3. #3
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Thanks CornedBee,


    A further question: do L"\n", L"\r" and L"\t" have the same meaning and function in UNICODE as their ANSI counterparts "\n", "\r" and "\t"? Or should we use other escape sequences to represent them in UNICODE?

    Quote Originally Posted by CornedBee View Post
    Yes.

    regards,
    George

  4. #4
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,893
    Don't mistake narrow strings for ANSI or wide strings for UNICODE. First, both terms are stupid Microsoftisms with little connection to proper terminology. Second, nothing in the C++ standard specifies which encoding narrow or wide strings must use.

    But '\n' and L'\n' and the other pairs are indeed supposed to have the same semantic meaning when printed.
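One way to see that "same semantic meaning" in practice, as a sketch assuming a hosted C++11 implementation: `std::getline` without an explicit delimiter uses `in.widen('\n')`, which is `'\n'` for a `char` stream and, in the default locale, `L'\n'` for a `wchar_t` stream, so the same code splits lines in both. `count_lines` is a hypothetical name chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: counts the lines in a narrow or wide string.
template <typename CharT>
std::size_t count_lines(const std::basic_string<CharT>& text) {
    std::basic_istringstream<CharT> in(text);
    std::basic_string<CharT> line;
    std::size_t n = 0;
    // The two-argument std::getline stops at in.widen('\n'):
    // '\n' for char streams, L'\n' for wchar_t streams.
    while (std::getline(in, line))
        ++n;
    return n;
}
```

A trailing newline does not start an extra line, so `count_lines(std::wstring(L"one\ttab\nline two\n"))` gives 2, exactly as the narrow version would.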

  5. #5
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Thanks CornedBee,


    1.

    Quote Originally Posted by CornedBee View Post
    Don't mistake narrow strings for ANSI or wide strings for UNICODE. First, both terms are stupid Microsoftisms with little connection to proper terminology.
    By "both terms", do you mean ANSI/UNICODE or narrow strings/wide strings? Why do you think they are stupid?

    2.

    Quote Originally Posted by CornedBee View Post
    Second, nothing in the C++ standard says what encodings the strings have to be in.
    Do you mean that what "\n" and L"\n" identify is not part of the C++ spec?


    regards,
    George

  6. #6
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by George2 View Post
    Thanks CornedBee,
    [snip]

    Do you mean that what "\n" and L"\n" identify is not part of the C++ spec?
    No, they most certainly are, but the definition is "a newline in the correct format for that platform", in narrow and wide character form respectively. And they are not "identifiers", by the way.

    --
    Mats
    Compilers can produce warnings - make the compiler programmers happy: Use them!
    Please don't PM me for help - and no, I don't do help over instant messengers.
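A small sketch to make one detail concrete, assuming standard C++: inside the program, "\n" and L"\n" each hold exactly one character, and any translation to the platform's line ending (for example CR LF on Windows) happens when a text-mode file stream writes it out, not in the string itself. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Both literals are two-element arrays: the newline plus the terminating NUL.
static_assert(sizeof("\n") == 2 * sizeof(char), "narrow newline is one char");
static_assert(sizeof(L"\n") == 2 * sizeof(wchar_t), "wide newline is one wchar_t");

// Hypothetical helpers: the in-memory length of a newline string.
std::size_t narrow_newline_length() { return std::string("\n").size(); }
std::size_t wide_newline_length() { return std::wstring(L"\n").size(); }

// String streams never translate line endings; only file streams opened
// in text mode may do so, depending on the platform.
std::size_t streamed_newline_length() {
    std::ostringstream out;
    out << '\n';
    return out.str().size();
}
```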

  7. #7
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,893
    \n is a newline. But the encoding of this newline is implementation-defined.

    ANSI/UNICODE are the stupid terms. ANSI is the American National Standards Institute. Win32 came to misuse the term to mean "an encoding specified by ANSI", like ASCII, but this term is extremely misleading. The default narrow character set on US or Western European Windows installations is Windows-1252, an adaptation of ISO-8859-1, but standardized by no one. There are various other Windows-* codepages, all called ANSI, but very few, if any, are standardized. Even Microsoft says it's stupid. I quote Wikipedia:
    Microsoft has stated that "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community"
    As for UNICODE, the term is derived from the macro name that switches the Win32 API generic names between the multibyte and wide character variants (A and W suffixes). The real Unicode document and consortium are not spelled all-uppercase. Also, Unicode is a character set and a set of algorithms for properly handling international character data. It also defines a number of encodings for this character set, the most important of which are UTF-8, UTF-16 and UTF-32. What Windows programmers refer to as UNICODE is really UTF-16, or (in early versions of Windows NT) even the crippled UCS-2.
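To illustrate the UCS-2 versus UTF-16 distinction, here is a minimal sketch (C++11, using `char16_t`/`char32_t`): UCS-2 stores every character as a single 16-bit unit and therefore stops at U+FFFF, while UTF-16 encodes code points above U+FFFF as a surrogate pair. `to_utf16` is a hypothetical helper and assumes its input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one code point as UTF-16.
// Assumes cp <= 0x10FFFF and cp is not itself a surrogate (no validation).
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit unit (all that UCS-2 can express).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        cp -= 0x10000;                                                // 20 bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));  // low surrogate
    }
    return out;
}
```

For example, U+1F600 becomes the pair 0xD83D 0xDE00, which a UCS-2-only system cannot interpret as a single character.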

  8. #8
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Hi CornedBee,


    I am confused about what ANSI means. Previously I thought it meant the code page of the current locale, which differs between locales; for example, ANSI on a Western system and ANSI on a Japanese system are two different code pages. That is, ANSI means one specific code page on a platform with a specific locale.

    But from your words, "There are various other Windows-* codepages, all called ANSI", it seems ANSI means all of these code pages?

    Could you please clarify?

    regards,
    George

  9. #9
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,893
    ANSI can refer to the code pages or the mode where Windows uses the code pages. Does it matter that much?

  10. #10
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Thanks CornedBee,


    1.

    I have done some self-study.

    http://en.wikipedia.org/wiki/Code_page

    It looks like "ANSI code page" refers to a set of code pages, not one specific code page.

    2.

    Are ANSI code pages all multi-byte encodings, as opposed to wide-character ones?

    Quote Originally Posted by CornedBee View Post
    ANSI can refer to the code pages or the mode where Windows uses the code pages. Does it matter that much?

    regards,
    George

  11. #11
    CornedBee (Cat without Hat)
    Join Date
    Apr 2003
    Posts
    8,893
    2) Single-byte or multi-byte, but not wide.
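A sketch of the single-byte/multi-byte versus wide distinction, using only the C library conversion function `std::mbrtowc` from `<cwchar>`: a narrow string may spend one or more `char`s per character, depending on the encoding of the current C locale (for instance, Windows-1252 is single-byte while code page 932, Shift JIS, is multi-byte), whereas the resulting wide string holds one `wchar_t` per character. `narrow_to_wide` is a hypothetical name; the sketch assumes the locale was chosen beforehand with `std::setlocale`, and in the default "C" locale the encoding is typically single-byte.

```cpp
#include <cassert>
#include <cstddef>
#include <cwchar>
#include <string>

// Hypothetical helper: converts a narrow (single- or multi-byte) string to a
// wide string using the current C locale's encoding. Returns false if the
// input is not a valid sequence in that encoding.
bool narrow_to_wide(const std::string& in, std::wstring& out) {
    std::mbstate_t state{};  // conversion state, starts in the initial shift state
    const char* p = in.c_str();
    const char* end = p + in.size();
    out.clear();
    while (p < end) {
        wchar_t wc = 0;
        std::size_t n = std::mbrtowc(&wc, p, static_cast<std::size_t>(end - p), &state);
        if (n == static_cast<std::size_t>(-1) || n == static_cast<std::size_t>(-2))
            return false;    // invalid or truncated multi-byte sequence
        if (n == 0)
            n = 1;           // an embedded NUL byte was converted to L'\0'
        out.push_back(wc);   // exactly one wchar_t per character
        p += n;              // 1 byte per character in a single-byte code page, possibly more otherwise
    }
    return true;
}
```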

  12. #12
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Thanks CornedBee,


    1.

    Good to learn that ANSI covers only single-byte and multi-byte encodings, not wide characters.

    2.

    Previously you said ANSI is a stupid term. Or are both ANSI and UNICODE stupid terms? Why?

    Quote Originally Posted by CornedBee View Post
    2) Single-byte or multi-byte, but not wide.

    regards,
    George

  13. #13
    Kernel hacker
    Join Date
    Jul 2007
    Location
    Farncombe, Surrey, England
    Posts
    15,677
    Quote Originally Posted by George2 View Post
    2.

    Previously you said ANSI is a stupid term. Or are both ANSI and UNICODE stupid terms? Why?


    regards,
    George
    I think CornedBee already answered that one. But I'll have a go at doing it differently:
    Neither ANSI nor UNICODE precisely describes what it refers to. ANSI actually means "one of a number of different variants of 8-bit character sets that are based on ASCII but extended to 8 bits".

    UNICODE refers to a standard that defines several encoding forms, including 8-bit (UTF-8), 16-bit (UTF-16) and 32-bit (UTF-32) ones. So again, it is not a precise description of what is actually being used.
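The three sizes can be checked directly with the C++11 string-literal prefixes `u8`, `u` and `U`. A minimal sketch: `code_units` is a hypothetical helper that counts the code units in a literal, excluding the terminating NUL, so the same character shows up with different lengths in UTF-8, UTF-16 and UTF-32. The characters are written as universal-character-names so the source file's own encoding does not matter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of code units in a string literal, without the NUL.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) { return N - 1; }

// U+00E9 (e with acute accent), inside the Basic Multilingual Plane.
static_assert(code_units(u8"\u00E9") == 2, "UTF-8: two bytes");
static_assert(code_units(u"\u00E9") == 1, "UTF-16: one 16-bit unit");
static_assert(code_units(U"\u00E9") == 1, "UTF-32: one 32-bit unit");

// U+1F600 (an emoji), outside the Basic Multilingual Plane.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8: four bytes");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16: a surrogate pair");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32: one 32-bit unit");
```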

    --
    Mats

  14. #14
    Registered User
    Join Date
    May 2006
    Posts
    1,579
    Thanks Mats,


    Can I understand it this way?

    1. UNICODE is a character/value mapping: each character has exactly one value (its code point) in the Unicode table;

    2. A code page is how a character (or Unicode value) is represented and encoded in memory or storage; different code pages may represent the same character with different encoded values.

    Are both correct?

    regards,
    George
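George's two points can be illustrated with a sketch (C++11): the character é always has the code point U+00E9, but it is stored as the single byte 0xE9 in ISO-8859-1 (and in Windows-1252, which agrees with ISO-8859-1 in this range) and as the two bytes 0xC3 0xA9 in UTF-8, which Windows calls code page 65001. `to_utf8` and `to_latin1` are hypothetical helpers written for this illustration, and `to_utf8` assumes a valid Unicode scalar value.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: encodes one code point as UTF-8.
// Assumes cp <= 0x10FFFF and cp is not a surrogate (no validation).
std::string to_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: in ISO-8859-1 the byte value equals the code point,
// so only U+0000..U+00FF are representable.
bool to_latin1(char32_t cp, unsigned char& byte) {
    if (cp > 0xFF)
        return false;
    byte = static_cast<unsigned char>(cp);
    return true;
}
```

The euro sign U+20AC, for instance, has no ISO-8859-1 byte at all, while UTF-8 stores it as 0xE2 0x82 0xAC: one code point, different (or missing) encodings.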
