Skipws

**George2** · 03-03-2008

Hello everyone,

Does skipws work for both char-based and wchar_t-based string streams? I have not found a formal clarification on MSDN.

http://msdn2.microsoft.com/en-us/library/98bsd5x4.aspx

thanks in advance,
George

**CornedBee** · 03-03-2008

Yes.
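Here is a minimal sketch that applies the same manipulator to both stream types. The helper names `read_narrow`, `read_wide` and `first_char_noskip` are hypothetical. `std::skipws` and `std::noskipws` take a `std::ios_base&`, which is a base of both `std::istringstream` and `std::wistringstream`, so nothing specific to the character type is involved:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: reads an int and a word from a narrow (char) stream,
// explicitly requesting std::skipws (which is also the default).
std::string read_narrow(const std::string& text, int& number) {
    std::istringstream in(text);
    std::string word;
    in >> std::skipws >> number >> word;
    return word;
}

// The same helper for a wide (wchar_t) stream. std::skipws is an ordinary
// function taking std::ios_base&, the common base of both stream types,
// so the identical manipulator applies here.
std::wstring read_wide(const std::wstring& text, int& number) {
    std::wistringstream in(text);
    std::wstring word;
    in >> std::skipws >> number >> word;
    return word;
}

// With std::noskipws, extracting a single wchar_t returns the leading
// blank instead of skipping it.
wchar_t first_char_noskip(const std::wstring& text) {
    std::wistringstream in(text);
    wchar_t c = L'?';
    in >> std::noskipws >> c;
    return c;
}
```

For example, `read_wide(L"   42   hello", n)` returns `L"hello"` and sets `n` to 42, just like the narrow version.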

**George2** · 03-03-2008

Thanks CornedBee,

A further question: do L"\n", L"\r" and L"\t" have the same meaning and function in UNICODE as their ANSI counterparts "\n", "\r" and "\t"? Or should we use other escapes to express in UNICODE what "\n", "\r" and "\t" mean in ANSI?

regards,
George

**CornedBee** · 03-03-2008

Don't mistake narrow strings for ANSI or wide strings for UNICODE. First, both terms are stupid Microsoftisms with little connection to proper terminology. Second, nothing in the C++ standard says what encodings the strings have to be in.

But '\n' and L'\n' and the other pairs are indeed supposed to have the same semantic meaning when printed.
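Here is a small illustration, assuming the default "C" locale. `std::getline` splits on the stream's `widen('\n')`, which is `'\n'` for a narrow stream and `L'\n'` for a wide one. `std::endl` likewise writes `os.widen('\n')`. The helper names are hypothetical. A `'\t'` inside a line is left untouched in both cases:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text into lines with std::getline. Its default
// delimiter is in.widen('\n'), i.e. '\n' for char and L'\n' for wchar_t.
template <typename CharT>
std::vector<std::basic_string<CharT>> split_lines(const std::basic_string<CharT>& text) {
    std::basic_istringstream<CharT> in(text);
    std::vector<std::basic_string<CharT>> lines;
    std::basic_string<CharT> line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// std::endl writes os.widen('\n'), so a wide stream emits its own wide
// newline for the narrow '\n'.
std::wstring wide_with_endl(const std::wstring& text) {
    std::wostringstream out;
    out << text << std::endl;
    return out.str();
}
```

Both `split_lines(std::string("one\ttab\ntwo"))` and `split_lines(std::wstring(L"one\ttab\ntwo"))` yield two lines, `one<TAB>tab` and `two`, in their respective character types.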

**George2** · 03-03-2008

Thanks CornedBee,

1.

Originally Posted by CornedBee

Don't mistake narrow strings for ANSI or wide strings for UNICODE. First, both terms are stupid Microsoftisms with little connection to proper terminology.

By "the two terms", do you mean ANSI/UNICODE or narrow/wide strings? Why do you think they are stupid?

2.

Originally Posted by CornedBee

Second, nothing in the C++ standard says what encodings the strings have to be in.

Do you mean that what "\n" and L"\n" identify is not part of the C++ spec?

regards,
George

**matsp** · 03-03-2008

Originally Posted by George2

Thanks CornedBee,
[snip]

Do you mean that what "\n" and L"\n" identify is not part of the C++ spec?

No, they most certainly are, but the definition is "a newline in the correct format for that platform", in narrow and wide character form respectively. And they are not "identifiers", by the way.

--
Mats

**CornedBee** · 03-03-2008

\n is a newline. But the encoding of this newline is implementation-defined.

ANSI/UNICODE are the stupid terms. ANSI is the American National Standards Institute. Win32 came to misuse the term to mean "an encoding specified by ANSI", like ASCII, but this term is extremely misleading. The default narrow character set on US or Western European Windows installations is Windows-1252, an adaption of ISO-8859-1, but standardized by no one. There are various other Windows-* codepages, all called ANSI, but very few, if any, are standardized. Even Microsoft says it's stupid. I quote Wikipedia:

Microsoft has stated that "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community"
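To make the Windows-1252 vs ISO-8859-1 point concrete, here is a sketch that decodes single bytes to Unicode code points. The Windows-1252 table is deliberately partial: it lists only a few entries of the 0x80-0x9F range. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// ISO-8859-1 maps every byte 0x00-0xFF directly to the Unicode code point
// with the same value (0x80-0x9F are C1 control characters).
std::uint32_t decode_latin1(unsigned char byte) {
    return byte;
}

// Windows-1252 agrees with ISO-8859-1 except in 0x80-0x9F, where it places
// printable characters. Only a few entries are listed (partial table);
// other bytes in that range return 0xFFFD (replacement character) here.
std::uint32_t decode_cp1252(unsigned char byte) {
    if (byte < 0x80 || byte > 0x9F) {
        return byte;  // identical to ISO-8859-1 outside 0x80-0x9F
    }
    switch (byte) {
        case 0x80: return 0x20AC;  // EURO SIGN
        case 0x85: return 0x2026;  // HORIZONTAL ELLIPSIS
        case 0x91: return 0x2018;  // LEFT SINGLE QUOTATION MARK
        case 0x92: return 0x2019;  // RIGHT SINGLE QUOTATION MARK
        case 0x93: return 0x201C;  // LEFT DOUBLE QUOTATION MARK
        case 0x94: return 0x201D;  // RIGHT DOUBLE QUOTATION MARK
        case 0x96: return 0x2013;  // EN DASH
        case 0x97: return 0x2014;  // EM DASH
        default:   return 0xFFFD;  // not covered by this partial table
    }
}
```

The two encodings disagree only in 0x80-0x9F. For example, byte 0x80 is U+0080 (a C1 control) in ISO-8859-1 but the euro sign U+20AC in Windows-1252.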

As for UNICODE, the term is derived from the macro name that switches the Win32 API generic names between the multibyte and wide character variants (A and W suffixes). The real Unicode document and consortium are not spelled all-uppercase. Also, Unicode is a character set and a set of algorithms for properly handling international character data. It also defines a number of encodings for this character set, the most important of which are UTF-8, UTF-16 and UTF-32. What Windows programmers refer to as UNICODE is really UTF-16, or (in Windows NT) even the crippled UCS-2.
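The difference between UTF-16 and UCS-2 shows up only for code points above U+FFFF. Here is a minimal sketch that encodes a single code point. The helper names are hypothetical, and the only error handling is rejecting values that are not Unicode scalar values:

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Encodes one code point as UTF-16 code units. Code points above U+FFFF
// need a surrogate pair; UCS-2 has no such mechanism and can only
// represent U+0000..U+FFFF.
std::vector<std::uint16_t> encode_utf16(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp <= 0xFFFF) {
        return {static_cast<std::uint16_t>(cp)};
    }
    std::uint32_t v = cp - 0x10000;                             // 20 bits left
    return {static_cast<std::uint16_t>(0xD800 + (v >> 10)),     // high surrogate
            static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF))};  // low surrogate
}

// Encodes one code point as UTF-8 bytes (1 to 4 bytes).
std::vector<std::uint8_t> encode_utf8(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp < 0x80) {
        return {static_cast<std::uint8_t>(cp)};
    }
    if (cp < 0x800) {
        return {static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
    }
    if (cp < 0x10000) {
        return {static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
                static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
                static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
    }
    return {static_cast<std::uint8_t>(0xF0 | (cp >> 18)),
            static_cast<std::uint8_t>(0x80 | ((cp >> 12) & 0x3F)),
            static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
            static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
}
// UTF-32 is the trivial case: one 32-bit unit holding the code point itself.
```

`encode_utf16(0x1F600)` yields the surrogate pair 0xD83D 0xDE00, which UCS-2 cannot express. `encode_utf8(0x20AC)` yields E2 82 AC.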

**George2** · 03-04-2008

Hi CornedBee,

I am confused about what ANSI means. Previously I thought it meant the code page of the current locale, which differs from locale to locale. For example, ANSI on a Western system and ANSI on a Japanese system are two different code pages. In other words, ANSI means a specific code page on a platform with a specific locale.

But from your words, "There are various other Windows-* codepages, all called ANSI", it seems ANSI means all of the code pages?

Could you help clarify, please?

regards,
George

**CornedBee** · 03-05-2008

ANSI can refer to the code pages or the mode where Windows uses the code pages. Does it matter that much?
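The "mode" here is the A/W split of the Win32 API. The sketch below only mimics that naming scheme with hypothetical functions and macros (`TextLengthA`, `TextLengthW`, `TextLength`, `MY_TEXT`). It does not call Windows. The real headers map names like `MessageBox` to `MessageBoxA` or `MessageBoxW` depending on whether `UNICODE` is defined:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical "A" variant: narrow text, interpreted in the active ANSI code page.
std::size_t TextLengthA(const char* s) {
    return std::char_traits<char>::length(s);
}

// Hypothetical "W" variant: wide text (UTF-16 on Windows).
std::size_t TextLengthW(const wchar_t* s) {
    return std::char_traits<wchar_t>::length(s);
}

// Bind the generic name the way the Win32 headers do, keyed on UNICODE.
// MY_TEXT makes a literal narrow or wide to match.
#ifdef UNICODE
#define TextLength TextLengthW
#define MY_TEXT(s) L##s
#else
#define TextLength TextLengthA
#define MY_TEXT(s) s
#endif
```

`TextLength(MY_TEXT("abc"))` compiles in either mode. Only the function it resolves to and the type of the literal change.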

**George2** · 03-05-2008

Thanks CornedBee,

1.

I have done some self-study.

http://en.wikipedia.org/wiki/Code_page

It looks like "ANSI code page" refers to a set of code pages, not one specific code page.

2.

Are ANSI code pages all multi-byte encodings, as opposed to wide characters?

regards,
George

**CornedBee** · 03-05-2008

2) Single-byte or multi-byte, but not wide.
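Here is a simplified sketch of a multi-byte (but not wide) code page: code page 932, Windows' Shift-JIS variant. The lead-byte ranges used (0x81-0x9F and 0xE0-0xFC) are an assumption of this sketch, and the function names are hypothetical. The point is that the data stays in plain `char` storage while some characters take two bytes:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified lead-byte test for code page 932: bytes 0x81-0x9F and
// 0xE0-0xFC start a two-byte character.
bool is_cp932_lead_byte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters in a CP932 byte string held in an ordinary std::string.
// Assumes well-formed input (every lead byte is followed by a trail byte).
std::size_t count_cp932_chars(const std::string& bytes) {
    std::size_t count = 0;
    std::size_t i = 0;
    while (i < bytes.size()) {
        i += is_cp932_lead_byte(static_cast<unsigned char>(bytes[i])) ? 2 : 1;
        ++count;
    }
    return count;
}
```

The string `"A\x82\xA0" "B"` contains 'A', the hiragana あ (bytes 0x82 0xA0 in Shift-JIS) and 'B'. It is 4 bytes long but holds 3 characters. The literal is split so that the hex escape does not swallow the 'B'.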

**George2** · 03-05-2008

Thanks CornedBee,

1.

Good to learn that ANSI covers only single-byte and multi-byte encodings, not wide characters.

2.

Previously you mentioned that ANSI is a stupid term. Or are both ANSI and UNICODE stupid terms? Why?

regards,
George

**matsp** · 03-05-2008

Originally Posted by George2

2.

Previously you mentioned that ANSI is a stupid term. Or are both ANSI and UNICODE stupid terms? Why?

regards,
George

I think CornedBee already answered that one, but I'll have a go at explaining it differently:
Neither ANSI nor UNICODE precisely describes what is actually being used. ANSI really means "one of a number of different 8-bit character sets that are based on ASCII but extended to 8 bits".

UNICODE refers to a standard that supports several different encoding formats, including 8-bit, 16-bit and 32-bit ones (UTF-8, UTF-16 and UTF-32). So again, it's not a precise definition of what is being used.
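C++11 Unicode string literals make the 8/16/32-bit formats visible. This sketch counts code units per literal; the `code_units` helper is hypothetical. `u8` literals are `char` arrays before C++20 and `char8_t` arrays from C++20 on, and the template accepts either:

```cpp
#include <cassert>
#include <cstddef>

// Number of code units in a string literal, excluding the terminating null,
// whatever its character type.
template <typename CharT, std::size_t N>
constexpr std::size_t code_units(const CharT (&)[N]) {
    return N - 1;
}

// The same character, U+1F600, in the three Unicode encoding forms.
constexpr std::size_t utf8_units  = code_units(u8"\U0001F600");  // 8-bit units
constexpr std::size_t utf16_units = code_units(u"\U0001F600");   // 16-bit units
constexpr std::size_t utf32_units = code_units(U"\U0001F600");   // 32-bit units
```

Here U+1F600 takes 4 UTF-8 units, 2 UTF-16 units and 1 UTF-32 unit. A character such as U+20AC takes 3, 1 and 1 respectively.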

--
Mats

**George2** · 03-05-2008

Thanks Mats,

Can I understand it this way?

1. UNICODE is a character/value mapping: each specific character has exactly one specific value in the UNICODE table;

2. A code page is how a character/UNICODE value is represented and encoded in memory/storage/...; different code pages will (may) represent the same character with different encoded values.

Are both correct?
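Point 2 can also be viewed from the other direction. A code page maps byte values to characters, so the same byte can mean different characters under different code pages. The sketch below uses partial tables and hypothetical function names to decode byte 0xE9 under Windows-1252 and Windows-1251:

```cpp
#include <cassert>
#include <cstdint>

// Simplified Windows-1252 decoder: ASCII below 0x80, and 0xA0-0xFF equal
// the Latin-1 code points U+00A0-U+00FF. 0x80-0x9F are omitted here.
std::uint32_t decode_cp1252_simplified(unsigned char b) {
    if (b < 0x80 || b >= 0xA0) return b;
    return 0xFFFD;  // range not covered by this sketch
}

// Simplified Windows-1251 decoder: ASCII below 0x80, and 0xC0-0xFF are the
// Cyrillic letters U+0410-U+044F. 0x80-0xBF are omitted here.
std::uint32_t decode_cp1251_simplified(unsigned char b) {
    if (b < 0x80) return b;
    if (b >= 0xC0) return 0x0410 + (b - 0xC0);
    return 0xFFFD;  // range not covered by this sketch
}
```

Byte 0xE9 is é (U+00E9) under Windows-1252 but й (U+0439) under Windows-1251, while both agree on ASCII bytes. Either way, the result is a single Unicode code point per character.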

regards,
George
