I am puzzled by char*

zach · 08-20-2019

Is char* in the C language the same as "string" in other languages?
If not, could you please give me *code* examples of using char? Zach K.

**Asymptotic** · 08-20-2019

Originally Posted by zach

Is char* in the C language the same as "string" in other languages?
If not, could you please give me *code* examples of using char? Zach K.

The answer is yes, mostly, but a C string is not 100% the same as a string in other languages.

You see, C's language constructs are generally lower-level (less abstracted) than those of languages such as Java, C#, or Python.

But first, I want you to ask yourself this question: what is a string? How would we define it? I would define it like so:

"A contiguous sequence of valid characters from some character set or encoding, such as ASCII or UTF-8."

We know we need at least three pieces of information:

1.) The location in memory that the string starts at
2.) The actual characters in the string
3.) The end of the string

Without these pieces of info, we could not have a string in any language... The computer needs to know where the string begins, ends, and what its contents are at the very least.

Now let's take a look at what a char * is and what it does.

A char * is simply a character pointer: a variable that holds the memory address of a char. Let's look at an example:

Code:

char *name = "Joshua";

What I want you to understand here is that the variable "name" is a char * which holds the memory address of the J. Why the J? Because it is the first letter. C keeps track of where a string (that is, a char array) begins simply by storing the address of its first character in the pointer. The pointer points at the beginning of the string.

Now, #2 above is pretty straightforward: you can clearly see that the actual characters making up the word are stored in memory there. What is less clear is how C knows where the string ends.

In the above example, there are actually SEVEN characters, even though Joshua is only 6 letters long. Why? Because the C compiler has implicitly (silently) added a final character for us: '\0'. That \0 is just as important to understand as the char * part. '\0' tells C's string-handling code (the functions in string.h, printf's %s, and so on): "this is the end of the string, so don't continue reading memory past it." Without \0, C would have no way to tell the computer when to stop reading memory, and something like printf would print a bunch of other garbage and could wander into memory it shouldn't touch, crash the program, etc.

Next, let's address the location of the characters. Where are the letters Joshua actually stored? We know that name is a pointer which contains the address of J, but where is J o s h u a \0?

In this example, "Joshua" is a string literal, and compilers typically place string literals in read-only memory. This means that when you declare a string like this:

Code:

char *name = "Joshua";

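then name points to a string literal that must not be modified. The C standard makes modifying it undefined behavior, and on typical systems, where literals live in read-only memory, an attempt such as name[0] = 'j'; crashes the program. Declaring it as const char *name = "Joshua"; lets the compiler catch such mistakes for you.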

However, there are other ways to declare strings in C. Remember that a string is just a sequence of contiguous valid characters, right? How else may we be able to create a string? By using an array of stack or heap memory!

Code:

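char name[] = "Joshua";                          /* size 7 deduced; '\0' added for us */
char name2[7] = {'J','o','s','h','u','a','\0'};  /* '\0' written by hand */
char name3[7];

memcpy(name3, "Joshua", 7);   /* needs <string.h>; the literal's 7th byte is its '\0' */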

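Each of the methods above stores the characters "Joshua" plus a '\0' in an array of our own (on the stack, when the arrays are declared inside a function), and unlike the literal, these characters may be modified. If we pass "name" to a function, the array decays to a pointer to its first element, so the function again receives the address of the J on the stack. &name yields the same address, but with a different type: char (*)[7], a pointer to an array of 7 chars.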

Finally, let's look at strings stored in the heap memory:

Code:

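char *name = malloc(7*sizeof(char));   /* malloc/free need <stdlib.h> */
if (name != NULL) {                    /* malloc returns NULL on failure */
    memcpy(name, "Joshua", 7);         /* 6 letters + the literal's '\0' */
    free(name);
}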

Here, we ask the memory allocator for 7 bytes of memory... We say 7 * sizeof(char) because:

1.) If char is for some reason not 1 byte on a given system, the code won't break
2.) Mostly for consistency's sake - although it isn't likely that the size of char will ever change from 1 byte, it's more likely that you will try to allocate space for 50 ints and write malloc(50) rather than malloc(50*sizeof(int)), which will cause memory corruption

Next, we again use memcpy to copy the chars from the string literal (typically in read-only memory) into the heap memory that we allocated. As before, at the point where we call memcpy, "name" is a pointer to the beginning of the memory we got from malloc, and memcpy fills that memory in. memcpy doesn't look for '\0' at all: it stops after Joshua\0 only because we told it to copy exactly 7 bytes.

Finally, for this demo's sake, I called free(name) because we must always free heap memory after we are done using it.

All of the above can be considered strings in C. It may seem confusing, but at the end of the day the only thing that changes is where the string is stored. The concept of the string remains the same: some contiguous run of characters in memory with a '\0' at the end (called the null terminator), whose start address we've stored (in the char *).

Note that strings in other languages may not work exactly this way, even if they are semantically the same thing. For example, a string could be tracked solely by an integer holding its length and a start address, rather than by a null terminator. So, say we wanted to create our own string type:

Code:

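#include <stddef.h>   /* size_t */

struct our_string
{
   size_t length;   /* number of chars; no '\0' needed */
   char *start;     /* address of the first char */
};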

Using that simple struct, we could implement our own string: the algorithms working on this data could be programmed to start at the given address and stop once length characters have been read, instead of looking for a null terminator.

So semantically, the string would be the same, but the implementation/internals of how the string algorithms operate on the data would be different. This is the type of thing people mean when they say that C strings aren't necessarily the same as strings in other languages. A simplified example, but still.

zach · 08-20-2019

Asymptotic, this is a very generous answer to my query. I am very grateful. I will try and digest it. Zach K.

zach · 08-20-2019

Asymptotic, what you have written is very clear and professional and fills in many a gap in my understanding. I do however still have a question about

Code:

char *foo(char a[], char b[])
{
}

I cannot conceptualize what this char* is; I have no idea what happens when I write the code below (in an attempt to copycat code I saw elsewhere).
How are parameters being passed? How are the results of the function being passed back? How are the passed-back results used in main()?

The following code doesn't error out, but I cannot make it work:

Code:

#include <stdio.h>
#include<strings.h>
#include<windows.h>

void makeFrame(void);

char* securityCheck(char pin[]);
    
int main(void)
{
    char pin[5];
    char result[5];
    securityCheck(pin);
    printf("\n\nThis is your pin: %s\n\n", pin);
    getchar();
    return 0;
}
char* securityCheck(char pin[])
{
    char value[5];
    locate (2,5);
    scanf("Please enter your pin %4s",value);
    strcpy(pin,value);
    return pin;
}

**Asymptotic** · 08-20-2019

One more thing I wanted to clarify is that when you use the char array format like this:

Code:

char name2[] = {'t','e','s','t'};

This is not a string, because there's a gotcha in C here: if you declare the chars individually like this, no \0 is added automatically at the end. Therefore, there is no null terminator, and C won't know where to stop reading the chars.

So when we use the format above to declare a string, we must always add a null-terminator:

Code:

char name2[] = {'t','e','s','t','\0'};

Here is an example program where I do not add a null-terminator and its output:

Code:

#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char name[] = "Joshua";
    char name2[] = {'t','e','s','t'};   /* deliberately missing '\0' */

    printf("%s\n", name2);   /* undefined behavior: %s reads past the end of name2 */
    return EXIT_SUCCESS;
}

For me, when I compile the program with gcc-8 using this:

Code:

gcc-8 c_strings.c -g -o c_strings -Wall -Wextra

I get out:

Code:

testJoshua

So in this case, quite literally, C had the computer just keep reading until it finally hit a \0 and coincidentally, that \0 was at the end of Joshua. So C read "testJoshua". It should have read "test" and stopped, if we programmed it correctly and put the \0 in at the end of name2.

Note that this particular result is specific to this particular program, configuration, compiler version, and operating system. Reading past the end of an array that has no null terminator is undefined behavior: the C standard places no requirements on what happens. In this run, printf simply kept reading into the neighboring array on the stack, but in another program or with a different compiler, something else entirely could happen (including a crash). These are undesired results and should be avoided at all costs.

**christop** · 08-20-2019

Originally Posted by Asymptotic

Here, we ask the memory allocator for 7 bytes of memory... We say 7 * sizeof(char) because:

1.) If char is for some reason not 1 byte on a given system, the code won't break
2.) Mostly for consistency's sake - although it isn't likely that the size of char will ever change from 1 byte, it's more likely that you will try to allocate space for 50 ints and write malloc(50) rather than malloc(50*sizeof(int)), which will cause memory corruption

Reason 2 is a good enough reason to use "sizeof (char)", but the standards actually guarantee/mandate that "sizeof (char) == 1".

But my preferred way to write that code is like this:

Code:

char *name = malloc(7*sizeof *name);

This code has less redundancy, which means you can change the type of name in only one place and it'll still work (imagine if you changed only the first "char" to "int" in your code--you'd allocate too little memory to hold 7 ints!).

**Asymptotic** · 08-21-2019

christop, you're right; that is a habit I'm trying to break. It's not technically wrong per se, but I agree with you that it's better to limit the places a programmer must change should the type change. I just got stuck in the habit of writing sizeof(type) rather than sizeof(object).

