I'm taking an introductory computer science course, and one of our tasks is to edit a program that counts the number of characters in a file so that it also counts correctly when the file contains UTF-8 encoded characters.
The code we were supposed to edit is:
Code:
#include <stdio.h>

typedef unsigned char BYTE;

int main(int argc, char const *argv[])
{
    if (argc != 2)
    {
        printf("Usage: ./count INPUT\n");
        return 1;
    }

    FILE *file = fopen(argv[1], "r");
    if (!file)
    {
        printf("Could not open file.\n");
        return 1;
    }

    /* The counter must start at zero; an uninitialized value is garbage. */
    size_t count = 0;
    while (1)
    {
        BYTE b;
        /* fread returns the number of items read; anything but 1 means EOF or an error. */
        if (fread(&b, 1, 1, file) != 1)
        {
            break;
        }
        count++;
    }

    /* %zu is the printf format for size_t. */
    printf("Number of characters: %zu\n", count);
    fclose(file);
    return 0;
}
I thought that simply changing
fread(&b, 1, 1, file);
into
fread(&b, 4, 1, file);
would do it, since I thought UTF-8 uses 4 bytes per character, but apparently not, because the program crashes. I've looked for other approaches, but the methods I found are quite confusing (or perhaps it's my lack of experience). I've also tried reading the file's contents into a char variable instead of a BYTE, but the code ends up looking more complicated, and that's probably wrong anyway.
Also, I've tried changing
typedef unsigned char BYTE;
into
unsigned char BYTE[4] = {0};
so that it would be able to hold 4 bytes, but it still doesn't work. Can you give me some tips, or at least point me in the right direction? Thank you.
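For reference, here is a minimal sketch of one possible direction, assuming the input is valid UTF-8. A UTF-8 character takes 1 to 4 bytes, not always 4, and every byte after the first byte of a character has the bit pattern 10xxxxxx (a continuation byte). Counting only the bytes that are not continuation bytes therefore gives the number of characters. The function names below (is_continuation, count_utf8_chars, count_utf8_chars_in_file) are hypothetical helpers made up for this illustration, not part of the assignment. It also hints at why fread(&b, 4, 1, file) misbehaves: it asks for 4 bytes to be written into a variable that can hold only 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Returns 1 if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
int is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: counts characters (code points) in a buffer
   that is assumed to contain valid UTF-8. */
size_t count_utf8_chars(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
    {
        /* Only the first byte of each character is counted. */
        if (!is_continuation(buf[i]))
        {
            count++;
        }
    }
    return count;
}

/* Same idea, reading one byte at a time from an already opened file. */
size_t count_utf8_chars_in_file(FILE *file)
{
    assert(file != NULL);
    size_t count = 0;
    int c;
    while ((c = fgetc(file)) != EOF)
    {
        if (!is_continuation((unsigned char)c))
        {
            count++;
        }
    }
    return count;
}
```

In the original program, this would amount to keeping the one-byte read and incrementing count only when the byte is not a continuation byte. The sketch does not validate the input, so malformed UTF-8 would be counted without any error being reported.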