Thread: UTF-8 character counter in C

  1. #1
    Registered User
    Join Date
    Nov 2020
    Posts
    4

    I'm taking an introductory course in computer science, and one of our tasks is to edit a program that counts the number of characters in a file so that it also counts correctly when the file contains UTF-8 encoded characters.
    The code we were supposed to edit is:

    Code:
    #include <stdio.h>
    
    
    typedef unsigned char BYTE;
    
    
    int main(int argc, char const *argv[])
    {
        if (argc != 2)
        {
            printf("Usage: ./count INPUT\n");
            return 1;
        }
        FILE *file = fopen(argv[1], "r");
        if (!file)
        {
            printf("Could not open file.\n");
            return 1;
        }
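        size_t count = 0; /* must start at zero; an uninitialized counter gives garbage */
        while (1)
        {
            BYTE b;
    
            fread(&b, 1, 1, file);
            if (feof(file))
            {
                break;
            }
            count++;
        }
        printf("Number of characters: %zu\n", count); /* %zu is the specifier for size_t */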
    
    
        fclose(file);
    
    
        return 0;
    }
    I thought that simply changing

    fread(&b, 1, 1, file);

    into

    fread(&b, 4, 1, file);

    would do it, since I know that UTF-8 uses 4 bytes, but apparently not, because the program crashes. I've tried looking for other ways, but the methods they use are quite confusing as well (or perhaps it's my lack of experience). I've also tried other approaches, such as reading the file's contents into a char variable instead of a BYTE, but the code ends up looking more complicated, and that's probably wrong anyway.
    Also, I've tried changing

    typedef unsigned char BYTE;

    into

    unsigned char BYTE[4] = {0};

    so that it would be able to hold 4 bytes, but it still doesn't work. Can you give me tips, or at least point me in the right direction? Thank you.
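
    One possible direction, offered as a sketch rather than the course's intended solution: UTF-8 is variable-length, so a character takes anywhere from 1 to 4 bytes, not always 4. Reading 4 bytes into the single-byte variable b writes past the end of it, which is undefined behavior and a likely cause of the crash. Every byte after the first byte of a multi-byte character has the bit pattern 10xxxxxx, so the file can still be read one byte at a time and only the bytes that do not match that pattern counted. The helper names below (is_continuation, count_utf8_in_buffer, count_utf8_in_file) are made up for illustration, and the code assumes the input is valid UTF-8.

```c
#include <stddef.h>
#include <stdio.h>

typedef unsigned char BYTE;

/* A UTF-8 continuation byte has the bit pattern 10xxxxxx. */
static int is_continuation(BYTE b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: counts code points in a buffer.
   Assumes the buffer holds valid UTF-8. */
size_t count_utf8_in_buffer(const BYTE *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++)
    {
        /* Count only the first byte of each character. */
        if (!is_continuation(buf[i]))
        {
            count++;
        }
    }
    return count;
}

/* Same idea applied to an open file, still reading one byte at a time. */
size_t count_utf8_in_file(FILE *file)
{
    size_t count = 0;
    BYTE b;
    /* fread returns the number of items read, so the loop ends at EOF or on error. */
    while (fread(&b, 1, 1, file) == 1)
    {
        if (!is_continuation(b))
        {
            count++;
        }
    }
    return count;
}
```

    In the original program, the same effect should be reachable by keeping fread(&b, 1, 1, file) as it is and running count++ only when (b & 0xC0) != 0x80. Note that this counts code points, so a letter followed by a separate combining accent would count as two.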
    Last edited by meowmeow004; 11-22-2020 at 09:53 PM.
