Thread: creating an implementation of ngram

  1. #1
    Registered User
    Join Date
    Oct 2022

    creating an implementation of ngram

    In computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.
    Thus far, this is what I've been able to write
    (copied from my IDE):
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <stdlib.h>
    // write a method to sort the string
    char *sort_str(char *str)
    {
        char *nstr;
        nstr = (char *)malloc(strlen(str) - 1);
        nstr = strdup(str);
        int strln = strlen(str);
        char temp;
        int i, j;
        // swap any pair that is out of order
        for (i = 0; i < strln; i++) {
            for (j = 0; j < strln; j++) {
                if (nstr[i] < nstr[j]) {
                    temp = nstr[i];
                    nstr[i] = nstr[j];
                    nstr[j] = temp;
                }
            }
        }
        return nstr;
    }
    int main(int argc, char **argv)
    {
        if (argc == 1) {
            return 0;
        }
        int i = 1;
        int j = 1;
        int totallen = 0;
        // add up the lengths of all arguments
        while (1) {
            if (argv[i] == NULL) break;
            totallen = totallen + strlen(argv[i]);
            i++;
        }
        char *word = malloc(totallen);
        // join all arguments into one string
        while (1) {
            if (argv[j] == NULL) break;
            strcat(word, argv[j]);
            j++;
        }
        char *new_str = sort_str(word);
        int count = 0;
        char *checked;
        checked = (char *)malloc(strlen(new_str));
        int strln = strlen(new_str);
        // print each distinct character with its number of occurrences
        for (int i = 0; i < strln; i++) {
            for (int j = 0; j < strln; j++) {
                if ((!strchr(checked, new_str[i])) && (new_str[i] == new_str[j])) {
                    count++;
                }
            }
            if (!(strchr(checked, new_str[i]))) {
                printf("%c:%d\n", new_str[i], count);
                count = 0;
            }
            checked[i] = new_str[i];
        }
        return 0;
    }

  2. #2
    Salem
    and the hat of int overfl
    Join Date
    Aug 2001
    The edge of the known universe
    > nstr = (char *)malloc(strlen(str) - 1);
    1. There's no need to cast malloc in a C program (FAQ).
    2. It should be strlen(str)+1 to allocate sufficient space.
    3. It's redundant (and a memory leak) anyway, since strdup allocates its own copy of the string, and assigning its result overwrites the pointer you just got from malloc.

    Regarding the sorting, a bubble sort is fine for playing with an idea when you only have, say, <50 chars.
    But for something more involved, you'd need qsort or qsort_s.

    > while (1)
    The idiomatic way of iterating through argv is
    for ( int i = 1 ; i < argc ; i++ )
    > checked = (char *)malloc(strlen(new_str));
    I'm not sure what you're trying to achieve here, but the memory isn't initialised.
    So the following calls to strchr(checked, ...) are broken.
    Perhaps use calloc, so that your memory is a real empty string you can add one char at a time to.
