creating an implementation of ngram

**dailysflash** · 11-07-2022

In computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.
Thus far, this is what I've been able to write (copied from my IDE):

Code:

//Ngram

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <stdlib.h>

// write a method to sort the string
char *sort_str(char *str)
{
    char *nstr;
    nstr = (char *)malloc(strlen(str) - 1);
    nstr = strdup(str);
    int strln = strlen(str);
    char temp;

    int i, j;

    for (i = 0; i < strln; i++)
    {
        for (j = 0; j < strln; j++)
        {
            if (nstr[i] < nstr[j])
            {
                temp = nstr[i];
                nstr[i] = nstr[j];
                nstr[j] = temp;
            }
        }
    }

    return nstr;
}

int main(int argc, char **argv)
{
    if (argc == 1)
    {
    }
    int i = 1;
    int j = 1;
    int totallen = 0;

    while (1)
    {
        if (argv[i] == NULL) break;
        totallen = totallen + strlen(argv[i]);
        i++;
    }

    char *word = malloc(totallen);
    while (1)
    {
        if (argv[j] == NULL) break;
        strcat(word, argv[j]);
        j++;
    }

    char *new_str = sort_str(word);

    int count = 0;
    char *checked;
    checked = (char *)malloc(strlen(new_str));
    int strln = strlen(new_str);
    for (int i = 0; i < strln; i++)
    {
        for (int j = 0; j < strln; j++)
        {
            if ((!strchr(checked, new_str[i])) && (new_str[i] == new_str[j]))
                count++;
        }

        if (!(strchr(checked, new_str[i])))
        {
            printf("%c:%d\n", new_str[i], count);

            count = 0;
        }
        checked[i] = new_str[i];
    }

    return 0;
}

**Salem** · 11-07-2022

> nstr = (char *)malloc(strlen(str) - 1);
1. There's no need to cast malloc in a C program (FAQ).
2. It should be strlen(str)+1 to allocate sufficient space.
3. It's redundant (and a memory leak) anyway, since strdup both duplicates the string, and overwrites the pointer.
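
Put together, the three points suggest either calling strdup on its own or allocating strlen(str)+1 bytes and copying into them. A minimal sketch of the second option follows; `copy_str` is a made-up name for illustration, not something from the original post.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name is an assumption): returns a heap copy of str,
   or NULL if the allocation fails. The caller frees the result. */
char *copy_str(const char *str)
{
    size_t len = strlen(str);
    char *copy = malloc(len + 1);   /* +1 for the terminating '\0'; no cast needed in C */
    if (copy == NULL)
        return NULL;
    memcpy(copy, str, len + 1);     /* copies the '\0' as well */
    return copy;
}
```

Either way there is exactly one allocation per copy, so a single free() releases it.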

Regarding the sorting, a bubble sort is fine for playing with an idea when you only have, say, <50 chars.
But for something more involved, you'd want qsort (see "qsort, qsort_s" on cppreference.com).
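
As a rough sketch of what the qsort route could look like for sorting the characters of a string in place: `cmp_char` and `sort_chars` are illustrative names, and comparing as unsigned char is just one reasonable choice of ordering.

```c
#include <stdlib.h>
#include <string.h>

/* Comparison callback for qsort: orders two chars by their unsigned value. */
static int cmp_char(const void *a, const void *b)
{
    unsigned char x = *(const unsigned char *)a;
    unsigned char y = *(const unsigned char *)b;
    return (x > y) - (x < y);   /* negative, zero or positive */
}

/* Hypothetical helper (name is an assumption): sorts str in place. */
void sort_chars(char *str)
{
    qsort(str, strlen(str), sizeof str[0], cmp_char);
}
```

Since this sorts in place, the caller would pass a writable buffer (for example the concatenated command-line arguments), not a string literal.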

> while (1)
The idiomatic way of iterating through argv is

Code:

for ( int i = 1 ; i < argc ; i++ )
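
Applied to the two while loops in the original post, that loop might be used as in the sketch below. `join_args` is a made-up name; the extra byte for the terminator and the `word[0] = '\0'` line are additions beyond the loop itself, on the basis that strcat needs a properly terminated string to append to.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name is an assumption): concatenates argv[1] through
   argv[argc-1] into one heap string. The caller frees the result. */
char *join_args(int argc, char **argv)
{
    size_t total = 0;
    for (int i = 1; i < argc; i++)
        total += strlen(argv[i]);

    char *word = malloc(total + 1);   /* +1 for the terminating '\0' */
    if (word == NULL)
        return NULL;

    word[0] = '\0';                   /* start from a valid empty string */
    for (int i = 1; i < argc; i++)
        strcat(word, argv[i]);

    return word;
}
```

With no arguments at all (argc == 1) this returns an empty string rather than reading past the end of argv.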

> checked = (char *)malloc(strlen(new_str));
I'm not sure what you're trying to achieve here, but the memory isn't initialised.
So the following calls to strchr(checked, ...) are broken, because checked isn't a valid string yet.
Perhaps use calloc, so that your memory starts out as a real empty string that you can add one char at a time to.
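
One way the calloc idea could play out is sketched below, on the assumption that the goal is to print each distinct character once with its count. `print_counts` and its return value are invented for illustration and are not from the original post.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (name is an assumption): prints "c:count" once per
   distinct character of str. Returns the number of distinct characters,
   or -1 if the allocation fails. */
int print_counts(const char *str)
{
    size_t len = strlen(str);
    char *checked = calloc(len + 1, 1);   /* all zero bytes: a valid empty string */
    if (checked == NULL)
        return -1;

    size_t seen = 0;
    for (size_t i = 0; i < len; i++)
    {
        if (strchr(checked, str[i]))
            continue;                     /* this character was already reported */

        int count = 0;
        for (size_t j = 0; j < len; j++)
            if (str[j] == str[i])
                count++;

        printf("%c:%d\n", str[i], count);
        checked[seen++] = str[i];         /* append; the next byte is still '\0' */
    }

    free(checked);
    return (int)seen;
}
```

Because calloc zeroes the buffer and it is one byte longer than the input, checked stays terminated after every append, so strchr never reads uninitialised memory.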
