Rewriting function into a more generic form.

**Exosomes** · 10-15-2020

Usually I don't pay much attention to how generic or flexible my code is. If something doesn't immediately stand out as obviously reusable, I don't make it reusable. Mainly that's because I don't share the pervasive philosophy of OOP culture of devising constructs that are as abstract and generic as possible. I don't care about building "flexible" code that may help solve a variety of problems I may or may not face in the future; I want to solve my current problem, now, and do it in a simple, clear and efficient manner.

Yet, because my "sight" is rather short (I have the literal proficiency of an imbecile monkey when it comes to programming), some "obvious" things often escape me.
For instance, today I found myself writing this:

Code:

static const char *find_duplicate(const char *list, const char *items)
{
    const unsigned  listlen = (unsigned)strlen(list);
    unsigned        itemlen;
    const char      *start = items;
    char            *delimiter;

    if (list == items)
    {    while (*items)
        {    delimiter = strchrnul(items, ',');
            itemlen = (unsigned)(delimiter - items);
            if ((items = text_contains(delimiter, listlen - (unsigned)(delimiter - start), items, itemlen) ) )
                return (items);
            items = delimiter + !!*delimiter;
        }
    }
    else
    {    while (*items)
        {    delimiter = strchrnul(items, ',');
            itemlen = (unsigned)(delimiter - items);
            if ((items = text_contains(list, listlen, items, itemlen) ) ) //memmem wrapper with additional code to ensure exact rather than partial matches.
                return (items);
            items = delimiter + !!*delimiter;
        }
    }
    return ((void*)0);
}

Later today, I found I needed to do the opposite: find the first differing, i.e. non-duplicate, item of two strings. This led me to revisit the code and wonder if there is a good way to 1. merge both loops into one, in order to eliminate the awfully redundant code, and 2. alter the function so that it can instead return the first non-duplicate item.

I came up with this untested solution but I'm not thrilled about it. I wonder what cool alternatives the great minds that roam these forums may come up with.

Code:

static const char *find_item
(const char *list, const char *items, const _Bool find_duplicate)
{
    const unsigned  listlen = (unsigned)strlen(list);
    unsigned        itemlen;
    const char      *start = items;
    const char      *item;
    char            *delimiter;

    while (*items)
    {    item = items; //remember the current item; items is overwritten by the search result below.
        delimiter = strchrnul(items, ',');
        itemlen = (unsigned)(delimiter - items);
        if (list == start)
            items = text_contains(delimiter, listlen - (unsigned)(delimiter - start), items, itemlen);
        else
            items = text_contains(list, listlen, items, itemlen); //memmem wrapper with additional code to ensure exact rather than partial matches.
        if ((find_duplicate && items) || (!find_duplicate && !items) )
            return (find_duplicate ? items : item); //a non-duplicate has no match to return, so return the item itself.
        items = delimiter + !!*delimiter;
    }
    return ((void*)0);
}

**Exosomes** · 10-15-2020

Bonus question: I wonder if there is a way to simplify the ((find_duplicate && items) || (!find_duplicate && !items) ) expression.
I've played around with it and can transform it into one form or another, but found no way to reduce it further, e.g.:

Code:

(a && b) || (!a && !b) = (a || !b) && (!a || b)

**hamster_nz** · 10-15-2020

You can use bitwise XOR to reduce your expression (it is the negation of an XOR), but only if both values are normalized to 1s and 0s.
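For illustration, here is a minimal sketch of that reduction. It assumes the flag is a boolean and the other operand is a pointer, as in the find_item code above; the helper names same_truth and same_truth_xor are hypothetical and only serve to show the two spellings side by side.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when the flag and the pointer "agree",
 * i.e. (flag && ptr) || (!flag && !ptr). */
static bool same_truth(bool flag, const void *ptr)
{
    /* (ptr != NULL) normalizes the pointer to 0 or 1, so a plain
     * equality test is enough. */
    return flag == (ptr != NULL);
}

/* The same test spelled with bitwise XOR on the normalized values. */
static bool same_truth_xor(bool flag, const void *ptr)
{
    return !(flag ^ (ptr != NULL));
}
```

In the original loop this would read `if (find_duplicate == (items != NULL))`. The normalization matters: XOR-ing a raw pointer or an arbitrary non-zero integer with a flag would not give a truth value.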

**hamster_nz** · 10-16-2020

Here's how I would do it, given that I do a lot of embedded code, and the standard library is aimed at null terminated strings, and assuming I understand your requirements.

Code:

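static int matches
(const char *a, const char *b)
{
  // Advance while both tokens agree, stopping at the end of token 'a'
  // so that we never read past a terminator (also handles empty tokens)
  while(*a != ',' && *a != '\0' && *a == *b) {
    a++;
    b++;
  }
  // A match requires both tokens to end at the same position
  return (*a == ',' || *a == '\0') && (*b == ',' || *b == '\0');
}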

static const char *next_element
(const char *a) {
   // Scan for next element separator
   while(*a != '\0' && *a != ',')
      a++;

   // End of string?
   if(*a == '\0')
      return (void *)0;

   // Skip over a comma
   a++;

   return a;
}

static const char *find_item
(const char *list, const char *items, const _Bool find_duplicate)
{
   const char *haystack = list;
   while(haystack) {  // For each item in the list being searched
      const char *needle = items;
      while (needle) {  // Check whether it is in the list of things we are comparing with
         if(matches(haystack,needle))
            break;
         needle = next_element(needle);
      }
      if(find_duplicate && needle != NULL)
          return haystack;  // Element in 'list' has been found in 'items'

      if(!find_duplicate && needle == NULL)
          return haystack;  // Element in 'list' was not found in 'items'
      haystack = next_element(haystack);
   }
   return (void*)0;
}

**laserlight** · 10-16-2020

Originally Posted by hamster_nz

assuming I understand your requirements

I think Exosomes's find_duplicate function actually does two things:

1. Find the first duplicate token in a list of tokens, given as a string in a comma-separated value format
2. Find the first matching token between two lists of tokens, each given as a string in a comma-separated value format

The first behaviour is invoked when the caller passes the same string as both the first and second parameters, otherwise the second behaviour is invoked. Exosomes's find_item function extends find_duplicate with the option of negation of the original behaviours.

**Exosomes** · 10-16-2020

laserlight is correct, but I still like your solution.
I find it interesting to see the different ways people go about solving the same problems; it's always cool to see new approaches. Perhaps I'd update needle along with haystack after a successful match, saving an unnecessary call to matches (they are different; needle is one element behind).
And, props for not using library functions; although searching characters one by one is probably much slower.

**hamster_nz** · 10-16-2020

Oops - I got the requirements slightly off.

On the speed front it really depends on your environment, and testing would be needed to say either way. The standard library functions (e.g. strlen(), strcmp()) aren't magical.

I also have a slight worry that text_contains() is a problem. If you use it to look for "cat" or "dog" in "book,catalogue,dogma,index", does it find them?
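To make the concern concrete, here is a minimal sketch of a whole-token search. The source of text_contains() is not shown in the thread, so find_token below is a hypothetical stand-in under the assumption that tokens are separated by single commas; it is not Exosomes's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for text_contains(): returns a pointer to the first
 * token in the comma-separated string `list` that is exactly equal to the
 * `len` bytes at `item`, or NULL if there is none. */
static const char *find_token(const char *list, const char *item, size_t len)
{
    const char *tok = list;

    for (;;) {
        size_t toklen = strcspn(tok, ",");      /* length of current token */

        if (toklen == len && memcmp(tok, item, len) == 0)
            return tok;                         /* whole-token match */
        if (tok[toklen] == '\0')
            return NULL;                        /* reached end of list */
        tok += toklen + 1;                      /* step past the comma */
    }
}
```

With "book,catalogue,dogma,index", a plain substring search such as strstr() finds "cat" inside "catalogue", whereas find_token returns NULL for both "cat" and "dog" and only reports tokens such as "dogma" that match in full.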

**laserlight** · 10-16-2020

I'd say another consideration might be: just how large are these lists of tokens in production, and how long does each token tend to be? If these lists are long enough and comparing tokens expensive enough, then it might make sense to explicitly parse the source strings first into lists of token objects, then sort these lists of token objects in order to do single pass matching, at the cost of additional memory proportional to the size of the lists.

EDIT:
Oh, but that will change the behaviour though, since you wouldn't be finding the same first match/duplicate.
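A rough sketch of the parse-then-sort idea, under the assumption that tokens can be represented as pointer/length views into the source string. The names struct token, token_cmp, parse_sorted and any_common are hypothetical, and, as the edit above notes, the result is the smallest common token in sort order rather than the first one in source order.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical token object: a view into the source string. */
struct token {
    const char *text;
    size_t      len;
};

/* Order tokens bytewise; on a common prefix the shorter token sorts first. */
static int token_cmp(const void *pa, const void *pb)
{
    const struct token *a = pa, *b = pb;
    size_t n = a->len < b->len ? a->len : b->len;
    int c = memcmp(a->text, b->text, n);

    if (c != 0)
        return c;
    return (a->len > b->len) - (a->len < b->len);
}

/* Split `csv` into a sorted, malloc'ed token array; the caller frees it.
 * An empty string is treated as one empty token.
 * Returns NULL on allocation failure. */
static struct token *parse_sorted(const char *csv, size_t *count)
{
    size_t n = 1;
    for (const char *p = csv; *p; p++)
        n += (*p == ',');                 /* tokens = commas + 1 */

    struct token *toks = malloc(n * sizeof *toks);
    if (!toks)
        return NULL;

    size_t i = 0;
    for (const char *p = csv; ; ) {
        size_t len = strcspn(p, ",");
        toks[i].text = p;
        toks[i].len = len;
        i++;
        if (p[len] == '\0')
            break;
        p += len + 1;                     /* step past the comma */
    }
    qsort(toks, n, sizeof *toks, token_cmp);
    *count = n;
    return toks;
}

/* Single merge-style pass over two sorted arrays: returns a token present
 * in both, or NULL if the arrays share none. */
static const struct token *any_common(const struct token *a, size_t na,
                                      const struct token *b, size_t nb)
{
    size_t i = 0, j = 0;

    while (i < na && j < nb) {
        int c = token_cmp(&a[i], &b[j]);
        if (c == 0)
            return &a[i];
        if (c < 0)
            i++;
        else
            j++;
    }
    return NULL;
}
```

For "fig,apple,pear" and "kiwi,apple,fig", the pass reports "apple", while a source-order search over the first string would report "fig" first. Keeping the original index in each token and selecting the match with the lowest index would be one way to recover source order, at the cost of scanning all matches.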
