Python --> C

**MK27** · 05-24-2010

Okay, I feel slightly guilty, here's what I was suggesting earlier, but it is maybe not much simpler than Sebastiani's idea. On the other hand, it involves more basic low level syntax and will be faster (and methinks, way more memory efficient), so despite my alleged "lack of familiarity of the language and the development processes used by C++ programmers", I'll stand by what I said before WRT appropriateness for the OP:

Code:

#include <stdio.h>
#include <stdlib.h>

#define SETSIZE 100
#define ROWALLOC 16

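void populate (int **array) {
	FILE *data = fopen("data.txt","r");
	char buf[16];	/* enough for three values between -9 and 9 */
	int i = 0;

	if (!data) {
		perror("data.txt");
		exit(EXIT_FAILURE);
	}

	/* stop at SETSIZE rows so a longer file cannot overrun the array */
	while (i < SETSIZE && fgets(buf,16,data)) {
		if (sscanf(buf,"%d %d %d",&array[i][0],&array[i][1],&array[i][2]) == 3)
			i++;
	}

	fclose(data);
}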

int ***searchlist (int **triples, int size) {
	int len = ROWALLOC, ***r = malloc(len*sizeof(int**)), found = 0, match,
		i, j, k, l, n;

	for (n = 0; n<ROWALLOC; n++) r[n] = malloc(2*sizeof(int*));

	for (i=0;i<size-1;i++) {
		for (j=i+1;j<size;j++) {
			match = 0;
			for (k=0;k<3;k++) {
				for (l=0;l<3;l++) 
					if (triples[i][k] == triples[j][l]) match++;
			}
			if (match > 1) {
				if (len == found) {
					r = realloc(r,(len+ROWALLOC)*sizeof(int**));
					for (n = len; n<len+ROWALLOC; n++) 
						r[n] = malloc(2*sizeof(int*));
					len += ROWALLOC;
				}
				r[found][0] = triples[i];
				r[found][1] = triples[j];
				found++;
			}
		}
	}    

	if (found<len) 
		for (n = found; n<len; n++) {
			free(r[n]);
			r[n] = NULL;
		}
	else {
		r = realloc(r, (len+1)*sizeof(int**));
		r[len] = NULL;
	}

	return r;
}


int main(int argc, const char *argv[]) {
	int *triples[SETSIZE], i, ***matches;
	for (i=0;i<SETSIZE;i++) triples[i] = malloc(3*sizeof(int));

	populate(triples);
	matches = searchlist(triples, SETSIZE);

	i = 0;
	while (matches[i]) {
		printf("(%d %d %d) ",matches[i][0][0],matches[i][0][1],matches[i][0][2]);
		printf("(%d %d %d)\n",matches[i][1][0],matches[i][1][1],matches[i][1][2]);
		free(matches[i]);
		i++;
	}
	free(matches);

	for (i=0;i<SETSIZE;i++) free(triples[i]);

	return 0;
}

One function, one input list, one output list. It's an "exploded" version of the python code. I ran this on a file of one hundred lines like this:

1 -2 -9
-3 3 5
0 -3 5
7 1 -7
9 2 -3

Triples of numbers between -9 and 9 where the triples do not contain duplicate values (e.g., 6 6 6 or 1 -2 1 would not occur). The values could have been anything that fits into an int, but then you would have to adjust the buffer size in populate().
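As a rough sketch of that buffer adjustment, assuming a 32-bit int: each value can take up to 11 characters ("-2147483648"), so a line needs room for three of those plus two spaces, a newline and the terminator. TRIPLE_BUF, parse_triple() and read_triples() are names made up for this example, not part of the code above.

```c
#include <stdio.h>

/* Hypothetical sizing, assuming a 32-bit int: 3 values of at most
   11 characters, two separating spaces, a newline and the '\0'. */
#define TRIPLE_BUF (3 * 11 + 2 + 1 + 1)

/* Parse one "a b c" line into out[0..2].
   Returns 1 on success, 0 if the line does not hold three ints. */
int parse_triple(const char *line, int out[3]) {
	return sscanf(line, "%d %d %d", &out[0], &out[1], &out[2]) == 3;
}

/* Read up to max rows from fp into array; returns the rows stored. */
int read_triples(FILE *fp, int **array, int max) {
	char buf[TRIPLE_BUF];
	int i = 0;

	while (i < max && fgets(buf, sizeof buf, fp)) {
		if (parse_triple(buf, array[i])) i++;
	}
	return i;
}
```

Using sizeof buf in the fgets() call keeps the size in one place, so changing TRIPLE_BUF is the only edit needed.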

ROWALLOC is the amount to grow the return array by in a step, since doing this one row at a time is inefficient. 16 is a pretty good number since it will work out to either 128+64 or 256+128 bytes at a time on most systems. The last bit of searchlist() is to null terminate the array. Another option would be to pass in an int pointer that could be set to the number of pairs.
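The "pass in an int pointer" option could look something like the sketch below. It is a simplified variant, not the code above: struct pair, shared() and find_pairs() are hypothetical names, and each pair is stored in a small struct so that one realloc() grows the whole list.

```c
#include <stdlib.h>

#define ROWALLOC 16

/* Hypothetical result row: two pointers into the triples array. */
struct pair { int *a, *b; };

/* Count values shared between two triples (same rule as searchlist()). */
static int shared(const int *a, const int *b) {
	int k, l, match = 0;
	for (k = 0; k < 3; k++)
		for (l = 0; l < 3; l++)
			if (a[k] == b[l]) match++;
	return match;
}

/* Returns the list of matching pairs and stores their number in *count,
   so no NULL terminator is needed. On allocation failure it returns
   NULL and sets *count to -1. */
struct pair *find_pairs(int **triples, int size, int *count) {
	struct pair *r = NULL, *tmp;
	int len = 0, found = 0, i, j;

	for (i = 0; i < size - 1; i++) {
		for (j = i + 1; j < size; j++) {
			if (shared(triples[i], triples[j]) < 2) continue;
			if (found == len) {	/* grow by ROWALLOC rows */
				tmp = realloc(r, (len + ROWALLOC) * sizeof *r);
				if (!tmp) { free(r); *count = -1; return NULL; }
				r = tmp;
				len += ROWALLOC;
			}
			r[found].a = triples[i];
			r[found].b = triples[j];
			found++;
		}
	}
	*count = found;
	return r;
}
```

The caller loops from 0 to count-1 and finishes with a single free() on the returned pointer.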

I usually avoid ***three_star_variables but in this case it seems appropriate. I have to run soon, but if Macha is interested and has any questions I'll be back later.

**MK27** · 05-24-2010

Just noticed if(match) should not have been inside the k loop since that will result in duplicated pairs, but that's now corrected. So example output:

(-4 -9 6) (9 -9 6)
(-4 -9 6) (6 -3 -4)
(-4 -9 6) (-4 6 -8)
(-4 -9 6) (6 8 -9)
(-4 -9 6) (-9 -4 -1)
(-9 -5 -3) (-5 -9 9)
(-9 -5 -3) (-3 -5 -6)
(-9 -5 -3) (-5 4 -3)
(-9 -5 -3) (-9 0 -3)
(6 0 -7) (3 0 6)
(6 0 -7) (6 -7 2)
(9 -5 -1) (-5 -9 9)
(9 -5 -1) (-5 7 -1)
(9 -5 -1) (-9 9 -1)

**Macha** · 05-24-2010

Wow, I don't understand all of it (yet) but it looks interesting how you can optimize things. Still not quite sure how to add values to an array yet. (I have to resize it somehow?)

Ultimately I would like to call this code from Python. So I wonder what the best way of interfacing this would be. There are two ways I could feed this function from Python:

- with a list of tuples like this: [ (1,2,3), (4,5,6) ]
- with individual tuples like this: (1,2,3) ...

And I guess there are two ways of returning values from C back to Python:

- a long array of triplets
- each triplet separately

What would be fastest, and how would I go about doing it?

**jeffcobb** · 05-24-2010

You had me at fast... I have a python XML-RPC server that I access via C++ but it's not fast, just functional...

**Macha** · 05-24-2010

Originally Posted by Salem

> Are 1,3 and 3,1 regarded as separate matches (does symmetry count?)

If there are only a few such cases it doesn't matter. It just means that further down the line in my Python code a computation would be done twice, but with the same result.

> > int compare(struct trip t0, struct trip t1)
>
> Not so good - you're passing two structures by value, which is a lot of extra copying of data (compared to the amount of work done).
> Passing two pointers to structures would be better.

That means int compare(struct trip *t0, struct trip *t1), right?

> Oh, and watch the void main - not good.

Yes, I noticed it gives me an error, but runs nevertheless. I got rid of it.

**Salem** · 05-24-2010

> That means int compare(struct trip *t0, struct trip *t1), right?
Correct!

> If there are only a few such cases it doesn't matter. It just means that further down the line in my Python code a computation would be done twice
But it also takes longer for the C code to produce the results to begin with.
A simple code change saves both doing unnecessary work.
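A minimal sketch of both points, with some assumptions: the layout of struct trip is guessed (its real definition is not shown in this part of the thread), the pointers are made const, and count_matches() is a made-up helper.

```c
#include <stddef.h>

/* Assumed layout; the thread's actual struct trip is not shown here. */
struct trip { int v[3]; };

/* Returns 1 if the two triples share at least two values, else 0.
   Taking pointers avoids copying both structs on every call. */
int compare(const struct trip *t0, const struct trip *t1) {
	int k, l, match = 0;
	for (k = 0; k < 3; k++)
		for (l = 0; l < 3; l++)
			if (t0->v[k] == t1->v[l]) match++;
	return match > 1;
}

/* Starting j at i + 1 visits every unordered pair exactly once, so
   (1,3) is examined but its mirror image (3,1) never is. */
size_t count_matches(const struct trip *t, size_t n) {
	size_t i, j, found = 0;
	for (i = 0; i + 1 < n; i++)
		for (j = i + 1; j < n; j++)
			found += compare(&t[i], &t[j]);
	return found;
}
```

Compared with running j over the whole array, this roughly halves the number of compare() calls.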

**MK27** · 05-25-2010

It might have been better to use an array of those triple structs in my code rather than ints, you could reduce a level of indirection. Indirection is about pointing. So a pointer to an int in place of an int is a layer of indirection. It involves more memory and more processing. Sometimes this is worth it, because anything else would be impossible, or else even more ops (processing) would be involved in maintaining (eg) sequential data. Also, without indirection you must copy entire sequences up the "memory stack" when you make function calls (very expensive), rather than having one set of data on the "memory heap" and referring to it with a pointer. This is the purpose of malloc(). (heap vs. stack memory is an important general concept in C/C++)
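To make the "one less level of indirection" idea concrete, here is a small sketch. struct triple, alloc_triples() and sum_first_column() are hypothetical names chosen for the example.

```c
#include <stdlib.h>

/* Hypothetical struct: three ints stored inline, no per-row pointer. */
struct triple { int v[3]; };

/* One allocation holds all n triples back to back. An int ** table
   needs one malloc per row plus the array of row pointers. */
struct triple *alloc_triples(size_t n) {
	return calloc(n, sizeof(struct triple));	/* NULL on failure */
}

/* The callee receives a single pointer regardless of n; the data
   stays on the heap and nothing is copied onto the stack. */
long sum_first_column(const struct triple *t, size_t n) {
	long sum = 0;
	size_t i;
	for (i = 0; i < n; i++) sum += t[i].v[0];
	return sum;
}
```

Reading t[i].v[k] follows one pointer, where array[i][k] on an int ** follows two.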

Originally Posted by Macha

Wow, I don't understand all of it (yet) but it looks interesting how you can optimize things. Still not quite sure how to add values to an array yet. (I have to resize it somehow?)

In C (unlike python or C++), you have no built-in dynamic arrays. Dynamic refers to the idea that the size of the array in memory will automatically grow to accept data. This is not as simple as it seems, because you cannot just have infinitely open ended regions for every single array in a program. Fortunately, you do not have to lay out the memory yourself in C; the compiler and the allocator (malloc and friends) take care of that. But you do need to manage the amount that you need for a particular task. That's what the use of realloc() is about:

Code:

				if (len == found) {
					r = realloc(r,(len+ROWALLOC)*sizeof(int**));
					for (n = len; n<len+ROWALLOC; n++) 
						r[n] = malloc(2*sizeof(int*));
					len += ROWALLOC;
				}

Realloc() just changes the total memory allocated to a pointer (either up or down -- I could have used it to shrink the array at the end, just leaving one NULL, but most of it is deallocated anyway with free()). There are a few reasons not to do this too frequently. One is that it is processor intensive (I believe one malloc() requires hundreds of lines of assembly to implement). The other is that if there is not enough contiguous memory available where the data is currently stored, realloc() may move the whole thing somewhere else. That's pretty silly for say, one byte or one row.

In fact, looking at the output, I get 244 unique pairs of triplets from my (randomly generated) list of 100. So that realloc/malloc loop was called 15 times. If I made ROWALLOC 32, it would only be 7 times, and I'd at most waste 256 bytes, which is nothing (and those could be unwasted with a final realloc). If you were dealing with a list of a thousand triplets, you'd probably want ROWALLOC to be at least 100 or 200.
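The arithmetic behind those counts can be sketched as below. grow_calls() and unused_rows() are made-up helpers that model the "if (len == found)" branch, assuming the array starts with ROWALLOC rows as in searchlist().

```c
/* Number of times the "if (len == found)" branch runs. The array starts
   with rowalloc rows, and the branch fires just before match number
   rowalloc, 2*rowalloc, ... (counting from 0) is stored. */
int grow_calls(int found, int rowalloc) {
	if (found <= 0) return 0;
	return (found - 1) / rowalloc;
}

/* Rows allocated but never filled once the search is over. */
int unused_rows(int found, int rowalloc) {
	int len = (grow_calls(found, rowalloc) + 1) * rowalloc;
	return len - found;
}
```

With 244 pairs, grow_calls(244, 16) gives 15 and grow_calls(244, 32) gives 7, matching the figures above.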

The way it specifically works is this: "r" (the array of pairs) is actually an array of pointers to pointers. So r is a pointer to a pointer to a pointer (***r) and each element of the array it points to is a pointer to a pointer (**). This is because each element is a "row" in a 2D table containing two pointers:

Code:

(Arrows are actual pointers)
r -> 0 -> 1st triplet
          -> 2nd triplet
  -> 1  -> 1st triplet
          -> 2nd triplet
...etc

This could be optimized by making r a plain array of pointers and remembering that it is a list of pairs (I think that is how the python code works); for whatever reason I thought that would be less clear. Anyway, with the layout used here, each element of r must be malloced enough memory to contain a pair of pointers:

Code:

					for (n = len; n<len+ROWALLOC; n++) 
						r[n] = malloc(2*sizeof(int*));

These pointers are then set to elements of the triples array. Notice that "triples" was constructed in a similar way:

Code:

	int *triples[SETSIZE], i, ***matches;
	for (i=0;i<SETSIZE;i++) triples[i] = malloc(3*sizeof(int));

except here each row is three actual int values. There are no real ints in "r" -- each row member is a pointer to a row in triples. That saves copying and storing three numbers, so it's more efficient.
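For comparison, the flatter layout mentioned earlier (a single array of pointers read two at a time) might be sketched like this. alloc_pairs(), set_pair() and get_pair() are hypothetical helpers, not part of the posted code.

```c
#include <stdlib.h>

/* Hypothetical flat layout: pair p lives in r[2*p] and r[2*p + 1],
   so one block of 2*npairs pointers replaces the per-row mallocs. */
int **alloc_pairs(size_t npairs) {
	return malloc(2 * npairs * sizeof(int *));	/* NULL on failure */
}

/* Store pointers to two rows of the triples array as pair p. */
void set_pair(int **r, size_t p, int *first, int *second) {
	r[2 * p] = first;
	r[2 * p + 1] = second;
}

/* m selects the member of the pair: 0 or 1. */
int *get_pair(int **r, size_t p, int m) {
	return r[2 * p + m];
}
```

As with "r" above, only pointers are stored; the ints themselves stay in the triples array.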

You do need to understand this to use C++ too, although you don't need to use realloc() to manually dynamize an array, because the STL provides "vectors" and other dynamic datatypes. However, the logic behind ROWALLOC still applies, because it is still better to ask for a bunch of elements at once rather than one at a time if you are not sure how big the vector/array will get, only that it will keep growing and growing.

In fact, you can probably do that with python; you can do it in perl. It's somewhat less of an issue there though (the interpreted languages) since I believe the default approach is "power of 2" growth. You have one element, you ask for another, you have two. When you ask for a third, you actually get 4. When you get to 5, you get 8, when you get to 9 you get 16 and so on.
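The "power of 2" scheme is easy to sketch in C. This is only an illustration of the growth rule described above, not how any particular interpreter implements its lists; struct intvec and intvec_push() are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical growable int array with power-of-two capacity. */
struct intvec {
	int *data;
	size_t len;	/* elements in use */
	size_t cap;	/* elements allocated */
};

/* Append x, doubling the capacity when full (1, 2, 4, 8, ...).
   Returns 0 on success, -1 if realloc fails (v is left unchanged). */
int intvec_push(struct intvec *v, int x) {
	if (v->len == v->cap) {
		size_t ncap = v->cap ? v->cap * 2 : 1;
		int *tmp = realloc(v->data, ncap * sizeof *tmp);
		if (!tmp) return -1;
		v->data = tmp;
		v->cap = ncap;
	}
	v->data[v->len++] = x;
	return 0;
}
```

Start from a zeroed struct (data NULL, len and cap 0); pushing a ninth element raises the capacity from 8 to 16, as in the description.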

An issue with the STL is that it is not as efficient as a hand coded version, partially due to the generic use of "templates" which can be any data type. So if you were working on a hard core number crunching app, you might want to use C++ for ease but with a lot of actual C style low level coding (ie, not use vectors or the STL for your primary storage), you kind of have the best of both worlds there. Potentially. If you are not comfortable with basic C stuff (or worse: unaware), you won't see how to take advantage of it.

All apologies if some of that is confusing. The way I remember it, you kind of have to understand memory segmentation to understand pointers, but it may be hard to understand memory segmentation if you don't understand pointers.

I think most people just start out with a shadowy idea of both and then actual practice slowly sheds light.

**KBriggs** · 05-25-2010

Question about realloc - when an array is realloced, are the contents of the array that are not included in the resizing preserved? Or is that only necessarily the case if the array did not need to be moved in memory?

**tabstop** · 05-25-2010

Originally Posted by KBriggs

Question about realloc - when an array is realloced, are the contents of the array that are not included in the resizing preserved? Or is that only necessarily the case if the array did not need to be moved in memory?

All your original data is still where you think it is (i.e., if array[40] was 15 before the realloc, then array[40] is 15 afterwards, assuming that [40] is in-bounds both times).

**Salem** · 05-25-2010

Everything is preserved, up to min(oldsize,newsize), regardless of whether it moves in memory or not.

Also compare how I called realloc with how MK27 called realloc (which is a bug).
If realloc fails, you still have the old memory (or you would have, if you haven't trashed your only pointer to it in calling realloc).
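A small sketch of the pattern being described: the result of realloc() goes into a temporary first, so the old pointer survives a failure. resize_ints() is a made-up helper for illustration.

```c
#include <stdlib.h>

/* Resize *p to hold n ints (n > 0). On success *p is updated and the
   first min(old, new) elements keep their values, whether or not the
   block moved. On failure *p still points at the old, valid block. */
int resize_ints(int **p, size_t n) {
	int *tmp = realloc(*p, n * sizeof **p);
	if (!tmp) return -1;	/* old block untouched, caller still owns it */
	*p = tmp;
	return 0;
}
```

Writing p = realloc(p, ...) directly would overwrite p with NULL on failure and leak the old block.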

**MK27** · 05-25-2010

Originally Posted by Salem

Also compare how I called realloc with how MK27 called realloc (which is a bug).
If realloc fails, you still have the old memory (or you would have, if you haven't trashed your only pointer to it in calling realloc).

Yeah, that's worth noting. Generally I don't consider it "a bug" since I code pretty much exclusively on linux assuming default kernel settings*, and only if I see some chance that the code will get used elsewhere would I bother to check the return value of malloc/realloc. Meaning I should have done that here, I guess.

A "best practice" with regard to realloc is the same as it is for malloc, just you need to pay attention to the first argument:

Code:

int ***tmp, **t2;
[...]
if (len == found) {
	/* it is "r" we want to reallocate, so: */
	tmp = realloc(r,(len+ROWALLOC)*sizeof(int**));
	if (!tmp) die("OUT OF MEMORY!");
	r = tmp;
	for (n = len; n<len+ROWALLOC; n++) {
		t2 = malloc(2*sizeof(int*));
		if (!t2) die("OUT OF MEMORY");
		r[n] = t2;
	}
	len += ROWALLOC;
}

Die() would be a function or macro to kill the process; since this is essential to the program functioning, and clearly the OS is in trouble now, it might as well exit and free all the memory it used. However, by using the tmp handle you do retain the option of dealing with this less drastically, by preserving your handle to the data in "r" (eg, you could save it and then exit).
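One possible shape for such a die(), together with a wrapper that applies it to realloc(). Both are sketches: die() is not a standard function, and xrealloc() is just a conventional name for this kind of helper.

```c
#include <stdio.h>
#include <stdlib.h>

/* Print a message and terminate; the OS reclaims the process memory. */
void die(const char *msg) {
	fprintf(stderr, "%s\n", msg);
	exit(EXIT_FAILURE);
}

/* realloc() that never returns NULL: it either succeeds or calls die().
   Assumes size > 0. */
void *xrealloc(void *ptr, size_t size) {
	void *tmp = realloc(ptr, size);
	if (!tmp) die("OUT OF MEMORY!");
	return tmp;
}
```

Because xrealloc() cannot return NULL, writing r = xrealloc(r, ...) is safe here, which is not true of plain realloc().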

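* by default linux overcommits memory: malloc will usually hand you the memory even if none is actually available, so error handling for malloc rarely triggers there. It can still fail (for example when a process runs out of address space or hits a resource limit), so the check is not entirely pointless.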

**Adak** · 05-25-2010

This was a lot of fun - and frustrating. I went round and round with all kinds of ideas for algorithms for this, especially involving multiple indexed sorted lists, etc.

After I finally tried a brute force method, (my DSL was out for the last few days), I scrapped all the rest. Simple is very beautiful, in this case.

Code:

/* testing a fast way to find all partial (2 of the 3 numbers) matches, 
within a large number of tuples.

Because of the great locality of data, no sorting/searching/indexing, etc.,
were useful. Tried a few ideas! ;)

Adak May 24, 2010

status: untested. 3042 matches found on Macha's 4.9sec.txt data file.

my time: 0.05 seconds at 2.66Ghz intel E6700 cpu, for the 
computational part of the program.

*/
#include <stdio.h>
#include <time.h>

#define R 2400
#define C 3

int main(void) {
  int i, j, r, c, maxrows, a[R][C] = {{0}};
  int found, match;
  clock_t start, stop;
  FILE *fp;
  printf("\n\n\n");

  //start=clock(); Overall program time
  if((fp=fopen("tuples3", "rt")) == NULL) {
    printf("Error opening file - exiting");
    return 1;
  }
  //printf("\n This is the raw data from the file:");
  c=r=0;
  do {
    i= fscanf(fp, "%d %d %d ", &a[r][c],&a[r][c+1],&a[r][c+2]);
    if(i > 2) {
      /*shows the raw data, if needed*/
      //printf("\n%4d: %4d %4d %4d", r, a[r][c],a[r][c+1],a[r][c+2]);
      ++r;
    }
  }while((i > 2) && (r<R)); 
  maxrows=r;
  printf("\n There are %d Rows of tuples", maxrows);

  //loaded up, ready to rock:
  start=clock();
  //like selection sort :) 
  for(i=0,found=0;i<maxrows-1;i++) {
    for(j=i+1;j<maxrows;j++) {
      match=0;
      if(a[i][0]==a[j][0] || a[i][0]==a[j][1] || a[i][0]==a[j][2]) {match++;}
      if(a[i][1]==a[j][0] || a[i][1]==a[j][1] || a[i][1]==a[j][2]) {match++;}
      if(a[i][2]==a[j][0] || a[i][2]==a[j][1] || a[i][2]==a[j][2]) {match++;}
      if(match>1) found++;
    }
  }
  printf("\n\n Partial matches found: %d", found);
  stop=clock();
  printf("\n\n elapsed time: %f", (double)(stop-start)/CLK_TCK);
  printf("\n\n\t\t\t    press enter when ready");
  i=getchar();

  fclose(fp);
  return 0;
}

You may need to replace CLK_TCK with the macro for your own compiler; the standard C name for it is CLOCKS_PER_SEC.

It appears the data is able to fit into the cache, and that really shortens the run time.
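A rough check of that cache remark, assuming a 4-byte int: the a[R][C] table is 2400 * 3 * 4 = 28,800 bytes, which would fit in a typical 32 KB L1 data cache (an assumption about the hardware, not something measured in this thread). table_bytes() and pair_count() are names made up for this sketch.

```c
#include <stddef.h>

#define R 2400
#define C 3

/* Bytes occupied by the whole a[R][C] table. */
size_t table_bytes(void) {
	return sizeof(int[R][C]);
}

/* Pairs examined by the nested i < j loops: n * (n - 1) / 2. */
unsigned long pair_count(unsigned long n) {
	return n * (n - 1) / 2;
}
```

For 2400 rows that is 2,878,800 comparisons of pairs, all over the same small block of memory.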

**Sebastiani** · 05-25-2010

Originally Posted by MK27

Okay, I feel slightly guilty, here's what I was suggesting earlier, but it is maybe not much simpler than Sebastiani's idea. On the other hand, it involves more basic low level syntax and will be faster (and methinks, way more memory efficient), so despite my alleged "lack of familiarity of the language and the development processes used by C++ programmers", I'll stand by what I said before WRT appropriateness for the OP:

I guess my point is just that the performance difference isn't always going to be as much as you might think. You can write low-level C++ code, too, and moreover you can also get tons more "mileage" from it (as it can be designed to be reusable). Like I said, as a C++ programmer, you generally design from the top-down, to begin with, using high-level objects at first. Then, where you see performance problems you can simply start whittling down the code (preferably keeping the high-level interface, but working out the low-level details beneath that layer). And yes, I do believe that it can take some time to "grok" these methodologies - hell, it probably took me at least 5 years to become proficient at it.

I've put together an example to demonstrate my point a little more clearly. Here is the "problem": we have to design a function to convert an array of characters to lowercase (signature: void (*)(char*)). So let's see how much faster it is in "straight C", compared with a fairly generic C++ implementation. Ready? Steady. Go!

Code:

/*
    The problem: 
    Design a function that converts a null-terminated array of 
    bytes to lowercase, using the standard 'tolower' function
*/
#include <ctype.h> // for 'tolower'

/*
    Our implementation in C - simple and fast
*/
void c_tolower_string( char* data )
{
    for( char* seq = data; *seq; ++seq )
        *seq = tolower( *seq );    
}

/*
    This class transforms a null-terminated sequence, and can be applied to 
    any data-type that conforms with the minimum interface - an 'iterator' 
    typedef, and a non-const 'begin' member function that returns an iterator
*/
template < typename Transform >
struct null_terminated_text_processor
{    
    inline null_terminated_text_processor( Transform transform = Transform( ) )
    : transform( transform )
    {    }
    
    template < typename Container >
    inline void operator ( ) ( Container& data )
    {
        for( typename Container::iterator seq = data.begin( ); *seq; ++seq )
            transform( *seq );
    }
    
    Transform
        transform;    
};

/*
    Helper generator
*/
template < typename Transform >
inline null_terminated_text_processor< Transform > make_null_terminated_text_processor(  Transform transform )
{
    return null_terminated_text_processor< Transform >( transform );
}

/*
    Minimum interface required to interact with our generic code
*/
template < typename Iterator >
struct minimal_interface_container
{
    typedef Iterator
        iterator;

    inline minimal_interface_container( Iterator seq )
    : seq( seq )
    {    }    
    
    inline Iterator begin( void )
    {
        return seq;
    }        
    
    Iterator
        seq;
};

/*
    Our lowercase converter
*/
struct lowerizer
{
    template < typename Char >
    inline void operator ( ) ( Char& ch )
    {
        ch = tolower( ch );
    }
};

/*
    Just as an example
*/
struct vowel_lowerizer
{
    template < typename Char >
    inline void operator ( ) ( Char& ch )
    {
        Char
            c = tolower( ch );            
        if( c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u' )
            ch = c;
    }
};

/*
    Our implementation in C++ - convoluted and slow?
*/
inline void cpp_tolower_string( char* data )
{
    minimal_interface_container< char* >
        stuff = data;
    make_null_terminated_text_processor( lowerizer( ) )( stuff );
}

/*
    Our test code
*/
#include <cstdlib>   // for 'rand', 'srand'
#include <ctime>     // for 'clock', 'time'
#include <iostream>
#include <string>

using namespace std;

template < typename Container, typename Process >
void test( const char* pid, Process process, size_t iterations, char* saved, char* data )
{
    char
        * svp, 
        * dtp;
    clock_t 
        start, 
        stop;
    cout << "Process: " << pid << endl;
    start = clock( );
    for( size_t i = 0; i < iterations; ++i )
    {
    /*
        We could simply use memcpy here, but I 
        don't want it to mess with the results
    */
        for( svp = saved, dtp = data; *svp;  )
            *dtp++ = *svp++;
        Container
            stuff = data;
        process( stuff );        
    }
    stop = clock( );
    double
        elapsed = double( stop - start ) / CLK_TCK;
    cout << "Elapsed: " << elapsed << " seconds" << endl;
}

int main( void )
{
    size_t const
        size = 50000, 
        iterations = 1000;
    char
        saved[ size ],
        data[ size ];
    srand( time( 0 ) );
    for( size_t i = 0; i < size; ++i )
        saved[ i ] = rand( ) % 26 + ( rand( ) & 1 ) * ( 'a' - 'A' ) + 'A';
    saved[ size - 1 ] = 0;
    cout << " -- Test -- " << endl;
    cout << "Total number of bytes to process: " << ( size * iterations ) << endl;
    test< char* >( "c_tolower_string", c_tolower_string, iterations, saved, data );
    test< char* >( "cpp_tolower_string", cpp_tolower_string, iterations, saved, data );
    test< minimal_interface_container< char* > >
    ( 
        "lowerizer ( minimal_interface_container< char* > )", 
        make_null_terminated_text_processor( lowerizer( ) ), 
        iterations, 
        saved, 
        data 
    );
    test< minimal_interface_container< char* > >
    ( 
        "vowel_lowerizer ( minimal_interface_container< char* > )", 
        make_null_terminated_text_processor( vowel_lowerizer( ) ), 
        iterations, 
        saved, 
        data 
    );
    test< string >
    ( 
        "lowerizer ( minimal_interface_container< string  > )", 
        make_null_terminated_text_processor( lowerizer( ) ), 
        iterations, 
        saved, 
        data 
    );
}

My output:

-- Test --
Total number of bytes to process: 50000000
Process: c_tolower_string
Elapsed: 1.981 seconds
Process: cpp_tolower_string
Elapsed: 2.149 seconds
Process: lowerizer ( minimal_interface_container< char* > )
Elapsed: 2.224 seconds
Process: vowel_lowerizer ( minimal_interface_container< char* > )
Elapsed: 2.574 seconds
Process: lowerizer ( minimal_interface_container< string > )
Elapsed: 3.386 seconds

So you see? Even though the C++ code is considerably more complex, it processes 50 million bytes of data just a fraction of a second slower. Moreover, the C++ code ultimately made our life easier, because now we can snap it together with a variety of data-structures and functionality. Delegation at its best, IMO.

**MK27** · 05-25-2010

You're a much more experienced programmer than I am, we know that.

Yer code is pretty interesting for little me to contemplate. The use of operator overloading/iterators in C++ definitely opens up some possibilities.

Originally Posted by Sebastiani

> I guess my point is just that the performance difference isn't always going to be as much as you might think.

True. I'll admit the performance differences in question are not significant in my life, just at this point I am very conscious of considering how and why they occur.

> You can write low-level C++ code, too, and moreover you can also get tons more "mileage" from it (as it can be designed to be reusable).

This largely depends on how you want to define "reusable". You would have to define it kind of narrowly to make this meaningful (that somehow C++ code has better mileage). Part of what you mean here is literally reusable functions, and not the slightly more abstract "reusable concepts". As in, I know how to do it, I know how to "cut, paste, n' tweak" my own code, etc. I think both of those definitions have a value, and it's not hard to understand in the context of your own work how/when to apply one or the other. I used to (repeatedly) write "generic" getline() type functions until I realized I barely ever re-used them because it's just too easy to write another (specific) one, and that trying to "genericize" to include all specific possibilities in that (specific) case is really a quixotic quest. Just write the damn function, it will take you 5 minutes. Me also thinks your tolower() scheme falls into this category, but I appreciate the value of it as a theoretical demonstration -- in the sense that no one would do this in this (specific) case, but they might in some other one.

> Like I said, as a C++ programmer, you generally design from the top-down, to begin with, using high-level objects at first.

Again, I appreciate what you are saying, although I don't see how this is somehow C++ specific. I did start "from the top down". The difference is, I am not considering some greater context within which searchlist() (or tolower()) plays a role. I don't need this function at all, currently. If I did, I might see a bigger context for it and take it into account, probably breaking it up into parts such that they are usable in similar tasks (as you've done with vowel_lowerizer). My sense is that this is more than the OP wants to think about now too. If I am wrong about that, s/he is already aware of those issues and can deal with it or ask related questions. There is no substitute for experience, but not everyone's experience is of the same kind, either.

Vis, tolower() being one of a number of possible text processing tasks from which you may want to extract common elements in a class based interface, that's great, but you do need to spend some time doing text processing before you consider this approach.

Context, context, context.

**Adak** · 05-25-2010

You took a single-statement for loop in C and made it into a page and a half of C++, which has no better run time, costs about 10x more time to code, and is more complex as well.

I don't see the value.

What makes you think that the one line for loop in C, is not re-usable? And if you did re-use these two programs with slight alterations to each one, which one would be easier to change and debug?
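For what it's worth, one way the C loop could be made reusable is to pass the per-character operation as a function pointer. This is only a sketch of that idea, not code from the thread; transform_string() and lower_vowel() are hypothetical names.

```c
#include <ctype.h>

/* Apply f to every character of a null-terminated string, in place.
   f has the same shape as tolower()/toupper(), so both can be passed. */
void transform_string(char *data, int (*f)(int)) {
	char *seq;
	for (seq = data; *seq; ++seq)
		*seq = (char)f((unsigned char)*seq);
}

/* Hypothetical per-character rule: lowercase vowels only. */
int lower_vowel(int ch) {
	int c = tolower(ch);
	if (c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u')
		return c;
	return ch;
}
```

Calling transform_string(s, tolower) gives the original behaviour, and transform_string(s, lower_vowel) gives the vowel-only variant. The call through a pointer may be harder for the compiler to inline than the template version.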
