# Thread: Python --> C

1. ## Python --> C

This is my first attempt at C, although I have Python experience.

I wrote a Python program but it is too slow and I think C might help me. How do I convert it into C?

The problem:

I have a bunch of triple numbers, like so

Code:
`[[0, 1, 3], [4, 5, 7], [6, 7, 2], [5, 4, 1], [0, 2, 7], [4, 6, 3], [1, 4, 3], [5, 0, 7], [5, 1, 0], [6, 2, 3], [4, 7, 6], [0, 3, 2]]`
I want to find those pairs of triples that share 2-components. For example. Triple2 and triple9 intersect because they share n=2 components: (6,7,2) and (6,2,3) share 6 and 2

Now, in Python, this is a solution:

Code:
```def condition_shared(prim1,prim2,n):
return len(set(prim1).intersection(set(prim2))) == n

def searchlist(myList):
r = range(len(myList))
result = []
for i in r:
for j in r:
if condition_shared(myList[i],myList[j],2):
result.append((i,j))
return tuple(result)

print searchlist(lst)```
In Python terms: It returns a tuple with indices of those pairs that have n-components in common. So we find that it contains the pair (2, 9), meaning lst[2] and lst[9] because they share n=2 components:

How would I go about writing this in C? (Could it even be multithreaded?)

2. "All" you have to do is define a type that is a triple of numbers, and things called "intersection", "len", "set", "range", "condition_shared", "append" and "tuple".

3. You could start by creating a struct holding you tripples, then make an array of those.

Code:
```typedef struct{
int a;
int b;
int c;
}Tripple;

// then

Tripple trippleList[n];```
As for the question if it could be multithreaded, if I understand what you are trying to do correctly you won't gain anything from dividing your list into sublists and do them in parallel since you have to compare those sublists to each other as well in the end.

4. Originally Posted by tabstop
"All" you have to do is define a type that is a triple of numbers, and things called "intersection", "len", "set", "range", "condition_shared", "append" and "tuple".
Is that another way of saying "Beginner, hahaha"

5. No, as someone who does lots of both, if you are coming to C from Python you are going to HATE C. Particularly for doing stuff like this. Not trying to discourage you but the "batteries included" aspect of Python is "roll your own batteries" in C.

6. Originally Posted by Macha
Is that another way of saying "Beginner, hahaha"
No, that's another way of saying "in C you have to roll your own". I would guess you've got at least a couple hundred lines of C code needed just to get what's in Python out of the box in terms of the data structures.

On the other hand C++ would have a lot more of what you're expecting to use ready to go.

7. Originally Posted by Macha
This is my first attempt at C, although I have Python experience.

I wrote a Python program but it is too slow and I think C might help me. How do I convert it into C?
The verdict of what tabstop said is that you should use C++. Since it is as fast (the say) as C. In the case of implementing stuff with no experience, it is faster than C++, cause you won't make them as good as the standards data types or good ones found in libraries. So google a bit, you might find what you want ready to go in C++.

You can explain what the functions do in Python and maybe we can give you a function in C/C++.

8. This is an interesting problem. Sounds very doable. What is the range of the numbers in the sub sets?

I have never tried "rolling my own batteries" but it sounds like something not to be missed. < Thanks, Jeff! >

I'm quite sure it can be sped up, but what are the parameters of this - how many sub sets do you typically have, and what is your run-time for this part of the Python program?

If the current run-time is 1/100th of a second, it's probably not worth optimizing in C, to get it to run in 1/500th of a second.

While you're at it, would you put a 5 second run-time input file on Swoopshare, so I could have something to compare a C program run-time, to? Post the url to it, back here, of course, so it can be d/l'ed. That would give us a nice target to beat.

Not that anyone here is competitive, at all. Never happen.

9. Thanks for the replies.

OK, I see. Maybe it is a bit too difficult to do it in C for me!

The triplets consist of integers only and I may have several thousand, even hundreds of thousands of them. Each triplet intersects with an average of 3 other triplets. No triplets contain double or triple identical numbers (so (8,8,1) would never occur)

The components of triplets themselves are unordered but the whole list of triplets is ordered (I need to keep track of their indices).

Currently Python takes under a second for a thousand triplets, but after that it becomes a waiting game. Way waaaay to long. That's why I am looking at other languages for this.

It is daunting to do this "from atoms". Perhaps C++ is better? I thought it was harder, but I don't know either.

Unfortunately I haven't got a big dataset with me on this computer. If you really want to and can wait a few days I can upload one. But a small one is below. The above Python code takes 0.016 seconds for this.

Code:
```[(0, 1, 5), (1, 2, 6), (2, 3, 7), (4, 5, 9), (5, 6, 10), (6, 7, 11), (8, 9, 13), (9, 10, 14),
(10, 11, 15), (16, 17, 21), (17, 18, 22), (18, 19, 23), (20, 21, 25), (21, 22, 26),
(22, 23, 27), (24, 25, 29), (25, 26, 30), (26, 27, 31), (28, 29, 33), (29, 30, 34),
(30, 31, 35), (32, 33, 37), (33, 34, 38), (34, 35, 39), (36, 37, 14), (37, 38, 13),
(38, 39, 12), (18, 41, 40), (17, 42, 41), (17, 16, 43), (40, 41, 45), (41, 42, 46),
(42, 43, 47), (44, 45, 1), (45, 46, 2), (46, 47, 3), (40, 48, 23), (44, 49, 48), (0, 4, 49),
(48, 50, 27), (49, 51, 50), (4, 8, 51), (50, 35, 31), (51, 39, 35), (8, 12, 39), (47, 52, 7),
(43, 53, 52), (16, 20, 53), (52, 54, 11), (53, 55, 54), (20, 24, 55), (54, 36, 15),
(55, 32, 36), (24, 28, 32), (55, 24, 32), (54, 55, 36), (11, 54, 15), (53, 20, 55),
(52, 53, 54), (7, 52, 11), (43, 16, 53), (47, 43, 52), (3, 47, 7), (51, 8, 39), (50, 51, 35),
(27, 50, 31), (49, 4, 51), (48, 49, 50), (23, 48, 27), (44, 0, 49), (40, 44, 48),
(19, 40, 23), (46, 3, 2), (45, 2, 1), (44, 1, 0), (42, 47, 46), (41, 46, 45), (40, 45, 44),
(17, 43, 42), (18, 17, 41), (19, 18, 40), (38, 12, 13), (37, 13, 14), (36, 14, 15),
(34, 39, 38), (33, 38, 37), (32, 37, 36), (30, 35, 34), (29, 34, 33), (28, 33, 32),
(26, 31, 30), (25, 30, 29), (24, 29, 28), (22, 27, 26), (21, 26, 25), (20, 25, 24),
(18, 23, 22), (17, 22, 21), (16, 21, 20), (10, 15, 14), (9, 14, 13), (8, 13, 12),
(6, 11, 10), (5, 10, 9), (4, 9, 8), (2, 7, 6), (1, 6, 5), (0, 5, 4)]```

10. With C++, I'm not sure what could be done, but my intuition is that this can be done quite quickly, in plain C.

I will wait for your bigger example data file, and use this one to experiment with, in the meantime. Below 16/1000's of a second, optimization is hard to find because the the elapsed time is likely to be reported as just all zero's. The bigger file will fix that problem.

Thanks for bringing this problem up.

11. I copy/pasted the data sample in post #9 10 times (1080 samples in all). I don't know if that's still a valid data set, but it seemed good enough to test with whilst waiting for the upload.

It would be interesting if others came up with the same number of results. It could be fast, but wrong.

Give or take a bit of noise, this takes 85 milliseconds (and this is inside a VM)
Code:
```\$ ./a.out
Elapsed time:From=886376, To=971654, Elapsed=85278
Num tuples=1080, num results=21060
0 74: 0 1 5 -> 44 1 0
0 106: 0 1 5 -> 1 6 5
0 107: 0 1 5 -> 0 5 4
0 108: 0 1 5 -> 0 1 5
0 182: 0 1 5 -> 44 1 0
0 214: 0 1 5 -> 1 6 5
0 215: 0 1 5 -> 0 5 4
0 216: 0 1 5 -> 0 1 5
0 290: 0 1 5 -> 44 1 0
0 322: 0 1 5 -> 1 6 5
\$ ./a.out
Elapsed time:From=335430, To=418509, Elapsed=83079
Num tuples=1080, num results=21060
0 74: 0 1 5 -> 44 1 0
0 106: 0 1 5 -> 1 6 5
0 107: 0 1 5 -> 0 5 4
0 108: 0 1 5 -> 0 1 5
0 182: 0 1 5 -> 44 1 0
0 214: 0 1 5 -> 1 6 5
0 215: 0 1 5 -> 0 5 4
0 216: 0 1 5 -> 0 1 5
0 290: 0 1 5 -> 44 1 0
0 322: 0 1 5 -> 1 6 5
\$ ./a.out
Elapsed time:From=518591, To=605428, Elapsed=86837
Num tuples=1080, num results=21060
0 74: 0 1 5 -> 44 1 0
0 106: 0 1 5 -> 1 6 5
0 107: 0 1 5 -> 0 5 4
0 108: 0 1 5 -> 0 1 5
0 182: 0 1 5 -> 44 1 0
0 214: 0 1 5 -> 1 6 5
0 215: 0 1 5 -> 0 5 4
0 216: 0 1 5 -> 0 1 5
0 290: 0 1 5 -> 44 1 0
0 322: 0 1 5 -> 1 6 5```
With -O2 optimisation, it's 11 milliseconds

> Not that anyone here is competitive, at all. Never happen.
Game On!

12. Well, there are certainly more fancy ways to do this in C++ (such as substituting the array-to-vector conversion code with append-to-vector-with-proxy magic), but I figured it'd probably be best to stick with K.I.S.S., for clarity's sake:

Code:
```#include <iostream>
#include <vector>
#include <iterator>

/*
WARNING: This macro will *only* work on LOCAL variables
*/
#define ARRAY_SIZE( array ) ( sizeof( array ) / sizeof( array[ 0 ] ) )

using namespace std;

/*
Quick and dirty print kludge
*/
namespace std
{
template < typename Type >
ostream& operator << ( ostream& out, vector< Type > const& values )
{
out << "{ ";
for( size_t i = 0, size = values.size( ); i < size; ++i )
{
if( i != 0 )
out << ", ";
out << values[ i ];
}
out << " }";
}
}

typedef vector< int >
tuple_int;
typedef vector< tuple_int >
tuple_tuple_int;

tuple_int triplet( int first, int second, int third )
{
int
array[ ] = { first, second, third };
return tuple_int( array, array + ARRAY_SIZE( array ) );
}

bool condition_shared( tuple_int const& lhs, tuple_int const& rhs, size_t size )
{
tuple_int
tmp;
set_intersection( lhs.begin( ), lhs.end( ), rhs.begin( ), rhs.end( ), back_inserter( tmp ) );
return tmp.size( ) == size;
}

tuple_tuple_int searchlist( tuple_tuple_int const& list )
{
tuple_tuple_int
result;
size_t
size = list.size( );
for( size_t i = 0; i < size; ++i )
{
for( size_t j = 0; j < size; ++j )
{
/*
Sanity check
*/
if( i == j )
continue;
if( condition_shared( list[ i ], list[ j ], 2 ) )
{
int array[ ] =
{
i,
j
};
result.push_back( tuple_int( array, array + ARRAY_SIZE( array ) ) );
}
}
}
return result;
}

int main( void )
{
tuple_int
array[ ] =
{
triplet( 0, 1, 3 ),
triplet( 4, 5, 7 ),
triplet( 6, 7, 2 ),
triplet( 5, 4, 1 ),
triplet( 0, 2, 7 ),
triplet( 4, 6, 3 ),
triplet( 1, 4, 3 ),
triplet( 5, 0, 7 ),
triplet( 5, 1, 0 ),
triplet( 6, 2, 3 ),
triplet( 4, 7, 6 ),
triplet( 0, 3, 2 )
};
tuple_tuple_int
list( array, array + ARRAY_SIZE( array ) );
cout << " - Values - " << endl;
copy( list.begin( ), list.end( ), ostream_iterator< tuple_int >( cout, "\n" ) );
tuple_tuple_int
indices = searchlist( list );
cout << " - Indices - " << endl;
copy( indices.begin( ), indices.end( ), ostream_iterator< tuple_int >( cout, "\n" ) );
}```
And of course, if the performance still isn't good enough, you could always work to get rid of code that generates temporaries and such (by rewriting it "by hand", that is), but my guess is that probably isn't really necessary. Then again...

Originally Posted by Adak
Not that anyone here is competitive, at all. Never happen.
<starts rewriting code in assembly language>

13. Originally Posted by jeffcobb
No, as someone who does lots of both, if you are coming to C from Python you are going to HATE C. Particularly for doing stuff like this. Not trying to discourage you but the "batteries included" aspect of Python is "roll your own batteries" in C.
I came to C from Python and I LOVED rolling my sleeves up and getting my hands dirty. It was like discovering the joys of driving a stick shift for the first time.

14. Originally Posted by Sharke
I came to C from Python and I LOVED rolling my sleeves up and getting my hands dirty. It was like discovering the joys of driving a stick shift for the first time.
Yeah, I was gonna make a comment about that. I came from perl.

There are clearly people that don't like the "minimal" and apparently "feature poor" syntax of C. I think the best way to avoid ending up as such a person is to not approach a problem like the OP's in terms of translating the code by duplicating functions (although it may end up much like that anyway). I have seem someone do this previously and I think it was a bad way to learn C, all they could think about was how to create a C version of feature X, dismissing the possibility there might be a completely different way of approaching the problem which makes better use of existing features (of C).

So that means learning the basic syntax, then considering the code you want to "translate" in terms of what it does, not how it does it. Then forget the python code and try and implement in C from scratch. Like I said, this may end up the same way as just duplicating python functions, but it also may end up very different. I regularly re-write stuff I've already done in perl in C, and sometimes use perl to prototype an idea because it is much faster to code in. But there are often little "short cuts" in the lower level world that are not apparent from the higher one (eg, the significance of memory addressing) and it's there that the magic happens.

A major factor in this is that it is awkward and "unnatural" to work from a byte literal perspective in high level, dynamically typed language like perl or python. Whereas in C, it's totally the foundation of everything.* What seems to be a consequence of a lack of high level features is a feature/opportunity itself.

* the inverse of this is that regular expressions are very fundamental in the interpreted languages whereas they seem awkward or even pointless in C.

15. Originally Posted by Sharke
I came to C from Python and I LOVED rolling my sleeves up and getting my hands dirty. It was like discovering the joys of driving a stick shift for the first time.
<drift>
Ironically enough, I still drive a stick and when I bought a new car for the new job last month or so, I searched and searched until I could find a new car with a stick. Getting harder every year. But I do know what you mean; my "first" language was whatever monstrosity of BASIC was on the Ti/99A back in the day but when I quickly hit the limits of the language (and memory of the machine of the day) I went to assembly and got the same feeling you describe.

<sorta back on subject>
By "rolling your own batteries" I was playing off of the Python mantra of "batteries included" because 99% of the time if you need a class/function to do something, nearly anything, it is already in the language or more precisely in one of the libraries. With C anything not accomplished by one of the keywords or stdlibs, you have to make it yourself. And remember: the one of the points of C is to do stuff almost as efficiently as possible; with Python it is to get stuff done/get to a working model as quickly as possible. They serve different purposes. Who knows, the OP may discover he loves getting his hands dirty, so to speak for the thrill of eeking out some extra clock cycles. if so, he has chosen the right path to it (leaving out assy which really isn't the point of this forum anyhow).

C++ may be a good choice for him b/c some of the things he is doing with data structures is already done in the STL so he may find that to be a better middle-ground between pain and pleasure...

Popular pages Recent additions