Thread: Python --> C

  1. #1
    Registered User
    Join Date
    May 2010
    Posts
    24

    Python --> C

    This is my first attempt at C, although I have Python experience.

    I wrote a Python program but it is too slow and I think C might help me. How do I convert it into C?

    The problem:

    I have a bunch of triple numbers, like so

    Code:
    [[0, 1, 3], [4, 5, 7], [6, 7, 2], [5, 4, 1], [0, 2, 7], [4, 6, 3], [1, 4, 3], [5, 0, 7], [5, 1, 0], [6, 2, 3], [4, 7, 6], [0, 3, 2]]
    I want to find those pairs of triples that share 2-components. For example. Triple2 and triple9 intersect because they share n=2 components: (6,7,2) and (6,2,3) share 6 and 2

    Now, in Python, this is a solution:

    Code:
    def condition_shared(prim1,prim2,n):
        return len(set(prim1).intersection(set(prim2))) == n
    
    def searchlist(myList):
        r = range(len(myList))
        result = []
        for i in r:
            for j in r:
                if condition_shared(myList[i],myList[j],2):
                    result.append((i,j))
        return tuple(result)
    
    print searchlist(lst)
    In Python terms: It returns a tuple with indices of those pairs that have n-components in common. So we find that it contains the pair (2, 9), meaning lst[2] and lst[9] because they share n=2 components:


    How would I go about writing this in C? (Could it even be multithreaded?)

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    "All" you have to do is define a type that is a triple of numbers, and things called "intersection", "len", "set", "range", "condition_shared", "append" and "tuple".

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    1,485
    You could start by creating a struct holding you tripples, then make an array of those.

    Code:
    typedef struct{
            int a;
            int b;
            int c;
    }Tripple;
    
    // then
    
    Tripple trippleList[n];
    As for the question if it could be multithreaded, if I understand what you are trying to do correctly you won't gain anything from dividing your list into sublists and do them in parallel since you have to compare those sublists to each other as well in the end.
    Last edited by Subsonics; 05-21-2010 at 07:47 PM.

  4. #4
    Registered User
    Join Date
    May 2010
    Posts
    24
    Quote Originally Posted by tabstop View Post
    "All" you have to do is define a type that is a triple of numbers, and things called "intersection", "len", "set", "range", "condition_shared", "append" and "tuple".
    Is that another way of saying "Beginner, hahaha"

  5. #5
    Registered User jeffcobb's Avatar
    Join Date
    Dec 2009
    Location
    Henderson, NV
    Posts
    875
    No, as someone who does lots of both, if you are coming to C from Python you are going to HATE C. Particularly for doing stuff like this. Not trying to discourage you but the "batteries included" aspect of Python is "roll your own batteries" in C.
    C/C++ Environment: GNU CC/Emacs
    Make system: CMake
    Debuggers: Valgrind/GDB

  6. #6
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    Quote Originally Posted by Macha View Post
    Is that another way of saying "Beginner, hahaha"
    No, that's another way of saying "in C you have to roll your own". I would guess you've got at least a couple hundred lines of C code needed just to get what's in Python out of the box in terms of the data structures.

    On the other hand C++ would have a lot more of what you're expecting to use ready to go.

  7. #7
    Registered User C_ntua's Avatar
    Join Date
    Jun 2008
    Posts
    1,853
    Quote Originally Posted by Macha View Post
    This is my first attempt at C, although I have Python experience.

    I wrote a Python program but it is too slow and I think C might help me. How do I convert it into C?
    The verdict of what tabstop said is that you should use C++. Since it is as fast (the say) as C. In the case of implementing stuff with no experience, it is faster than C++, cause you won't make them as good as the standards data types or good ones found in libraries. So google a bit, you might find what you want ready to go in C++.

    You can explain what the functions do in Python and maybe we can give you a function in C/C++.

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    This is an interesting problem. Sounds very doable. What is the range of the numbers in the sub sets?

    I have never tried "rolling my own batteries" but it sounds like something not to be missed. < Thanks, Jeff! >

    I'm quite sure it can be sped up, but what are the parameters of this - how many sub sets do you typically have, and what is your run-time for this part of the Python program?

    If the current run-time is 1/100th of a second, it's probably not worth optimizing in C, to get it to run in 1/500th of a second.

    While you're at it, would you put a 5 second run-time input file on Swoopshare, so I could have something to compare a C program run-time, to? Post the url to it, back here, of course, so it can be d/l'ed. That would give us a nice target to beat.

    Not that anyone here is competitive, at all. Never happen.

  9. #9
    Registered User
    Join Date
    May 2010
    Posts
    24
    Thanks for the replies.

    OK, I see. Maybe it is a bit too difficult to do it in C for me!

    The triplets consist of integers only and I may have several thousand, even hundreds of thousands of them. Each triplet intersects with an average of 3 other triplets. No triplets contain double or triple identical numbers (so (8,8,1) would never occur)

    The components of triplets themselves are unordered but the whole list of triplets is ordered (I need to keep track of their indices).

    Currently Python takes under a second for a thousand triplets, but after that it becomes a waiting game. Way waaaay to long. That's why I am looking at other languages for this.

    It is daunting to do this "from atoms". Perhaps C++ is better? I thought it was harder, but I don't know either.

    Unfortunately I haven't got a big dataset with me on this computer. If you really want to and can wait a few days I can upload one. But a small one is below. The above Python code takes 0.016 seconds for this.


    Code:
    [(0, 1, 5), (1, 2, 6), (2, 3, 7), (4, 5, 9), (5, 6, 10), (6, 7, 11), (8, 9, 13), (9, 10, 14), 
    (10, 11, 15), (16, 17, 21), (17, 18, 22), (18, 19, 23), (20, 21, 25), (21, 22, 26), 
    (22, 23, 27), (24, 25, 29), (25, 26, 30), (26, 27, 31), (28, 29, 33), (29, 30, 34), 
    (30, 31, 35), (32, 33, 37), (33, 34, 38), (34, 35, 39), (36, 37, 14), (37, 38, 13), 
    (38, 39, 12), (18, 41, 40), (17, 42, 41), (17, 16, 43), (40, 41, 45), (41, 42, 46), 
    (42, 43, 47), (44, 45, 1), (45, 46, 2), (46, 47, 3), (40, 48, 23), (44, 49, 48), (0, 4, 49), 
    (48, 50, 27), (49, 51, 50), (4, 8, 51), (50, 35, 31), (51, 39, 35), (8, 12, 39), (47, 52, 7),
     (43, 53, 52), (16, 20, 53), (52, 54, 11), (53, 55, 54), (20, 24, 55), (54, 36, 15), 
    (55, 32, 36), (24, 28, 32), (55, 24, 32), (54, 55, 36), (11, 54, 15), (53, 20, 55), 
    (52, 53, 54), (7, 52, 11), (43, 16, 53), (47, 43, 52), (3, 47, 7), (51, 8, 39), (50, 51, 35),
     (27, 50, 31), (49, 4, 51), (48, 49, 50), (23, 48, 27), (44, 0, 49), (40, 44, 48), 
    (19, 40, 23), (46, 3, 2), (45, 2, 1), (44, 1, 0), (42, 47, 46), (41, 46, 45), (40, 45, 44), 
    (17, 43, 42), (18, 17, 41), (19, 18, 40), (38, 12, 13), (37, 13, 14), (36, 14, 15),
     (34, 39, 38), (33, 38, 37), (32, 37, 36), (30, 35, 34), (29, 34, 33), (28, 33, 32), 
    (26, 31, 30), (25, 30, 29), (24, 29, 28), (22, 27, 26), (21, 26, 25), (20, 25, 24), 
    (18, 23, 22), (17, 22, 21), (16, 21, 20), (10, 15, 14), (9, 14, 13), (8, 13, 12), 
    (6, 11, 10), (5, 10, 9), (4, 9, 8), (2, 7, 6), (1, 6, 5), (0, 5, 4)]
    Last edited by Macha; 05-22-2010 at 02:44 AM.

  10. #10
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    With C++, I'm not sure what could be done, but my intuition is that this can be done quite quickly, in plain C.

    I will wait for your bigger example data file, and use this one to experiment with, in the meantime. Below 16/1000's of a second, optimization is hard to find because the the elapsed time is likely to be reported as just all zero's. The bigger file will fix that problem.

    Thanks for bringing this problem up.

  11. #11
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,660
    I copy/pasted the data sample in post #9 10 times (1080 samples in all). I don't know if that's still a valid data set, but it seemed good enough to test with whilst waiting for the upload.

    It would be interesting if others came up with the same number of results. It could be fast, but wrong.

    Give or take a bit of noise, this takes 85 milliseconds (and this is inside a VM)
    Code:
    $ ./a.out 
    Elapsed time:From=886376, To=971654, Elapsed=85278
    Num tuples=1080, num results=21060
    0 74: 0 1 5 -> 44 1 0
    0 106: 0 1 5 -> 1 6 5
    0 107: 0 1 5 -> 0 5 4
    0 108: 0 1 5 -> 0 1 5
    0 182: 0 1 5 -> 44 1 0
    0 214: 0 1 5 -> 1 6 5
    0 215: 0 1 5 -> 0 5 4
    0 216: 0 1 5 -> 0 1 5
    0 290: 0 1 5 -> 44 1 0
    0 322: 0 1 5 -> 1 6 5
    $ ./a.out 
    Elapsed time:From=335430, To=418509, Elapsed=83079
    Num tuples=1080, num results=21060
    0 74: 0 1 5 -> 44 1 0
    0 106: 0 1 5 -> 1 6 5
    0 107: 0 1 5 -> 0 5 4
    0 108: 0 1 5 -> 0 1 5
    0 182: 0 1 5 -> 44 1 0
    0 214: 0 1 5 -> 1 6 5
    0 215: 0 1 5 -> 0 5 4
    0 216: 0 1 5 -> 0 1 5
    0 290: 0 1 5 -> 44 1 0
    0 322: 0 1 5 -> 1 6 5
    $ ./a.out 
    Elapsed time:From=518591, To=605428, Elapsed=86837
    Num tuples=1080, num results=21060
    0 74: 0 1 5 -> 44 1 0
    0 106: 0 1 5 -> 1 6 5
    0 107: 0 1 5 -> 0 5 4
    0 108: 0 1 5 -> 0 1 5
    0 182: 0 1 5 -> 44 1 0
    0 214: 0 1 5 -> 1 6 5
    0 215: 0 1 5 -> 0 5 4
    0 216: 0 1 5 -> 0 1 5
    0 290: 0 1 5 -> 44 1 0
    0 322: 0 1 5 -> 1 6 5
    With -O2 optimisation, it's 11 milliseconds

    > Not that anyone here is competitive, at all. Never happen.
    Game On!
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  12. #12
    Guest Sebastiani's Avatar
    Join Date
    Aug 2001
    Location
    Waterloo, Texas
    Posts
    5,708
    Well, there are certainly more fancy ways to do this in C++ (such as substituting the array-to-vector conversion code with append-to-vector-with-proxy magic), but I figured it'd probably be best to stick with K.I.S.S., for clarity's sake:

    Code:
    #include <iostream>
    #include <vector>
    #include <iterator>
    
    /*
        WARNING: This macro will *only* work on LOCAL variables
    */
    #define ARRAY_SIZE( array ) ( sizeof( array ) / sizeof( array[ 0 ] ) )
    
    using namespace std;
    
    /*
        Quick and dirty print kludge
    */
    namespace std
    {
        template < typename Type >
        ostream& operator << ( ostream& out, vector< Type > const& values )
        {
            out << "{ ";
            for( size_t i = 0, size = values.size( ); i < size; ++i )
            {
                if( i != 0 )
                    out << ", ";
                out << values[ i ];
            }
            out << " }";
        }
    }
    
    typedef vector< int >
        tuple_int;
    typedef vector< tuple_int >
        tuple_tuple_int;
    
    tuple_int triplet( int first, int second, int third )
    {
        int
            array[ ] = { first, second, third };
        return tuple_int( array, array + ARRAY_SIZE( array ) );
    }
    
    bool condition_shared( tuple_int const& lhs, tuple_int const& rhs, size_t size )
    {
        tuple_int
            tmp;
        set_intersection( lhs.begin( ), lhs.end( ), rhs.begin( ), rhs.end( ), back_inserter( tmp ) );
        return tmp.size( ) == size;
    }
    
    tuple_tuple_int searchlist( tuple_tuple_int const& list )
    {
        tuple_tuple_int
            result;
        size_t
            size = list.size( );
        for( size_t i = 0; i < size; ++i )
        {
            for( size_t j = 0; j < size; ++j )
            {
            /*
                Sanity check
            */
                if( i == j )
                    continue;
                if( condition_shared( list[ i ], list[ j ], 2 ) )
                {
                    int array[ ] =
                    {
                        i,
                        j
                    };
                    result.push_back( tuple_int( array, array + ARRAY_SIZE( array ) ) );
                }
            }
        }
        return result;
    }
    
    int main( void )
    {
        tuple_int
            array[ ] =
        {
            triplet( 0, 1, 3 ),
            triplet( 4, 5, 7 ),
            triplet( 6, 7, 2 ),
            triplet( 5, 4, 1 ),
            triplet( 0, 2, 7 ),
            triplet( 4, 6, 3 ),
            triplet( 1, 4, 3 ),
            triplet( 5, 0, 7 ),
            triplet( 5, 1, 0 ),
            triplet( 6, 2, 3 ),
            triplet( 4, 7, 6 ),
            triplet( 0, 3, 2 )
        };
        tuple_tuple_int
            list( array, array + ARRAY_SIZE( array ) );
        cout << " - Values - " << endl;
        copy( list.begin( ), list.end( ), ostream_iterator< tuple_int >( cout, "\n" ) );
        tuple_tuple_int
            indices = searchlist( list );
        cout << " - Indices - " << endl;
        copy( indices.begin( ), indices.end( ), ostream_iterator< tuple_int >( cout, "\n" ) );
    }
    And of course, if the performance still isn't good enough, you could always work to get rid of code that generates temporaries and such (by rewriting it "by hand", that is), but my guess is that probably isn't really necessary. Then again...

    Quote Originally Posted by Adak View Post
    Not that anyone here is competitive, at all. Never happen.
    <starts rewriting code in assembly language>
    Code:
    #include <cmath>
    #include <complex>
    bool euler_flip(bool value)
    {
        return std::pow
        (
            std::complex<float>(std::exp(1.0)), 
            std::complex<float>(0, 1) 
            * std::complex<float>(std::atan(1.0)
            *(1 << (value + 2)))
        ).real() < 0;
    }

  13. #13
    Registered User Sharke's Avatar
    Join Date
    Jun 2008
    Location
    NYC
    Posts
    303
    Quote Originally Posted by jeffcobb View Post
    No, as someone who does lots of both, if you are coming to C from Python you are going to HATE C. Particularly for doing stuff like this. Not trying to discourage you but the "batteries included" aspect of Python is "roll your own batteries" in C.
    I came to C from Python and I LOVED rolling my sleeves up and getting my hands dirty. It was like discovering the joys of driving a stick shift for the first time.

  14. #14
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Quote Originally Posted by Sharke View Post
    I came to C from Python and I LOVED rolling my sleeves up and getting my hands dirty. It was like discovering the joys of driving a stick shift for the first time.
    Yeah, I was gonna make a comment about that. I came from perl.

    There are clearly people that don't like the "minimal" and apparently "feature poor" syntax of C. I think the best way to avoid ending up as such a person is to not approach a problem like the OP's in terms of translating the code by duplicating functions (although it may end up much like that anyway). I have seem someone do this previously and I think it was a bad way to learn C, all they could think about was how to create a C version of feature X, dismissing the possibility there might be a completely different way of approaching the problem which makes better use of existing features (of C).

    So that means learning the basic syntax, then considering the code you want to "translate" in terms of what it does, not how it does it. Then forget the python code and try and implement in C from scratch. Like I said, this may end up the same way as just duplicating python functions, but it also may end up very different. I regularly re-write stuff I've already done in perl in C, and sometimes use perl to prototype an idea because it is much faster to code in. But there are often little "short cuts" in the lower level world that are not apparent from the higher one (eg, the significance of memory addressing) and it's there that the magic happens.

    A major factor in this is that it is awkward and "unnatural" to work from a byte literal perspective in high level, dynamically typed language like perl or python. Whereas in C, it's totally the foundation of everything.* What seems to be a consequence of a lack of high level features is a feature/opportunity itself.

    * the inverse of this is that regular expressions are very fundamental in the interpreted languages whereas they seem awkward or even pointless in C.
    Last edited by MK27; 05-22-2010 at 11:07 AM.
    C programming resources:
    GNU C Function and Macro Index -- glibc reference manual
    The C Book -- nice online learner guide
    Current ISO draft standard
    CCAN -- new CPAN like open source library repository
    3 (different) GNU debugger tutorials: #1 -- #2 -- #3
    cpwiki -- our wiki on sourceforge

  15. #15
    Registered User jeffcobb's Avatar
    Join Date
    Dec 2009
    Location
    Henderson, NV
    Posts
    875
    Quote Originally Posted by Sharke View Post
    I came to C from Python and I LOVED rolling my sleeves up and getting my hands dirty. It was like discovering the joys of driving a stick shift for the first time.
    <drift>
    Ironically enough, I still drive a stick and when I bought a new car for the new job last month or so, I searched and searched until I could find a new car with a stick. Getting harder every year. But I do know what you mean; my "first" language was whatever monstrosity of BASIC was on the Ti/99A back in the day but when I quickly hit the limits of the language (and memory of the machine of the day) I went to assembly and got the same feeling you describe.

    <sorta back on subject>
    By "rolling your own batteries" I was playing off of the Python mantra of "batteries included" because 99% of the time if you need a class/function to do something, nearly anything, it is already in the language or more precisely in one of the libraries. With C anything not accomplished by one of the keywords or stdlibs, you have to make it yourself. And remember: the one of the points of C is to do stuff almost as efficiently as possible; with Python it is to get stuff done/get to a working model as quickly as possible. They serve different purposes. Who knows, the OP may discover he loves getting his hands dirty, so to speak for the thrill of eeking out some extra clock cycles. if so, he has chosen the right path to it (leaving out assy which really isn't the point of this forum anyhow).

    C++ may be a good choice for him b/c some of the things he is doing with data structures is already done in the STL so he may find that to be a better middle-ground between pain and pleasure...
    C/C++ Environment: GNU CC/Emacs
    Make system: CMake
    Debuggers: Valgrind/GDB

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Replies: 16
    Last Post: 11-01-2009, 12:10 AM
  2. python c api
    By ralu. in forum C Programming
    Replies: 0
    Last Post: 03-01-2009, 01:19 PM
  3. Python Embedding Help...
    By Rendered in forum C Programming
    Replies: 2
    Last Post: 11-17-2007, 10:08 AM
  4. python in c++ application
    By l2u in forum C++ Programming
    Replies: 5
    Last Post: 04-18-2007, 07:50 AM
  5. Python
    By Xterria in forum A Brief History of Cprogramming.com
    Replies: 10
    Last Post: 05-05-2002, 04:08 PM