Thread: from perl to C

  1. #1
    Registered User
    Join Date
    Jan 2009
    Posts
    2

    from perl to C

    Dear all,
    I am a newbie to C and want to learn it very badly. I know the theory part of C, but I am not so good when it comes to writing real-world programs in C. I am good at Perl, and the program below works fine on a small file, but not on a 10 GB file, so I felt that writing the program in C would help. Please help me solve the problem, and also to learn C. Here is my problem.

    I am comparing two files. I take the left and right lengths from file 2 and, for each unique ID (the lines starting with '>'), extract that many numbers from the beginning (left value) and the end (right value) of the matching record of numbers in file 1.
    Code:
    file1:
    
    >AAAT3R length=110
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 38 38 38 38 40 40 39 40 
    40 40 40 40 40 40 40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 
    35 35 33 35 35 35 40 40 40 40 37 37 38 38 38 40 40 40 40 40 40 40 
    40 40 40 40 40 40 40 40 40 37 36 36 31 22 22 22 20 20 20 20 20 14
    >AAA2OJ length=70
    18 18 18 21 35 35 35 32 32 32 33 35 38 39 37 37 39 39 39 39 39 40 
    40 39 39 39 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 
    40 39 39 37 35 35 39 37 37 37 37 37 37 37 37 37 37 33 32 32 30 20 
    17 17 17 0
    
    file2:
    
    >AAAT3R_left length=6
    TACATA
    >AAAT3R_right length=62
    ACTACTGATTTGATTATCTTTGATCTCTGTCGAACTAACTATATCTTAGTATGATCTTTAAT
    >AAA2OJ_left length=14
    TTTTGGACTATCTG
    >AAA2OJ_right length=14
    AGGCTGTTCTTTTN
    
    result file(expected)
    
    >AAAT3R_left length=6
    40 40 40 40 40 40
    >AAAT3R_right length=62
    40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40 35 35 33 35 35 35 40 40
    40 40 37 37 38 38 38 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37
    36 36 31 22 22 22 20 20 20 20 20 14
    >AAA2OJ_left length=14
    18 18 18 21 35 35 35 32 32 32 33 35 38 39
    >AAA2OJ_right length=14
    37 37 37 37 37 33 32 32 30 20 17 17 17 0
    
    This is the code I have written so far to get the desired output.
    
    #!/usr/bin/perl -w
    use strict;
    
    our ($File1, $File2) = qw/file1 file2/;
    open File1 or die "$File1: $!\n";
    open File2 or die "$File2: $!\n";
    
    my ($key, %results);
    while (<File1>){
        next if /^\s*$/;
        chomp;
        if (/^>\s*(\S+)/){
            $key = $1;
        }
        else {
            push @{ $results{$key} }, split;    # append: a record's numbers may span several lines
        }
    }
    close File1;
    
    my ($len, $side, $str);
    while (<File2>){
        next if /^\s*$/;
        if (/^>([^_]+)_(left|right).*?(\d+)\s*$/){
            print;
            $str = $1;
            $side = $2;
            $len = $3;
        }
        else {
            my @list;
            @list = @{$results{$str}};
            if ($side eq 'left'){
                die "$str is too short for a left slice of $len!\n"
                unless @list >= $len;
                print "@list[0..$len-1]\n";
            }
            else {
                die "$str is too short for a right slice of $len!\n"
                unless @list >= $len;
                print "@list[@list-$len..$#list]\n";
            }
        }
    }
    close File2;
    Please help me do this in C.
    Last edited by Salem; 01-04-2009 at 11:38 PM. Reason: Fold long lines, indented perl

  2. #2
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    Well, I don't know stink about Perl, so it might be quite helpful if you could elaborate on your selection criteria
    for the numbers you extract from the two files. I'm not clear on that, at least.

    I also strongly suggest you edit your post, as I did when quoting it, so that
    the program no longer "breaks the forum tables" (page width).

    It is very annoying to have to constantly scroll to Timbuktu and back to read your post!

    And welcome to the forum!
    Last edited by Adak; 01-04-2009 at 10:10 PM.

  3. #3
    Registered User
    Join Date
    Jan 2009
    Posts
    2

    from perl to C++ again

    Dear all,
    I am comparing two files. I take the left and right lengths
    from file 2 and extract that many numbers from the beginning
    (left value) and the end (right value) of each record of numbers
    in file 1, matched by the unique ID on the line starting with '>'.

    - Records in the two files are matched by their unique ID, which starts with '>'.
    - The numbers are sliced according to the length given in file 2:
    from the beginning for left, from the end for right.


    Code:
    file1:
    
    >AAAT3R length=110
    40 40 40 40 40 40 40 40 40 40 40 40 40 
    40 40 40 40 40 40 40 40 40 40 40 40 40 
    40 40 40 40 40 40 40 40 40 40 38 38 38 
    38 40 40 39 40 40 40 40 40 40 40 40 40 
    40 38 38 37 39 36 36 40 36 35 35 35 38 
    40 35 35 33 35 35 35 40 40 40 40 37 37 
    38 38 38 40 40 40 40 40 40 40 40 40 40 
    40 40 40 40 40 40 37 36 36 31 22 22 22 
    20 20 20 20 20 14
    >AAA2OJ length=70
    18 18 18 21 35 35 35 32 32 32 33 35 38 
    39 37 37 39 39 39 39 39 40 40 39 39 39 
    40 40 40 40 40 40 40 40 40 40 40 40 40 
    40 40 40 40 40 40 39 39 37 35 35 39 37 
    37 37 37 37 37 37 37 37 37 33 32 32 30 
    20 17 17 17 0
    
    file2:
    
    >AAAT3R_left length=6
    TACATA
    >AAAT3R_right length=62
    ACTACTGATTTGATTATCTTTGATCTCTGTC
    GAACTAACTATATCTTAGTATGATCTTTAAT
    >AAA2OJ_left length=14
    TTTTGGACTATCTG
    >AAA2OJ_right length=14
    AGGCTGTTCTTTTN
    
    result file(expected)
    
    >AAAT3R_left length=6
    40 40 40 40 40 40
    >AAAT3R_right length=62
    40 40 40 38 38 37 39 36 36 40 36 35 35 35 38 40
    35 35 33 35 35 35 40 40 40 40 37 37 38 38 38 40
    40 40 40 40 40 40 40 40 40 40 40 40 40 40 40 37
    36 36 31 22 22 22 20 20 20 20 20 14 
    >AAA2OJ_left length=14
    18 18 18 21 35 35 35 32 32 32 33 35 38 39
    >AAA2OJ_right length=14
    37 37 37 37 37 33 32 32 30 20 17 17 17 0
    Please help me do this in C++.

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    I understand what you want, but now you say you need it done in C++, and I don't program in C++.

    Why don't you post this in the C++ forum (which is also quite active) on this same website? Just click on "General Programming Boards" way up at the top of this page (in red), and the C++ forum will be at the top of the page you are sent to.

    To avoid cross-posting (surely the forum admin won't like that), you can ask to have this thread moved (which seems to have been done once already, oddly enough).

    Good luck!

  5. #5
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Dunno Adak, first post says C, second post says C++.
    All rather too vague to make a call.

    IMO, prettydainty should read the "how to optimise perl" information which is available in books and on the web.

    In particular, benchmark this first:
    Code:
    #!/usr/bin/perl -w
    use strict;
    
    our ($File1, $File2) = qw/file1 file2/;
    open File1 or die "$File1: $!\n";
    open File2 or die "$File2: $!\n";
    
    my $c1 = 0;
    my $c2 = 0;
    while (<File1>){
      $c1++;
    }
    close File1;
    
    while (<File2>){
      $c2++;
    }
    close File2;
    
    print "$c1 $c2\n";
    Just reading the file will take time.
    Assuming you implement the algorithm in zero time, you're never going to get any better than this.

    What's more, reading the same in C with
    Code:
    while ( fgets( buff, sizeof buff, fp ) ) {
      c1++;
    }
    isn't likely to be a whole lot better. Perl after all is all about reading files, so you can be pretty sure they've nailed the performance of that part.

    If you find yourself in the situation where the benchmark takes, say, 20 seconds and all your code makes it 25 seconds, then you're pretty much stuffed in terms of making it any quicker. 80% of the time is in something you can do nothing about (basic file I/O). C won't read the file much quicker, and even assuming an ideal 50% saving on the 5 seconds of processing, you would only get the total down to about 22.5 seconds; 20 seconds is the floor. What you're never going to get to is, say, 5 seconds.

    Then there's the whole issue of how to deal with regexes and hashes in C (which has neither). You can get a PCRE library, but you would need a different approach to your results hash, or a lot of new C code.

    In short, this isn't the kind of exercise I would suggest you start learning C with.

  6. #6
    spurious conceit MK27's Avatar
    Join Date
    Jul 2008
    Location
    segmentation fault
    Posts
    8,300
    Some things to know about C if you know Perl:
    • the perl interpreter is written in C
    • a Perl reference (e.g., $ref=\@someray;) is really a C pointer, though there is more to know about pointers than references. You can get away with programming Perl without using references, but you must understand pointers to use C.
    • strings in C are accessed as arrays of characters, so in your example, rather than looking for the ID with if ($_ =~ /^>/), you would use if (someray[0]=='>')
