Thread: need advice/hints for assignment, please help!!

  1. #1
    Registered User
    Join Date
    Oct 2006
    Posts
    1

    Hi Everyone,

    I know I'm not supposed to post assignments and expect people to do them, and that's not what I'm looking for. I was just hoping that someone could outline how to approach this assignment. I've never done any programming before; attempting HTML was as far as I got, and I gave that up because I found it confusing. Anyway, I hope someone's able and willing to help me outline what I need to do...

    The assignment is as follows. I know I need to use arrays, if statements, scanf, printf, and loops. However, I don't know how to make things like the tab, space and return keys get ignored, or how to make the X key do what the return key would normally do. I really could use a basic outline of what I need to do to get this assignment done.

    ~~~~~
    Outline:
    In this assignment you will be writing a program to detect genes in prokaryotes. In working on this assignment, you will be using loops and arrays.

    DNA (deoxyribonucleic acid) is a class of molecules that contain instructions for the construction of proteins. These proteins, in turn, are essential parts of all living organisms and participate in every process within cells. The process of translating a sequence of nucleotides into a functional protein is essentially the same as that of translating a C source code file into a functional program. This translation process is performed by another molecule called RNA polymerase (this molecule is a biological compiler). In order to find the code for a particular protein within a very long strand of DNA, the RNA polymerase molecule recognizes a particular sequence in the DNA called a promoter.

    (Help, I don't understand biology!) In practical terms, your program will read in a sequence of letters made up of A, C, G, and T only. The letters stand for the nucleotides adenine, cytosine, guanine and thymine that make up DNA. Your program will look for the pattern TTGACA in the sequence of letters. It will also look for the pattern TATAAT. If it finds both of these elements separated by exactly 19 other letters, it will print "Gene found!"; if they are not found, it should print "No Gene found!".

    Start:
    Your program will start by printing the message: Please enter your DNA sequence (X to mark the end of the sequence).

    Input:
    Next the user will be permitted to enter the DNA sequence. The user can hit any keys on the keyboard. In response to these keys, your program should perform as follows:
    A, C, G, T: Your program should add this letter to an array of characters (i.e. a string) stored inside your program.
    space, tab, enter, return: Your program should ignore these characters and just keep right on processing.
    X: Your program should stop adding any more letters to its array and print out the results.
    any other characters: Your program should print an error message and exit with a return value of -1 (not zero!).

    Scope:
    The maximum length of a DNA sequence allowed will be 512 letters. If a user enters more than 512 letters (not counting spaces, tabs, enters, returns), then your program should print an error message and exit with a return value of -2 (not zero!). This means that you can declare your variable as an array of 512 char.

    Operating Hints:
    The easiest way to implement this program is to start by writing a program that only reads in the input. I would use scanf with %c to read one letter at a time. A few if statements (this time with an else) can be used to identify the kind of letter entered. You can pass the string around as a parameter and put the good letters inside it. Once you have this part done, I would just print out the string (without spaces, tabs, etc.). The string should match the string that the user entered.

    Searching for the patterns:
    There are two patterns you need to search for: TTGACA and TATAAT separated by exactly 19 other symbols. You must write the function to search for these patterns yourself (you may not use a built-in search function or the strcmp function, etc.). The easiest way to do this is probably to write a function that accepts a parameter called position. It can then look for a "T" at position, another "T" at position+1, a "G" at position+2, and so on. Then you can call that function from within a loop for each possible position. Watch that you don't fall off the end of the string. If you do not understand what that sentence there just meant, you need to understand it before you start the assignment.

    Testing:
    Make sure you test your program very carefully with a number of examples. Some things to consider: (a) what if the pattern is right at the beginning of the string, (b) what if the pattern is right at the end, (c) what if the string is exactly 512 characters long (and has the pattern right at the end), (d) what if the string doesn't have the pattern, (e) what if the string is less than 31 symbols long.
    ~~~~~

    This is what I have so far. I know it's not right because it doesn't work properly, but I was hoping someone could point me in the right direction and give me some hints on what I need to do. There are two pieces of code, nearly the same; neither works.
    Code:
    #include <stdio.h>
    
    int main()
      {
      char sequence[512];
      int i, p, t;
    
      printf("Please enter your DNA sequence (X to mark the end of the sequence).\n");
      for(i=0;i<512;i++ )
        {
        scanf( "%c", &sequence[i]);
        if(sequence[i] == 'A' )
          {
          sequence[i] = 'A';
          }
        if(sequence[i] == 'C' )
          {
          sequence[i] = 'C';
          }
        if(sequence[i] == 'G' )
          {
          sequence[i] = 'G';
          }
        if(sequence[i] == 'T' )
          {
          sequence[i] = 'T';
          }
        if((sequence[i] == 9) || (sequence[i] == 13) || (sequence[i] == 32))
          {
          i = i - 1;
          }
        if(sequence[i] == 'X' )
          {
          if(i<19)
            {
            printf("The DNA sequence to have entered is too short for Analysis.\n");
            return 0;
            }
            if(i>19)
              {
              if(i<=512)
                {
                p = i;
                printf("The DNA sequence you entered is as follows:\n");  
                for(i=0;i<p;i++)
                  {
                  printf("%c", sequence[i]);
                  }
                }
              if(i>512)
                {
                return -2;
                }
              }
          }
        /*
        if((sequence[i] != 'A') || (sequence[i] != 'G') || (sequence[i] != 'C') ||
           (sequence[i] != 'T') || (sequence[i] != 9) || (sequence[i] != 13) ||
           (sequence[i] != 32) || (sequence[i] != 'X'))
          {
          printf("You have entered an invalid DNA sequence.\n");
          return -1;
          }
        */
        }
      t = p;
      for(i=0;i<=t;i++)
        {
        if((sequence [i]== 'T') && (sequence [i+1]== 'T') && (sequence [i+2]== 'G') &&
           (sequence [i+3]== 'A') && (sequence [i+4]== 'C') && (sequence [i+5]== 'A') &&
           (sequence [i+25]== 'T') && (sequence [i+26]== 'A') && (sequence [i+27]== 'T') &&
           (sequence [i+28]== 'A') && (sequence [i+29]== 'A') && (sequence [i+30]== 'T'))
          {
          printf("Gene Found!\n");
          }
        else
          {
          printf("No Gene Found!\n");
          }
        }
    
      return 0;
      }
    Code:
    #include <stdio.h>
    
    int main()
      {
      char sequence[512];
      int i, p, t;
    
      printf("Please enter your DNA sequence (X to mark the end of the sequence).\n");
      for(i=0;i<512;i++ )
        {
        scanf( "%c", &sequence[i]);
        if(sequence[i] == 'A' )
          {
          sequence[i] = 'A';
          }
        if(sequence[i] == 'C' )
          {
          sequence[i] = 'C';
          }
        if(sequence[i] == 'G' )
          {
          sequence[i] = 'G';
          }
        if(sequence[i] == 'T' )
          {
          sequence[i] = 'T';
          }
        if((sequence[i] == 9) || (sequence[i] == 13) || (sequence[i] == 32))
          {
          i = i - 1;
          }
        if(sequence[i] == 'X' )
          {
          if(i<19)
            {
            printf("The DNA sequence to have entered is too short for Analysis.\n");
            return 0;
            }
            if(i>19)
              {
              if(i<=512)
                {
                p = i;
                printf("The DNA sequence you entered is as follows:\n");  
                for(i=0;i<p;i++)
                  {
                  printf("%c", sequence[i]);
                  t = p;
                  }
                for(i=0;i<=t;i++)
                  {
                  if(sequence [i]== 'T')
                    {
                    if(sequence [i+1]== 'T') 
                      {
                      if(sequence [i+2]== 'G') 
                        {
                        if(sequence [i+3]== 'A') 
                          {
                          if(sequence [i+4]== 'C') 
                            {
                            if(sequence [i+5]== 'A') 
                              {
                              if(sequence [i+25]== 'T') 
                                {
                                if(sequence [i+26]== 'A') 
                                  {
                                  if(sequence [i+27]== 'T') 
                                    {
                                    if(sequence [i+28]== 'A')
                                      {
                                      if(sequence [i+29]== 'A') 
                                        {
                                        if(sequence [i+30]== 'T')
                                          {
                                          printf("\nGene Found!\n");
                                          return 0;
                                          }
                                        }
                                      }
                                    }
                                  }
                                }
                              }
                            }
                          }
                        }
                      }
                    }
                  else
                    {
                    printf("\nNo Gene Found!\n");
                    return 0;
                    }
                  }
                }
              }
              if(i>512)
                {
                return -2;
                }
          }
        }
        /*
        if((sequence[i] != 'A') || (sequence[i] != 'G') || (sequence[i] != 'C') ||
           (sequence[i] != 'T') || (sequence[i] != 9) || (sequence[i] != 13) ||
           (sequence[i] != 32) || (sequence[i] != 'X'))
          {
          printf("You have entered an invalid DNA sequence\n");
          return -1;
          }
        */
      
      return 0;
      }
    Last edited by Salem; 10-22-2006 at 01:03 AM. Reason: detabbed the code, and folded long lines for readability

  2. #2
    and the hat of int overfl Salem
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,661
    > (you may not use a built-in search function or the strcmp function, etc.).
    So write your own, with exactly the same interface.

    You might write this whilst you're testing your ideas out
    Code:
    if ( strncmp( &sequence[pos], "TTGACA", 6 ) == 0 &&
         strncmp( &sequence[pos+6+19], "TATAAT", 6 ) == 0 ) {
      // found it: 6 letters of TTGACA, then 19 others, then TATAAT
    }
    But the thing you end up handing in would be this
    Code:
    if ( matchGene( &sequence[pos], "TTGACA", 6 ) == 0 &&
         matchGene( &sequence[pos+6+19], "TATAAT", 6 ) == 0 ) {
      // found it: 6 letters of TTGACA, then 19 others, then TATAAT
    }
    In fact, here's the first step
    Code:
    #include <string.h>  /* for strncmp(), only while testing */

    int matchGene( const char *gene, const char *subgene, int sublen ) {
      return strncmp( gene, subgene, sublen );
    }
    Then all you do prior to handing it in is rewrite strncmp() in your own words and make sure everything still works!
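
    For what it's worth, a minimal sketch of that rewrite might look like the following. It assumes you only care about match / no match (0 or non-zero), so it doesn't reproduce the less-than / greater-than result of the real strncmp(), and it assumes sublen is no longer than subgene.

```c
/* Hand-written stand-in for strncmp(): returns 0 when the first sublen
   characters of gene equal subgene, and 1 otherwise.
   Assumption: only equal / not-equal matters, and sublen <= length of subgene. */
int matchGene( const char *gene, const char *subgene, int sublen ) {
  int i;
  for ( i = 0; i < sublen; i++ ) {
    /* Stop at the first mismatch. The '\0' check keeps us from reading
       past the end of a nul-terminated sequence. */
    if ( gene[i] == '\0' || gene[i] != subgene[i] ) {
      return 1;
    }
  }
  return 0;
}
```

    Because the interface is the same, the if statement that calls matchGene() shouldn't need to change at all.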

    You also need to create some other functions to do some of the work. Having everything in main just complicates it all.

    For example, the first step is to make sure you can read in a sequence and just print it straight back to the user without doing anything to it. Until that works reliably, there isn't much point working on the other issues.
    int inputSequence( char *sequence, int maxlen );
    void printSequence( char *sequence, int len );
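
    As a rough sketch of those two functions (certainly not the only way to do it): classifyChar() is a made-up helper name, and the return values are my own assumption, chosen to line up with the -1 and -2 exit codes in the assignment. It also assumes that running out of input before an 'X' should be treated like an 'X'.

```c
#include <stdio.h>

/* Hypothetical helper: decide what to do with one input character.
   Returns 1 to store it, 0 to skip it, 2 for the end marker 'X',
   and -1 for anything the assignment calls an error. */
int classifyChar( char ch ) {
  if ( ch == 'A' || ch == 'C' || ch == 'G' || ch == 'T' ) return 1;
  if ( ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r' ) return 0;
  if ( ch == 'X' ) return 2;
  return -1;
}

/* Reads letters until 'X'. Returns the number of letters stored,
   -1 on an invalid character, -2 if more than maxlen letters arrive.
   Assumption: end of input before 'X' is treated like 'X'.
   sequence must have room for maxlen + 1 chars (letters plus '\0'). */
int inputSequence( char *sequence, int maxlen ) {
  int len = 0;
  char ch;
  while ( scanf( "%c", &ch ) == 1 ) {
    int kind = classifyChar( ch );
    if ( kind == -1 ) return -1;        /* invalid character */
    if ( kind == 2 ) break;             /* 'X' ends the sequence */
    if ( kind == 1 ) {
      if ( len == maxlen ) return -2;   /* one letter too many */
      sequence[len++] = ch;
    }
    /* kind == 0: whitespace, nothing stored and len is untouched */
  }
  sequence[len] = '\0';
  return len;
}

/* Prints the first len letters followed by a newline. */
void printSequence( char *sequence, int len ) {
  int i;
  for ( i = 0; i < len; i++ ) {
    printf( "%c", sequence[i] );
  }
  printf( "\n" );
}
```

    In main you'd then declare char sequence[513]; (512 letters plus the terminating '\0'), call inputSequence( sequence, 512 ), print an error and return the negative value if you get one back, and otherwise hand the length to printSequence().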

    > There are two patterns you need to search for: TTGACA and TATAAT separated by exactly 19 other symbols
    I imagine that later on they'd want, say, CATTTG and AAGGCT separated by 12 symbols to be an option as well, so the function you write should take
    - the sequence to be searched
    - the two sub-sequences to be matched
    - the distance between them
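
    A sketch of a search function with that parameter list might be as follows. The names findGene() and matchesAt() are hypothetical, and the gap is taken to mean the number of letters strictly between the two patterns, as in the assignment text. The loop condition is what stops you falling off the end of the string.

```c
/* Hypothetical helper: returns 1 if pattern occurs in seq starting at pos.
   The caller guarantees that pos + patlen <= length of seq. */
int matchesAt( const char *seq, int pos, const char *pattern, int patlen ) {
  int i;
  for ( i = 0; i < patlen; i++ ) {
    if ( seq[pos + i] != pattern[i] ) return 0;
  }
  return 1;
}

/* Returns the position where the first pattern of a gene starts,
   or -1 if there is no gene.
   gap = number of letters between the end of first and the start of second. */
int findGene( const char *seq, int seqlen,
              const char *first, int firstlen,
              const char *second, int secondlen, int gap ) {
  int span = firstlen + gap + secondlen;  /* 6 + 19 + 6 = 31 for the assignment */
  int pos;
  /* Only try positions where the whole span still fits in the sequence. */
  for ( pos = 0; pos + span <= seqlen; pos++ ) {
    if ( matchesAt( seq, pos, first, firstlen ) &&
         matchesAt( seq, pos + firstlen + gap, second, secondlen ) ) {
      return pos;
    }
  }
  return -1;
}
```

    For the assignment as set, the call would be findGene( sequence, len, "TTGACA", 6, "TATAAT", 6, 19 ), and anything other than -1 means "Gene found!". A sequence shorter than 31 letters never enters the loop, so it reports no gene without touching the array.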
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.
