Thread: Please help me make this more efficient

  1. #1
    Registered User
    Join Date
    Oct 2008
    Posts
    5

    Please help me make this more efficient

    Hello,

    I've written a function which removes any punctuation from around a word. For Example:
    Code:
    input           output
    hello!          hello
    ...foo          foo
    ?they're...  they're
    My function looks like this:
    Code:
    void cleanWord(char **word) {
    	size_t len = strlen(*word);
    	int count = 0;
    	char *copy = *word;
    	char *temp = malloc(len);
    	if(temp == NULL) {
    		fprintf(stderr, "Memory Error\n");
    		exit(EXIT_FAILURE);
    	}
    	while(*copy) {
    		if(isalnum(*copy) || (!isalnum(*copy) && isalnum(*(copy-1)) && isalnum(*(copy+1)))) {
    			temp[count++] = *copy;
    		}
    		if(count == len) {
    			increaseBuffer(&temp, &len);
    		}
    		*copy++;
    	}
    	temp[count] = '\0';
    	*word = temp;
    }
    This is fine for a few words, but I need to process in excess of 29000 words, and my code seems to be lagging.

    Could you help me make this more efficient please?

    Thank you for your time.

  2. #2
    and the Hat of Guessing tabstop's Avatar
    Join Date
    Nov 2007
    Posts
    14,336
    It might be faster to not parse the whole word, but go forwards until you find a letter, and backwards until you find a letter, and copy from start to finish.

  3. #3
    Officially An Architect brewbuck's Avatar
    Join Date
    Mar 2007
    Location
    Portland, OR
    Posts
    7,396
    Quote Originally Posted by tabstop View Post
    It might be faster to not parse the whole word, but go forwards until you find a letter, and backwards until you find a letter, and copy from start to finish.
    I agree. And the use of malloc() should probably be avoided if possible. If there's no reason not to simply replace the word in the buffer where it already sits, that's probably better.

    Obviously, the buffer will never need to be larger than it already is, since you are only removing characters, not inserting any.
    Code:
    //try
    //{
    	if (a) do { f( b); } while(1);
    	else   do { f(!b); } while(1);
    //}

  4. #4
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    There's no need for strlen(), either. The length of the word is the length of the word, so you don't need to calculate it, at all. Your program will scan it letter by letter, anyway.

    while(is an alpha letter) (may need to add an apostrophe case as well)
    copy the letter

    No increase buffer, no malloc, no long muli || this || that || something else. The word ends on a space char, a comma, a period, a newline.

    I'm not sure but I believe that isalpha can do most of this.

    I wouldn't go backwards and forwards however - just forward.

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Code:
    *copy++;
    is the same as
    Code:
    *(copy++);
    which is just as well since
    Code:
    (*copy)++;
    would be a bug. In this case though, you can just write
    Code:
    copy++;
    You're also not malloc-ing enough space - you forgot to add one for the null terminator! Thus you're causing increaseBuffer to be called when it shouldn't be - just to add the null!
    You shouldn't need to check for the need to increase the buffer size as you're only removing chars, never adding them.

    It is wrong to be allocating a buffer for each individual word though anyway. That's my main concern. You should either do the whole thing in-place, or put all the resulting trimmed words into a single secondary buffer, allocated to the same size as the buffer the words are coming from, at the start. If you put the whole lot into a secondary buffer then the buffer you read from should be passed as const.

    So in order to make this function fast, it involves modifying the calling code as well.
    Anything else is like trying to come up with a more efficient sprinkler design when the problem is that there is a kink in the hose leading to it.
    Last edited by iMalc; 01-23-2009 at 12:29 AM.
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    Registered User
    Join Date
    Oct 2008
    Posts
    115
    ispunct() function could capture all the punctuations though...

Popular pages Recent additions subscribe to a feed

Similar Threads

  1. guide me to make this code better or efficient
    By larne in forum C++ Programming
    Replies: 11
    Last Post: 01-19-2009, 05:54 AM
  2. HELP!wanting to make full screen game windowed
    By rented in forum Game Programming
    Replies: 3
    Last Post: 06-11-2004, 04:19 AM
  3. make all rule
    By duffy in forum C Programming
    Replies: 9
    Last Post: 09-11-2003, 01:05 PM
  4. Question about atheists
    By gcn_zelda in forum A Brief History of Cprogramming.com
    Replies: 160
    Last Post: 08-11-2003, 11:50 AM
  5. Replies: 6
    Last Post: 04-20-2002, 06:35 PM