Thread: Problem to find specific letters

  1. #1
    Registered User
    Join Date: Nov 2017
    Posts: 4

    Problem to find specific letters

    Hi,
    I need to find words in a specific file, but the letters of a word may be separated by a fixed spacing. For example:


    My keyword is bomb:


    The text in my file is:
    osdfosdjfsdf fdjk adoj faejfkj vdfjjfdkjsdfkgjsfdpg fjgjsfpkdgjps jegfesjfgjdsfpkgjsdf dsjfgjsdfpgj baormdsfkpg ppjerwerwe bqwosdmerb


    In this example there are 2 cases:


    Case 1:
    Spacing 1
    baormdsfkpg - b.o.m
    This is not the right spacing: at spacing 1, b.o.m is followed by s, not b, so "bomb" is incomplete.


    Case 2:
    Spacing 2
    bqwosdmerb - b..o..m..b
    This is the right spacing: it spells "bomb".


    But I have a big problem: my file is too big to load into memory, so I need to do all the work directly on the file.


    I want to know if there is a way to 'jump' over the letters (with a specific spacing), or any ideas for doing this.


    Note: I thought of using fread(&ch, sizeof(char), i, fp); where i would be my spacing.
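    A minimal sketch of the "jump" idea, assuming the file is opened in binary mode ("rb"): read one character with fgetc, then skip the gap with fseek(fp, gap, SEEK_CUR). The fread call in the note would store i bytes into the single char ch, overflowing it whenever i > 1, so seeking is the safer way to jump. The helper match_at and its parameters are hypothetical; "spacing" is taken to mean the number of characters skipped between letters, so "bqwosdmerb" matches "bomb" at spacing 2.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if `word` appears in fp starting at byte
   offset `start`, with exactly `gap` characters skipped between letters.
   Assumes fp was opened in binary mode ("rb"); on text streams, fseek with
   SEEK_CUR and an arbitrary offset is not portable. */
int match_at(FILE *fp, long start, long gap, const char *word)
{
    if (fseek(fp, start, SEEK_SET) != 0)
        return 0;
    for (size_t k = 0; word[k] != '\0'; ++k) {
        int ch = fgetc(fp);                 /* read exactly ONE character */
        if (ch == EOF || ch != (unsigned char)word[k])
            return 0;
        /* jump over the gap before the next letter (not after the last one) */
        if (word[k + 1] != '\0' && fseek(fp, gap, SEEK_CUR) != 0)
            return 0;
    }
    return 1;
}
```

    Calling match_at for every start offset and every gap gives a brute-force search that never holds more than one character in memory, at the cost of many small seeks.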

    Thank you.

  2. #2
    GReaper (Programming Wraith)
    Join Date: Apr 2009
    Location: Greece
    Posts: 2,738
    You don't have to read the whole file into memory. Just read into a buffer of fixed size, and use characters from there until it empties. For example:
    Code:
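    char buffer[MAX_BUFFER_SIZE];
    size_t bufferSize;
    size_t i;
    ...
    for (;;) {
        /* Read the next chunk; fread returns how many bytes it actually read. */
        bufferSize = fread(buffer, 1, MAX_BUFFER_SIZE, fp);
        for (i = 0; i < bufferSize; ++i) {
            ... /* use buffer[i] here */
        }
    
        /* A short read means end of file (or a read error), so stop. */
        if (bufferSize != MAX_BUFFER_SIZE) {
            break;
        }
    }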
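    A runnable variant of the loop above, assuming each message ends with byte 10 as the assignment quoted later in the thread describes. The buffer size stays fixed no matter how long a message is; the loop just records where each message starts and how long it is, so a message can later be revisited with fseek instead of being loaded whole. MAX_BUFFER_SIZE = 4096 and the function index_messages are illustrative choices, not part of the original post.

```c
#include <stdio.h>

#define MAX_BUFFER_SIZE 4096   /* example value; any fixed size works */

/* Hypothetical helper: streams fp in fixed-size chunks and records the byte
   offset where each message starts and its length (terminator excluded).
   Messages are assumed to end with byte 10 ('\n'); trailing bytes without a
   terminator are not counted. Returns the total number of messages found;
   only the first max_msgs are stored in starts/lengths. */
size_t index_messages(FILE *fp, long *starts, long *lengths, size_t max_msgs)
{
    char buffer[MAX_BUFFER_SIZE];
    size_t bufferSize, count = 0;
    long pos = 0;          /* absolute offset of the current byte */
    long msg_start = 0;    /* offset where the current message began */

    for (;;) {
        bufferSize = fread(buffer, 1, MAX_BUFFER_SIZE, fp);
        for (size_t i = 0; i < bufferSize; ++i, ++pos) {
            if (buffer[i] == 10) {              /* end of one message */
                if (count < max_msgs) {
                    starts[count] = msg_start;
                    lengths[count] = pos - msg_start;
                }
                ++count;
                msg_start = pos + 1;            /* next message begins here */
            }
        }
        if (bufferSize != MAX_BUFFER_SIZE)      /* short read: EOF or error */
            break;
    }
    return count;
}
```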
    Last edited by GReaper; 11-24-2017 at 03:33 PM.
    Devoted my life to programming...

  3. #3
    Banned
    Join Date: Aug 2017
    Posts: 861
    Wouldn't this get only one character at a time out of the file?
    Code:
      ch = fgetc(fp);   /* ch must be an int so that EOF can be told apart from a valid byte */
    So one might end up with results that look something like this:
    Code:
     prg name                      file name    search word
    $ ./find_key_words_using_letters searchwords bomb
    searchWord: b count: 0
    searchWord: o count: 1
    searchWord: m count: 2
    searchWord: b count: 3
    Found [ bomb ] -> bomb
    As far as spacing between the letters you're looking for goes, all you need is a pair of (nested) loops, treating the search word like an array and using its properties for comparison. The line of attack is straightforward: find the first letter, then the second letter, then the third letter, etc., until you either get a complete match or not, depending on whether that word is in the run of letters.

    If you try "jumping" letters, don't you think you might miss the one you need? Wouldn't it be best to look at the letters one at a time, compare each to the current letter of the search word, move on to the next search letter once the previous one is found, and keep repeating until you reach the end of the search word, then report back the results?
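    The one-letter-at-a-time approach from this reply can be sketched with fgetc as below; contains_in_order is a hypothetical name. It checks that the letters of the word appear in order with any number of characters between them, so it accepts uneven gaps and does not by itself enforce the fixed spacing the original question asks for.

```c
#include <stdio.h>

/* Hypothetical helper: reads fp one character at a time and returns 1 if
   the letters of `word` appear in order, with ANY number of characters
   between them (the gaps need not be equal); returns 0 otherwise. */
int contains_in_order(FILE *fp, const char *word)
{
    size_t k = 0;                  /* index of the next letter to find */
    int ch;
    if (word[0] == '\0')
        return 1;
    while ((ch = fgetc(fp)) != EOF) {
        if (ch == (unsigned char)word[k]) {
            ++k;                   /* found this letter, look for the next */
            if (word[k] == '\0')
                return 1;          /* every letter matched */
        }
    }
    return 0;
}
```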
    Last edited by userxbw; 11-24-2017 at 04:28 PM.

  4. #4
    Registered User
    Join Date: Nov 2017
    Posts: 4
    Thank you for replying, GReaper.
    I'm not sure, because (I'll try to explain) the file holds a conversation (something like a chat). I found out that there are 100 messages, and each one may contain more than 1,200,000 characters. What I could try is to get the number of characters in each message and use that for my buffer, but I really don't like using a variable value for something that should be a fixed value.

  5. #5
    Registered User
    Join Date: Nov 2017
    Posts: 4
    Thank you for replying, userxbw.
    I know that I need to use a pair of (nested) loops, but I can't see how to do it. My program needs to test all possible spacings (the maximum spacing will be the size of each message), so somehow I necessarily need to "jump" the letters, because at the end the program will show a ranking (something like a top 5) of the best results (based on the spacing) and how many times that keyword appears.
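    Testing "all possible spacings" is bounded more tightly than the message size: for a keyword of L ≥ 2 letters starting at position 0 of an n-character message with g characters skipped between letters, the last letter lands at position (L - 1)(g + 1), which must be at most n - 1, so g ≤ (n - 1)/(L - 1) - 1. The sketch below brute-forces every start and spacing for one message, assuming that message has been loaded into memory (roughly 1.2 MB for the 1,200,000 characters mentioned above, even if the whole file does not fit). found_at_gap and smallest_gap are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if `word` (length len >= 2) occurs in msg[0..n)
   with exactly `gap` characters skipped between consecutive letters. */
static int found_at_gap(const char *msg, size_t n, const char *word,
                        size_t len, size_t gap)
{
    size_t step = gap + 1;
    size_t span = (len - 1) * step;     /* distance from first to last letter */
    if (span >= n)
        return 0;                       /* word cannot fit at this gap */
    for (size_t s = 0; s + span < n; ++s) {
        size_t k = 0;
        while (k < len && msg[s + k * step] == word[k])
            ++k;
        if (k == len)
            return 1;
    }
    return 0;
}

/* Returns the smallest gap at which `word` is found, or -1 if none.
   Only gaps up to (n - 1)/(len - 1) - 1 can possibly fit. */
long smallest_gap(const char *msg, size_t n, const char *word, size_t len)
{
    if (len < 2 || n < len)
        return -1;
    size_t max_gap = (n - 1) / (len - 1) - 1;
    for (size_t gap = 0; gap <= max_gap; ++gap)
        if (found_at_gap(msg, n, word, len, gap))
            return (long)gap;
    return -1;
}
```

    For the examples in the first post, smallest_gap("bqwosdmerb", 10, "bomb", 4) returns 2, and the same call on "baormdsfkpg" returns -1.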

  6. #6
    Banned
    Join Date: Aug 2017
    Posts: 861
    You're searching "chat messages" for keywords? Why letter by letter? Word by word is more logical.
    Regarding "how many times that keyword appears":
    wouldn't you need to look for the spaces to isolate each word and then compare it, instead of going letter by letter?
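    The word-by-word alternative suggested here could be sketched with fscanf, whose %s conversion splits input on whitespace. count_word is a hypothetical name and the 255-character word limit is an arbitrary assumption. It only finds keywords written out normally; it cannot see letters hidden at a fixed spacing.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: counts whitespace-separated words in fp that equal
   `keyword` exactly (case-sensitive). Words longer than 255 characters
   are split into pieces by the %255s width limit. */
long count_word(FILE *fp, const char *keyword)
{
    char word[256];
    long count = 0;
    while (fscanf(fp, "%255s", word) == 1)
        if (strcmp(word, keyword) == 0)
            ++count;
    return count;
}
```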

    Is this a homework assignment? If so, can you post it verbatim (word for word)? It is making less sense to me now than before.
    Last edited by userxbw; 11-25-2017 at 09:19 AM.

  7. #7
    Registered User
    Join Date: Nov 2017
    Posts: 4
    userxbw, sorry, it's confusing, especially to explain, but here is the wording (I hope it helps):

    Suppose a criminal organization uses the internet to exchange messages among its members. However, to prevent these messages from being detected by members of the security forces, the organization uses a basic encoding method called "Equidistant Sequences of Symbols" (ESS).

    In this work, you will have access to a message database of this criminal organization ("msgs.bin"), stored in a binary file of 8-bit integers corresponding to ASCII character codes. Each message is terminated by the numeric code "10" (new line). The message ID corresponds to the order in which messages appear in this database. In addition, there will be a text file ("keywords.txt") containing the keywords to search for in the message database.
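    Reading "keywords.txt" might look like the sketch below, assuming one keyword per line (the assignment text does not specify the format). load_keywords, MAX_KEYWORDS and MAX_KEYLEN are hypothetical; a line longer than MAX_KEYLEN - 1 characters would be split across two entries.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYWORDS 64    /* illustrative limits, not from the assignment */
#define MAX_KEYLEN   64

/* Hypothetical helper: reads one keyword per line from fp, strips the
   line ending ("\n" or "\r\n"), skips blank lines, and returns how many
   keywords were stored in keys (at most max_keys). */
size_t load_keywords(FILE *fp, char keys[][MAX_KEYLEN], size_t max_keys)
{
    char line[MAX_KEYLEN];
    size_t n = 0;
    while (n < max_keys && fgets(line, sizeof line, fp) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* cut at the first CR or LF */
        if (line[0] != '\0')
            strcpy(keys[n++], line);           /* fits: line < MAX_KEYLEN */
    }
    return n;
}
```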


    The purpose of this work is to plan and implement a system that reads the message and keyword files and tests all possible spacings in order to find those keywords in some of the messages.
    The system output will consist of the ID of messages that potentially contain some of the keywords, sorted by the number of keywords encountered, when read with a given spacing.

    Example Output:

    Msg 7, "Bomb", "Attack" (48) // Indicates that message "7" refers to "Bomb" and "Attack" if read with spacing 48
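    One way to produce a line in this format is to check every keyword at a single spacing and list the ones found. This is a sketch under assumptions: report_line and ess_match are hypothetical names, the message is assumed to be in memory, matching is case-insensitive (the example prints "Bomb" but does not say how case is handled), and "spacing" is taken to mean the number of characters skipped between letters. Sorting messages by the number of keywords found is left out.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* 1 if `word` occurs in msg[0..n) with exactly `gap` characters skipped
   between consecutive letters, comparing case-insensitively. */
static int ess_match(const char *msg, size_t n, const char *word, size_t gap)
{
    size_t len = strlen(word), step = gap + 1;
    if (len == 0 || (len - 1) * step >= n)
        return 0;
    for (size_t s = 0; s + (len - 1) * step < n; ++s) {
        size_t k = 0;
        while (k < len && tolower((unsigned char)msg[s + k * step]) ==
                          tolower((unsigned char)word[k]))
            ++k;
        if (k == len)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: writes e.g.  Msg 7, "Bomb", "Attack" (48)  into out,
   listing every keyword found in message `id` at spacing `gap`.
   Returns the number of keywords found; out is "" when none are found. */
size_t report_line(char *out, size_t cap, int id, const char *msg, size_t n,
                   const char *const *keys, size_t nkeys, size_t gap)
{
    size_t found = 0;
    size_t used = (size_t)snprintf(out, cap, "Msg %d", id);
    for (size_t i = 0; i < nkeys; ++i) {
        if (ess_match(msg, n, keys[i], gap)) {
            ++found;
            if (used < cap)   /* append only while there is room left */
                used += (size_t)snprintf(out + used, cap - used, ", \"%s\"", keys[i]);
        }
    }
    if (found == 0) {
        if (cap > 0)
            out[0] = '\0';
        return 0;
    }
    if (used < cap)
        snprintf(out + used, cap - used, " (%zu)", gap);
    return found;
}
```

    Looping gap from 0 up to the largest spacing that fits and keeping the lines with the most keywords would give the ranking the assignment asks for.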

    Note: it turns out I don't need to print how many times the word appears.

    example_msgs_file.txt

  8. #8
    Banned
    Join Date: Aug 2017
    Posts: 861
    I had started to figure it was something like that. So you'd have to know the key (the spacing, or whatever it is called), then use it to find the letters in the message.

    Say it is every 4th letter: then the key is to check every 4th letter to find the message someone is trying to pass on. Just search the entire file, picking out every 4th letter, build that into your output, then at the end of the file print your output to see what the message might be.

    Count each char; when the count hits the magic number, store that char in an array of some fixed size, then print out the array at EOF.
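    The "every 4th letter" idea can be sketched as a single pass with fgetc that copies every step-th character into a fixed-size array, as described above. every_nth, first and step are hypothetical names; "every 4th letter" corresponds to first = 3, step = 4, and the "spacing 2" of the original question corresponds to step = 3. Characters beyond the array's capacity are dropped.

```c
#include <stdio.h>

/* Hypothetical helper: reads fp one character at a time and copies the
   characters at 0-based positions first, first + step, first + 2*step, ...
   into out, stopping at EOF or when out is full (cap must be >= 1).
   out is always NUL-terminated; returns the number of characters stored. */
size_t every_nth(FILE *fp, long first, long step, char *out, size_t cap)
{
    long pos = 0;
    size_t n = 0;
    int ch;
    while (n + 1 < cap && (ch = fgetc(fp)) != EOF) {
        if (pos >= first && (pos - first) % step == 0)
            out[n++] = (char)ch;     /* this position hits the magic number */
        ++pos;
    }
    out[n] = '\0';
    return n;
}
```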
    If you can only read in a limited amount of data due to memory constraints, I'd pick an even number, say 4 bytes at a time, or set it up so that if you are looking for an even number of spaces you use an even intake, and if it is an odd number of spaces you use an odd intake. It might make the search easier.

    Let me chew on this one for a minute or two. There are two files:

    one has the "keywords" to search for;
    the other is the file to be searched, a bin (binary) file in which each message is terminated by 10 (end of line).
    Sounds simple enough.

    You just need to know how many "spaces" to skip before looking at a letter and saving it.
    I'd use main(int argc, char **argv) to get the number of spaces to use and the files to open.
    Grab the word to look for, then in a loop take that word, count up to that magic number, then check the letter.
    If it matches the first letter, keep it, mark it, and move to the next letter in sequence until the whole search word or the end of the line is reached.
    Then evaluate it and report it, or store that information.

    Then move to the next word in the keywords file and repeat.

    Then, at EOF of the keywords file, report the findings.
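    The plan of taking the spacing and file names from main(int argc, char **argv) could look like the sketch below. The argument order (spacing, then messages file, then keywords file), struct options and parse_args are all hypothetical; strtol with an end-pointer check is used so that input such as "2x" or a negative spacing is rejected.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical command line:  prog <spacing> <messages file> <keywords file> */
struct options {
    long spacing;
    const char *msgs_path;
    const char *keys_path;
};

/* Fills *opt from argv; returns 0 on success, -1 on bad input. */
int parse_args(int argc, char **argv, struct options *opt)
{
    char *end;
    if (argc != 4)
        return -1;
    errno = 0;
    opt->spacing = strtol(argv[1], &end, 10);
    if (errno != 0 || end == argv[1] || *end != '\0' || opt->spacing < 0)
        return -1;                 /* not a clean non-negative number */
    opt->msgs_path = argv[2];
    opt->keys_path = argv[3];
    return 0;
}
```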

    But there has to be some logic behind it. Do the two words have to be in order or not?

    Is it "attack bomb" within that file of letters, or "bomb attack"? Which word do you look for first and which second, or is it random? Can one pick out both words as long as the spacing yields letters that, put together, spell the two words "bomb attack"? If it's the latter, you have to grab every x-th letter and then try to make sense of the result to see whether it adds up to "bomb attack" or not.
    Last edited by userxbw; 11-25-2017 at 03:03 PM.
