Thread: randomly picking a word out of a text file

  1. #1
    Registered User
    Join Date
    Jul 2006
    Posts
    111

    randomly picking a word out of a text file

    I have basic knowledge of file I/O and the random functions. For a current project I would like to open a text file, pick a word out of it at random, and assign that word to a variable. I can't quite figure out how to do this. Any hints/help is greatly appreciated.

  2. #2
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Find the size of the file. fseek to a random spot. If it's a space, or "not a word" token, skip ahead or back, randomly if you like, till you hit a word, and use it. If it's not a space or "not a word token", then take the word you're on. (Back up to the start of the word, read the word into a variable.)

    There's one way to do it.
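
A rough sketch of the seek-to-a-random-spot idea, in case it helps. The function name `random_word_by_offset` is made up for illustration, and the sketch assumes that a "word" is any run of non-whitespace characters, that the file was opened in binary mode ("rb"), that it is smaller than RAND_MAX bytes, and that srand() has already been called. When it lands on whitespace it always skips ahead (wrapping to the start of the file), which is only one of the options described above.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: pick a word by seeking to a random byte offset.
 * Returns 1 and fills `out` on success, 0 if no word could be read. */
int random_word_by_offset(FILE *fp, char *out, size_t outsz)
{
    long size, pos;
    size_t n = 0;
    int c;

    /* Find the size of the file. */
    if (outsz == 0 || fseek(fp, 0L, SEEK_END) != 0)
        return 0;
    size = ftell(fp);
    if (size <= 0)
        return 0;

    /* Random byte offset (the modulo adds a slight bias of its own). */
    pos = rand() % size;
    fseek(fp, pos, SEEK_SET);

    /* If we landed on whitespace, skip ahead to the next word. */
    while ((c = fgetc(fp)) != EOF && isspace(c))
        pos++;
    if (c == EOF) {
        /* Ran off the end: wrap around to the first word in the file. */
        pos = 0;
        rewind(fp);
        while ((c = fgetc(fp)) != EOF && isspace(c))
            pos++;
        if (c == EOF)
            return 0;               /* nothing but whitespace */
    }

    /* Back up to the start of the word we are on. */
    while (pos > 0) {
        fseek(fp, pos - 1, SEEK_SET);
        if (isspace(fgetc(fp)))
            break;
        pos--;
    }

    /* Read the word into the caller's buffer (truncating if too long). */
    fseek(fp, pos, SEEK_SET);
    while ((c = fgetc(fp)) != EOF && !isspace(c) && n + 1 < outsz)
        out[n++] = (char)c;
    out[n] = '\0';
    return n > 0;
}
```

To use it, call srand((unsigned)time(NULL)) once at startup, fopen the file with "rb", and pass a char buffer plus its size.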


    Quzah.
    Hope is the first step on the road to disappointment.

  3. #3
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Count the words in the file, generate a random number within those bounds, seek to that word, read it in.
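
Something like the following would do it with two passes over the file. This is only a sketch: `random_word_by_count` and `MAX_WORD` are names invented here, words are taken to be whitespace-separated, anything longer than 63 characters gets split into pieces by fscanf (consistently in both passes), and srand() is assumed to have been called already.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_WORD 64   /* assumed buffer size, including the terminating '\0' */

/* Hypothetical helper: count the words, pick an index, reread up to it.
 * Returns 1 and fills `out` on success, 0 if the file has no words. */
int random_word_by_count(FILE *fp, char out[MAX_WORD])
{
    char buf[MAX_WORD];
    long count = 0, target, i;

    /* Pass 1: count the words. */
    rewind(fp);
    while (fscanf(fp, "%63s", buf) == 1)
        count++;
    if (count == 0)
        return 0;

    /* Random index within those bounds (assumes count <= RAND_MAX). */
    target = rand() % count;

    /* Pass 2: read words again until we reach the chosen one. */
    rewind(fp);
    for (i = 0; i <= target; i++)
        if (fscanf(fp, "%63s", out) != 1)
            return 0;
    return 1;
}
```

Memory use stays constant no matter how big the file is, at the cost of reading the file twice for every word you pick.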
    If you understand what you're doing, you're not learning anything.

  4. #4
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    Quote Originally Posted by quzah
    Find the size of the file. fseek to a random spot. If it's a space, or "not a word" token, skip ahead or back, randomly if you like, till you hit a word, and use it. If it's not a space or "not a word token", then take the word you're on. (Back up to the start of the word, read the word into a variable.)

    There's one way to do it.


    Quzah.
    But that's biased towards longer words
    If you understand what you're doing, you're not learning anything.

  5. #5
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    All words are not created equal.


    Quzah.
    Hope is the first step on the road to disappointment.

  6. #6
    pwns nooblars
    Join Date
    Oct 2005
    Location
    Portland, Or
    Posts
    1,094
    File
    Array of Strings
    Random number within array bounds
    Random word
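
Spelled out a bit more, that outline could look like this. The names `load_words` and `free_words` are hypothetical, words are assumed to be whitespace-separated and at most 63 characters, and on an allocation failure the loader simply stops and keeps whatever it has read so far.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: read every word of the file into a growable
 * array of heap-allocated strings. Stores the number of words in
 * *count and returns the array (NULL if no words were read). */
char **load_words(FILE *fp, size_t *count)
{
    char buf[64];
    char **words = NULL;
    size_t n = 0, cap = 0;

    rewind(fp);
    while (fscanf(fp, "%63s", buf) == 1) {
        if (n == cap) {
            /* Grow the array by doubling its capacity. */
            size_t newcap = cap ? cap * 2 : 16;
            char **tmp = realloc(words, newcap * sizeof *tmp);
            if (tmp == NULL)
                break;
            words = tmp;
            cap = newcap;
        }
        words[n] = malloc(strlen(buf) + 1);
        if (words[n] == NULL)
            break;
        strcpy(words[n], buf);
        n++;
    }
    *count = n;
    return words;
}

/* Release everything load_words allocated. */
void free_words(char **words, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        free(words[i]);
    free(words);
}
```

Once the array is loaded, a random word is just words[rand() % count] (after checking that count is not zero and calling srand() once), and every later pick costs no file access at all.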

  7. #7
    Gawking at stupidity
    Join Date
    Jul 2004
    Location
    Oregon, USA
    Posts
    3,218
    As you can see, there are lots of ways, and which one is best depends on the circumstances. Quzah's is extremely efficient and works well if the words are all the same length. Mine is a works-in-all-circumstances kind of thing. Wraithan's works well but uses a lot more memory, so if the file is small his method will work fine and outperform mine.
    If you understand what you're doing, you're not learning anything.

  8. #8
    Algorithm engineer
    Join Date
    Jun 2006
    Posts
    286
    quzah's way is the fastest and biased towards longer words and words with higher occurrence. itsme86's way is biased towards words with higher occurrence. It all depends on how random you want it to be and how you want random to be.

  9. #9
    pwns nooblars
    Join Date
    Oct 2005
    Location
    Portland, Or
    Posts
    1,094
    It does use a lot more memory, but even a file with thousands of words doesn't take up much room... the .dic file that comes with Crimson Editor (used for spell check) is only 950ishK, less than a meg, and has the majority of everyday words. If we want to look at it from another perspective, yours takes more processor.

    But there are even more ways to do this... try doing a search on Google or on this forum, since this question has been asked like a million times...

  10. #10
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    I wasn't going for efficiency, I was going for lazy. It also should be noted that the OP didn't provide any specifics with regards to bias, duplicate words, efficiency, etc.


    Quzah.
    Hope is the first step on the road to disappointment.

  11. #11
    ATH0 quzah's Avatar
    Join Date
    Oct 2001
    Posts
    14,826
    Count each word.
    Track the offset of each word.
    ... allocate list size.
    ... reallocate list size if needed.
    Seek to a random word.

    Read each word.
    ... allocate space for each.
    ... add word to list.
    ... reallocate list size if needed.
    Seek to a random spot in the list.

    Both of your methods are going to be reading every word anyway, so the initial cost of reading cancels out. We will assume they are both keeping track of the number of words, because they both have to seek within the bounds of their allocated words, so that cancels out too.

    The second method gains overhead as it allocates space for each word, both by increasing its memory footprint and by actually filling that space. The former doesn't incur much overhead here by comparison, because it's easier to do one large *alloc to hold a bunch of integers than it is to make a separate call to malloc for each word. They both will incur the same overhead if they have to realloc. However, the second method still falls behind here, because it continually allocates space for each word.

    Seek a word. The second recoups some of its loss here, and becomes more efficient each additional time a word is required. The first method loses each time it has to seek in the file. Also, each time we have to copy from disk into a variable, we lose a bit there.



    Really what you end up with is efficiency based on how many times you need a random word. The more often you need one, the better it becomes, speed-wise, to store them all in memory. It's always a trade-off. You can have a small memory footprint, but you sacrifice speed in doing so.
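
For the first outline in this post (track the offset of each word, then seek to a random one), a minimal sketch might look like this. The names `index_words` and `word_at` are made up here, a word is assumed to be a run of non-whitespace characters, and the file is assumed to be opened in binary mode ("rb") so that the running byte count matches what fseek expects.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: record the starting byte offset of every word.
 * Stores the number of words in *count and returns the offset table
 * (NULL if the file has no words). The caller frees it with free(). */
long *index_words(FILE *fp, size_t *count)
{
    long *offsets = NULL;
    long pos = 0;                   /* offset of the char being read */
    size_t n = 0, cap = 0;
    int c, in_word = 0;

    rewind(fp);
    while ((c = fgetc(fp)) != EOF) {
        if (isspace(c)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;            /* first character of a new word */
            if (n == cap) {
                size_t newcap = cap ? cap * 2 : 64;
                long *tmp = realloc(offsets, newcap * sizeof *tmp);
                if (tmp == NULL)
                    break;          /* keep what we have so far */
                offsets = tmp;
                cap = newcap;
            }
            offsets[n++] = pos;
        }
        pos++;
    }
    *count = n;
    return offsets;
}

/* Hypothetical helper: read the word that starts at `offset`.
 * Returns 1 on success, 0 on failure; long words are truncated. */
int word_at(FILE *fp, long offset, char *out, size_t outsz)
{
    size_t n = 0;
    int c;

    if (outsz == 0 || fseek(fp, offset, SEEK_SET) != 0)
        return 0;
    while ((c = fgetc(fp)) != EOF && !isspace(c) && n + 1 < outsz)
        out[n++] = (char)c;
    out[n] = '\0';
    return n > 0;
}
```

Build the table once, then each pick is word_at(fp, offsets[rand() % count], buf, sizeof buf). That is one seek and one short read per word, with only one long per word held in memory instead of the word itself.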

    [edit type=refreshing_forum_before_hitting_post]
    The above is in reply to a post or two that have since been removed. Anyway, for the OP, there's the difference between the options provided by both itsme86 and Wraithan.
    [/edit]

    Quzah.
    Hope is the first step on the road to disappointment.

  12. #12
    Registered User
    Join Date
    Jul 2006
    Posts
    111
    OK, I'll try some of these ideas out.
