Thread: Histogram

  1. #1
    Registered User
    Join Date
    Jul 2012
    Posts
    6

    Histogram

    Hello,
    I have a data set of 10,000,000 numbers and need to produce a histogram from it, or more precisely a dataset that yields a histogram when plotted directly (I have already plotted a histogram from the raw data with gnuplot without any problem, but that is not my task).
    The problem is that I am very new to C, and I couldn't find any good tutorials or sites on Google.
    The histogram should be on a log scale, the x axis ranges from 0.01 to 150, and it should be divided into bins that each contain at least 10 numbers, so the bin width would not be constant.
    I would be thankful for any help, tips, links or code snippets.

  2. #2
    and the hat of int overfl Salem's Avatar
    Join Date
    Aug 2001
    Location
    The edge of the known universe
    Posts
    39,659
    Break the problem down into some simpler steps you feel more able to tackle.
    A development process

    For example,
    Prepare a test data file with only 100 numbers rather than 10M numbers.

    1. Read the numbers from a file and print them out to the console.

    2. Add code to store the numbers in an array, then print them out using another loop when you're done reading the file.

    3. Add code to sort the numbers before printing them out.
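
    Steps 1 to 3 could be sketched roughly as follows. This is only a minimal sketch, assuming the file holds whitespace-separated numbers; the names read_numbers and compare_doubles are made up for illustration and do not come from this thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Comparison callback for qsort: ascending order of doubles. */
int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Read whitespace-separated numbers from fp into a growing array.
   Stores the number of values in *count and returns the array,
   which the caller must free. Returns NULL if allocation fails. */
double *read_numbers(FILE *fp, size_t *count)
{
    size_t capacity = 128, n = 0;
    double *data = malloc(capacity * sizeof *data);
    double value;

    if (data == NULL)
        return NULL;
    while (fscanf(fp, "%lf", &value) == 1) {
        if (n == capacity) {
            /* Array is full: double its capacity. */
            double *tmp = realloc(data, 2 * capacity * sizeof *tmp);
            if (tmp == NULL) {
                free(data);
                return NULL;
            }
            data = tmp;
            capacity *= 2;
        }
        data[n++] = value;
    }
    *count = n;
    return data;
}
```

    After reading, qsort(data, count, sizeof *data, compare_doubles) sorts the array in place, and a plain loop with printf can print it for checking against the 100-number test file.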
    If you dance barefoot on the broken glass of undefined behaviour, you've got to expect the occasional cut.
    If at first you don't succeed, try writing your phone number on the exam paper.

  3. #3
    Registered User
    Join Date
    Jul 2012
    Posts
    6
    Okay, thank you. Maybe I will need help when I arrive at step 3 :-).

  4. #4
    Registered User
    Join Date
    Jul 2012
    Posts
    6
    Hello again. I have now got as far as producing a nice histogram, but so far I have used a fixed bin width. Now I want dynamic bins, meaning that if the number of data points in a bin is below some threshold, for example 10, the bin width should increase.
    I would be thankful for any hints on how to construct the right array, links or anything else.

  5. #5
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    I have done exactly that for commercial software in the past. What I did was:
    1. Find the min and max.
    2. Predefine an array of whatever you deem to be acceptable bin sizes e.g.
      Code:
      {0.01, 0.02, 0.025, 0.05, 0.1, 0.2, 0.25, 0.5, 1, 2, 2.5, 5, 10, 20, 25, 50, 100}
    3. Find the first bin size from that array which divides the range into no more than N bins. I chose 15 for N, giving between 8 and 15 bins which felt about right.
    4. Don't forget to round the min and max to multiples of bin size when checking how many bins apart they would be.
    5. Create the histogram.
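
    One possible reading of steps 2 to 4 is sketched below. It is an assumption-laden illustration, not the commercial code mentioned above: the function name choose_bin_size, its parameters and the two small rounding helpers are hypothetical, and only the list of bin sizes is taken from the post.

```c
#include <stddef.h>

/* Acceptable bin sizes, copied from the list in step 2. */
static const double nice_sizes[] = {
    0.01, 0.02, 0.025, 0.05, 0.1, 0.2, 0.25, 0.5,
    1, 2, 2.5, 5, 10, 20, 25, 50, 100
};

/* Largest integer k with k <= x / size. */
static long long floor_div(double x, double size)
{
    double q = x / size;
    long long k = (long long)q;      /* truncates toward zero */
    if ((double)k > q)
        k--;
    return k;
}

/* Smallest integer k with k >= x / size. */
static long long ceil_div(double x, double size)
{
    double q = x / size;
    long long k = (long long)q;
    if ((double)k < q)
        k++;
    return k;
}

/* Pick the first size that covers [min, max] with at most max_bins
   bins, after rounding min down and max up to multiples of the size.
   Stores the rounded limits in *lo and *hi and returns the size,
   or 0.0 if even the largest size needs too many bins. */
double choose_bin_size(double min, double max, int max_bins,
                       double *lo, double *hi)
{
    size_t i;

    for (i = 0; i < sizeof nice_sizes / sizeof nice_sizes[0]; i++) {
        double size = nice_sizes[i];
        long long first = floor_div(min, size);
        long long last = ceil_div(max, size);

        if (last == first)           /* min == max on a bin edge */
            last++;
        if (last - first <= max_bins) {
            *lo = (double)first * size;
            *hi = (double)last * size;
            return size;
        }
    }
    return 0.0;
}
```

    With the range 0.01 to 150 from the first post and N = 15, this sketch picks a bin size of 10 and rounds the range to 0 to 150, which gives 15 bins.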
    My homepage
    Advice: Take only as directed - If symptoms persist, please see your debugger

    Linus Torvalds: "But it clearly is the only right way. The fact that everybody else does it some other way only means that they are wrong"

  6. #6
    Registered User
    Join Date
    Jul 2012
    Posts
    6
    Thank you, but your answer does not help me, or I did not understand it. I have already made a histogram, found the max and min values, and chosen 100 for N (I had a larger sample of data). But I want a minimum number of values in each bin. Take your example: if you subdivided the data into only 10 bins of a size of 0.01-2, etc... (ignoring the 3 values higher than 20), almost all of your data would be in bin 1 and the other bins would have one or zero numbers in them.

    My idea is: if the count in a bin (j) is < 4, let the arraysize expand until 4 is reached...
    (I know that for some data I will just get broader bins of the same height, but I will divide the frequency by the bin width and output that later.)
    (sorry for my bad english :-D)
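
    The idea described in this post could look something like the sketch below. The thread never shows the poster's final solution, so this is only one possible interpretation: it assumes the data has already been counted into fine fixed-width bins, and the names Bin, merge_sparse_bins and bin_density are invented for the example.

```c
#include <stddef.h>

/* One variable-width output bin covering [lo, hi). */
typedef struct {
    double lo;
    double hi;
    size_t count;
} Bin;

/* Merge neighbouring fixed-width bins until each merged bin holds at
   least min_count values. counts[i] is the number of values in
   [start + i*width, start + (i+1)*width). out must have room for
   nbins entries. Returns the number of merged bins written. */
size_t merge_sparse_bins(const size_t *counts, size_t nbins,
                         double start, double width,
                         size_t min_count, Bin *out)
{
    size_t n_out = 0, acc = 0, first = 0, i;

    for (i = 0; i < nbins; i++) {
        acc += counts[i];
        if (acc >= min_count) {
            /* Enough values collected: close the current bin. */
            out[n_out].lo = start + (double)first * width;
            out[n_out].hi = start + (double)(i + 1) * width;
            out[n_out].count = acc;
            n_out++;
            acc = 0;
            first = i + 1;
        }
    }
    if (acc > 0) {
        /* Leftover values at the end never reached min_count. */
        double end = start + (double)nbins * width;
        if (n_out > 0) {
            /* Fold them into the last bin and widen it. */
            out[n_out - 1].hi = end;
            out[n_out - 1].count += acc;
        } else {
            /* Nothing was emitted: everything goes into one bin. */
            out[0].lo = start;
            out[0].hi = end;
            out[0].count = acc;
            n_out = 1;
        }
    }
    return n_out;
}

/* Frequency divided by bin width, so bins of different width can be
   compared on the same plot. */
double bin_density(const Bin *b)
{
    return (double)b->count / (b->hi - b->lo);
}
```

    Writing lo, hi and bin_density for each merged bin to a file, one bin per line, would give a dataset that can be plotted directly. Trailing bins that are completely empty are simply dropped in this sketch.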

  7. #7
    Registered User
    Join Date
    Jul 2012
    Posts
    6
    Sorry, I meant binwidth instead of "arraysize".

  8. #8
    Registered User
    Join Date
    Sep 2006
    Posts
    8,868
    With problems like this, I've found it VERY helpful to take a VERY small example of the problem, and solve it with paper and pencil - (or a sample program I code up), and note the pattern I used to solve it. You may need to repeat it a few times to see it, but when you do see it, write down the steps you took to solve it. That is the "backbone" of your pseudo code right there, for your program.

    Break that "backbone" down into smaller steps suitable for the computer, and now you have your pseudo code for your program. Change that into working code, function by function, putting off the details until later in the process.

    Just off the cuff, I'm thinking of a nested set of loops in a function, where the number of bins in total has been passed to it (and that number will change each time it's passed). Given that number of bins, the function works through the data, trying different bin widths, and recording the one most equal in distribution of data per bin.
    If the distribution goes way off track, the inner loop of the function breaks, and another width is tried. It would be a slow process, most tedious to do by hand with any large amount of data, but the computer would romp through it OK.
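
    A rough sketch of that kind of search is shown below. It is only a guess at what such a function might look like: it uses equal-width bins, measures "most equal" as the gap between the fullest and emptiest bin, and omits the early break. The names count_spread and most_even_bin_count are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Count the data into nbins equal-width bins over [lo, hi] and return
   the difference between the fullest and the emptiest bin.
   nbins must be at least 1. Returns (size_t)-1 if allocation fails. */
size_t count_spread(const double *data, size_t n,
                    double lo, double hi, size_t nbins)
{
    size_t *counts = calloc(nbins, sizeof *counts);
    double width = (hi - lo) / (double)nbins;
    size_t i, min, max;

    if (counts == NULL)
        return (size_t)-1;
    for (i = 0; i < n; i++) {
        size_t k;
        if (data[i] < lo || data[i] > hi)
            continue;                /* ignore out-of-range values */
        k = (size_t)((data[i] - lo) / width);
        if (k >= nbins)
            k = nbins - 1;           /* a value equal to hi */
        counts[k]++;
    }
    min = max = counts[0];
    for (i = 1; i < nbins; i++) {
        if (counts[i] < min)
            min = counts[i];
        if (counts[i] > max)
            max = counts[i];
    }
    free(counts);
    return max - min;
}

/* Try every bin count from min_bins to max_bins and return the first
   one with the smallest spread. */
size_t most_even_bin_count(const double *data, size_t n,
                           double lo, double hi,
                           size_t min_bins, size_t max_bins)
{
    size_t best = min_bins, best_spread = (size_t)-1, b;

    for (b = min_bins; b <= max_bins; b++) {
        size_t spread = count_spread(data, n, lo, hi, b);
        if (spread < best_spread) {
            best_spread = spread;
            best = b;
        }
    }
    return best;
}
```

    A single bin always has a spread of zero, so min_bins should be larger than 1 for the search to say anything useful.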

    Can you make a small problem set for an example?

  9. #9
    Registered User
    Join Date
    Jul 2012
    Posts
    6
    Never mind, I fixed the problem myself. Once you get the idea, it's always pretty easy. But thanks for answering.

  10. #10
    Algorithm Dissector iMalc's Avatar
    Join Date
    Dec 2005
    Location
    New Zealand
    Posts
    6,318
    Quote Originally Posted by SpecKROELLchen View Post
    (sorry for my bad english :-D)
    Unfortunately your English was bad enough that I was unable to understand what you meant. Good thing you figured it out anyway.
