Thread: Handling large datasets in charts quickely

1. Handling large datasets in charts quickely

I have a large dataset(around 50000 points in a list) which has be visualized in a line chart.The size of Canvas may vary according to the size of dataset. Massive amount of points makes the drawing too slow.It also results in cluttering and overlapping of points due to plotting of several points close to each other.So visual representation of data will be unsatisfactory. How can I display subset of available points which will increase the performance? Can anybody suggest me with a suitable algorithm which will help to extract the subset of actual number of points?

2. You could try creating an image rather than drawing each point separately. Typically you would choose a size of the spot in pixels and set the respective pixels inside a bitmap to the value you want. The rest will have the color of the background. Then you display the bitmap.

I am speaking mostly algorithm-like, rather than implementation like. No matter what, you want to create this image and display it. If you have two points like (10.1, 12) and (10.3, 12) they can overlap. If or if they don't depends on the scaling you will be using.

Lets assume you want a M x M pixel image. Since you have 50k points, you will use 1 pixel = 1 point. Lets assume M=N=300. Copying 50k shouldn't take that long. Displaying a 90k image should not take time at all. So speed-wise you are OK.

Now, how to create the image. You simply have a x-y coordinate system with maximum value of x equal to M and maximum value of y equal to N. You define which is the maximum x and y value a point on your list can have. This can be found by parsing through the list or can be defined by you. Lets name it X and Y.
So now you just have to scale an X x Y image to M x N image. You can simply multiply all x values by M/X and all y values by N/Y.
The only thing left is that since you are using pixels you need integers. So you round up all the values to the nearest integer.
There is no need to check overlaps really. To check you read, compare and maybe write. To write you simply write. Reading/writing should take the same time.

The question remaining is how to create an image in C# using a 2D array. You can use Bitmap.SetPixel() but that won't be a 256 pixel image. If Color is an uint (4 bytes ARGB) then again we are talking about 50k writes and an image of 360kbytes for a 300x300 display. Of course, you can do whatever scaling you want and choose M and N on your liking. Note that you can choose M=N=100 and display the image on full-screen. The bitmap displayed will just be re-scaled.

I could just have said "create the image and then display it" and saved us both some time.

3. Thanks for quick reply I am using Line and Path to draw data points.I have tried taking average of two consecutive points so that number of points would reduce to half.But it would not be feasible since it will not show proper values.What I want is i need to process the original list of points like filtering(sampling of data) and excluding unnecessary points.for example if the points are (2,1) (4,2) (6,3) the intermediate point (4,2) can be removed.Can you please help me how can I exclude some points?

4. Originally Posted by sharmila
I have a large dataset(around 50000 points in a list) which has be visualized in a line chart.
Does this mean that your dataset is (or can be) sorted by the x-coordinate? If so, it's not too hard to filter out points on a line.
Code:
```static bool CanExcludeMiddlePoint(Point first, Point middle, Point last)
{
double slope = (last.Y - first.Y) / (double)(last.X - first.X);
return (middle.X - first.X) * slope + first.Y == middle.Y;
}```
Just run through your list of points three at a time. If you exclude one, the last point should become the middle point and the next point should become the last point.

However, drawing on an image instead of using Line and Path objects is probably your best bet.

5. I don't think you need to exclude anything. Think more practically.
You will need to read the database to do anything. Since you read it, why not just write it on an image? You would need to add the multiplication I mentioned.
So you would do read, calculate, write for every position.

You cannot compare points. Because comparing 50k points to each other, to see if the overlap would require far more reads and calculations as above!

This is called complexity. My method has complexity of O(N) where N are the elements in the database. Ignore the O() part. The things is that you need N steps. One for each points. Comparing would need more. It might need N^2 for example or something close to that if you are going to compare a point against all the others.

In other words, if you have two points the same, lets say (1, 3) and (1,3) it is easier to write them twice (effectively the second writing being pointless) rather than searching 50k points to see if the point exists again.

Concluding, just put the points in a 2D array, with appropriate transformation that I mentioned. That 2D array will be an image with the points.

Even if you create a path, a rectangle will be invalidated (so the previous path can be erased). So you won't gain anything really.

6. You have 50K points.
You have 1K of horizontal pixels to display that in.

Code:
```int pitch = 50000 / 1000;

for ( i = 0, x = 0 ; i < 50000 ; i += pitch, x++ ) plot(x, data[i] );```
Plotting more than one sample in any given x-position on the chart will just get you a lot of vertical smudging.

On the other hand, perhaps you need to average in some way the 50 samples for each x-position. If you had say data points 10,10,11,12,13,1000 and it happened to choose the 1000, that would not be representative.

7. Originally Posted by C_ntua
You cannot compare points. Because comparing 50k points to each other, to see if the overlap would require far more reads and calculations as above!
Like I said, if the data set is sorted by x-coordinate, you can compare the points in linear time. Because the data set is supposed to be viewed as a line graph, I'm assuming that there is at most one y-value for each x-coordinate. You only have to compare points in the order they come.

The nice thing about the 'create an image' method is that it will work with both sorted and unsorted data. If the data is sorted, you can draw the lines as you go. If the data is unsorted, you have to make two passes: the first pass plots all the points, and the second pass draws the lines.

In either case, I'd still sort the data first. 50k points is not that many; on my machine, sorting the list never takes longer than 70 milliseconds.

8. Hi

Thanks for the quick reply excluding intermediate point helps a lot.In fact I have a collection of points List<DataPoint> DataPoints;.Each item in the collection is having X and Y co ordinates.So the entire collection is having randomly distributed points.In this case for every three points I can exclude intermediate point which passes through same line.In the remaining list I need to exclude those points which are close to each other.However this depends upon Canvas width.Is there any more method to filter points based on any other criteria?

9. As I said, you need to do some scaling. If the points are on a MxN chart and you want them on a M'xN' chart you do

x' = x * (M' / M);
y' = y * (N' / N);

Think of having points that can have values from (-infinity, infinity). Now you want to display that graph. You cannot have a graph that has infinite length. You have to choose the max length, thus a part of the imaginary coordinate system. Which is not good. So you scale everything. You choose a maximum MxN size. If you know that you won't get x-values than 1000.0 for example that is your M. Then you apply the above to scale them. So for a value of 1000.0 displaying it on a 300x300 image you will have x=300. The maximum, which is what you want.

On the process of scaling some values will be lost since they will overlap. If you have a value of 11 and 10 and apply the above it will be
y1 = 10 (300/1000) = 3.0
y2 = 11 (300/1000) = 3.3
Now you need to round them since you want pixels that are integers. Thus you will have
y1 = 3
y2 = 3
The points where too close and they naturally overlap! If you had a different scaling they might not overlap. So you get exactly what you want.

So you can either discard y2 or simply attempt to write it again. It is simpler to write it again, but it doesn't matter really.

Is your goal to display a continuous line or just to plot the points???