Thread: Shortest path problem for undirected graph, with no weights per edge

1. Shortest path problem for undirected graph, with no weights per edge

In my application, I have a large graph (I am not sure that it will fit in the memory, but let's assume it will), which **I think** will have |E| > |V|.

I want to find the shortest path between two nodes. The graph is undirected and the edges have no weights. Do you happen to know anything on this topic? Any algorithm(s)/paper(s)?

//on the background
I am thinking of setting the graph (I am in the designing phase) with every node holding its linking with its neighbors data (instead for a global structure, like a list or matrix). That way, I am almost sure that searching will become more efficient than in the case with the list or matrix. You see the graph is going to be large and the goal is to perform as fast queries as possible. By holding local indexes per node of graph, I will access only the indices of the neighbors per node.

2. Some preliminary info that may lead to more good stuff. Worth reading even if it isn't exactly what you need.
Shortest path problem - Wikipedia, the free encyclopedia
Dijkstra's algorithm - Wikipedia, the free encyclopedia

3. Also, it sounds like the way you want to store your nodes is in an adjacency matrix/list. Or maybe, some partial version of that, only concerned with nodes that are close to the current node.

One more good reference, to help get you using the right language: Glossary of graph theory - Wikipedia, the free encyclopedia.

Note, if time is of little concern, but space is, you can use DFS to exhaustively search every possible path.

4. Originally Posted by anduril462
Note, if time is of little concern, but space is, you can use DFS to exhaustively search every possible path.
Why prefer DFS over BFS?

The goal is the time. The faster, the better!

As for the algorithms, I think that I should avoid those that have E in their time complexity and only check the ones left and find the "optimal" from them (the ones that have only V inside them).

5. Originally Posted by std10093
Why prefer DFS over BFS?
I'd query that too, without knowing more about the specific problem. Each has tradeoffs.

Originally Posted by std10093
As for the algorithms, I think that I should avoid those that have E in their time complexity and only check the ones left and find the "optimal" from them (the ones that have only V inside them).
I'd look a bit more carefully than that. Some of the algorithm with complexity only in terms of V have higher powers (for example, because the number of edges is a more-than-linear function of number of vertices).

6. Originally Posted by grumpy
I'd query that too, without knowing more about the specific problem. Each has tradeoffs.
Didn't I state the specific problem? What more should I state?

Originally Posted by grumpy
Some of the algorithm with complexity only in terms of V have higher powers (for example, because the number of edges is a more-than-linear function of number of vertices).
What do you mean by "higher powers"? Probably I am lost in the language here. Also the example is not clear to me too. Sorry :/

7. Originally Posted by std10093
Why prefer DFS over BFS?

The goal is the time. The faster, the better!
You made no mention of runtime needs, yet did mention that the size of the graph was large, and that it may exceed memory. That, combined with your "background" paragraph -- particularly the part about now wanting to hold the entire graph in memory -- led me to believe you were more concerned with memory than speed, which you didn't mention at all. Note, most people optimize for time nowadays, but optimizing for space is still something that people do.
Originally Posted by std10093
Didn't I state the specific problem? What more should I state?
Given that grumpy found it unclear, and I misunderstood what you were getting at, it may be underspecified. Now we know you're optimizing for speed. But we don't know how much bigger |E| is than |V|.
Originally Posted by std10093
As for the algorithms, I think that I should avoid those that have E in their time complexity and only check the ones left and find the "optimal" from them (the ones that have only V inside them).
Originally Posted by std10093
What do you mean by "higher powers"? Probably I am lost in the language here. Also the example is not clear to me too. Sorry :/
By higher power, grumpy means "order" or "exponent". What grumpy is saying is that, you may find an algorithm where there is only |V| in the time complexity, but it may still be slower than one with |V| and |E|, even if |E| > |V|. For example, assume |V| is large (millions of nodes) and |E| is on the order of 100 |V|.
Now, imagine you have two algorithms, one with just |V| in the time complexity:
O(|V|3)
and another with both |E| and |V| in it
O(|E| * |V|)
Which is faster in this case? The second one. The better algorithm will depend on just how much bigger |E| is than |V|. In the above, as long as |E| < |V|2 (which I think it must be -- you cannot have |V|2 edges in an undirected graph), the second algorithm is faster. This is one area where you could have specified better: just how big is |V| and how much bigger is |E| than |V|?

8. I wanted to emphasize that the graph is big, that's it why I stated the "does not fit in memory" case. However, I cause confusion. Moreover, the people that make the rules, are not clear in this part (if the graph fits in memory or not, but they tend to let as assume it will fit in the memory). As a result, let's take as granted that the graph fits in the memory.

Of course considering the memory is important too, but since speed is the "success criterion" here, I want to focus on that first. Then if I see that I have memory problems, I will consider the tradeoffs and make a desicion.

>how much bigger is |E| than |V|?
I wish I knew. The graph is supposed to model a social network. But there are no details for that aspect. I feel that E is surely bigger than V, but I do not know how much.

I understood the example now, thanks.

>|E| < |V|^2 (which I think it must be -- you cannot have |V|^2 edges in an undirected graph)
This is a key aspect here.
"The maximum number of edges in an undirected graph without a self-loop is n(n - 1)/2." from wiki
That means I must start checking all the algorithms, whether they contain E or V in their time complexity.

9. Originally Posted by std10093
>how much bigger is |E| than |V|?
I wish I knew. The graph is supposed to model a social network. But there are no details for that aspect. I feel that E is surely bigger than V, but I do not know how much.

I understood the example now, thanks.
Good thing there are several social network sites, for which you could search for statistics on number of connections. Note, I would wager on the order of a few hundred: look up "Dunbar's number" or "monkey sphere".
Originally Posted by std10093
>|E| < |V|^2 (which I think it must be -- you cannot have |V|^2 edges in an undirected graph)
This is a key aspect here.
"The maximum number of edges in an undirected graph without a self-loop is n(n - 1)/2." from wiki
That means I must start checking all the algorithms, whether they contain E or V in their time complexity.
There's really only a handful of algorithms to consider here, and their time and space complexities are well known, so it's not difficult to evaluate all of them. Once you find out a reasonable distribution for number of connections, you can make a good guess

10. >Dunbar's number
great info

11. What's this E and V thing? Are we talking about Lagrangians here? The only time I've seen such terms is when discussion potential energy vs. energy as L = T - V.

12. Originally Posted by MutantJohn
What's this E and V thing? Are we talking about Lagrangians here? The only time I've seen such terms is when discussion potential energy vs. energy as L = T - V.
Click some of the links about graphing algorithms and theories, you'll figure it out pretty quickly .

13. Originally Posted by MutantJohn
What's this E and V thing? Are we talking about Lagrangians here? The only time I've seen such terms is when discussion potential energy vs. energy as L = T - V.
V = vertices (or nodes)
E = edges (or paths between nodes)
Did you study graph theory?

Btw, DFS vs BFS really depends on the layout of the graph. Without knowing how the graph looks like, there's just no way to tell which one will be faster.
But remember that BFS generally uses more memory than DFS. DFS can search seemingly unbounded graphs, while BFS will struggle.

14. Originally Posted by Elysia
Btw, DFS vs BFS really depends on the layout of the graph.
Undirected graph with no weights. |E| is greater than |V|, but I do not know how much. That's all I know. However, if someone wants to ask something, he or she is welcome.

15. Well, that's not the problem (I'm after). The problem is where the solution is.
Assume a tree from top-down perspective. If the solution is far to the right in the tree, then BFS might be faster. If the solution is deep down in the left part of the tree, DFS might be faster.
The more edges you have near the root, the more expensive DFS becomes if the solution is far right in the tree.

You should try analysing some graphs of your problem a little bit. This could help you determine which algorithm is likely to be faster.

Popular pages Recent additions