Like Tree6Likes

Map program flow

This is a discussion on Map program flow within the C Programming forums, part of the General Programming Boards category; Hi All, This has been driving me mad, I am sure the best option is recursion but Im struggling to ...

  1. #1
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425

    Map program flow

    Hi All,

    This has been driving me mad, I am sure the best option is recursion but Im struggling to get the routine down.

    Im not really posting a code problem per se, its the visualisation and design of it that I have not been able to solve. That is what i am seeking advice on.

    The problem is:

    Take a given function and map the 'path(s)' of subsequent function calls

    I have given myself some test data - note recursive calls are not allowed in the actual source data so that does not have to be considered - I will catch that as a data error if it actually is found. The data is from a propietary, high level language, I am looking at disentangling the mess that the poorly documented codebase has become.

    here is the test data, I have been looking at it in excel by way of pen and paper..

    Name:  data.PNG
Views: 304
Size:  3.3 KB

    The user can select which function to map through
    So in this scenario I would select functionA in my gui and try and output everything in terms of calls, that occurs once functionA itself is run.
    Where you see the labels italiscised, eg 'D' then it means that function does not call anything else, end of that path if you like.

    So starting with row1:
    functionA calls functionB calls functionC calls functionE calls functionD

    No problem so far, but now I need to go back to the C call because as you can see the next call for the C function is again to E, which calls D again of course (it can call it, the conditions in the code are not considered) then I need to get back to the calling point before that, i.e. B, which calls C, which calls E...etc and unwind it all the way back like that... Once that is done i drop through to A calls C ..and so forth

    Note there is a depth limitation in the language which is 6 calls - This is also one of the things i can catch as errors if it is exceeded.

    As you can tell from my ramble I think Ive lost it on this one!

    Thanks for any advice, hope my post is clear enough.
    Last edited by rogster001; 08-05-2014 at 04:10 AM.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  2. #2
    Registered User
    Join Date
    Oct 2011
    Posts
    847
    If you use Graphviz (it is available for Windows, Mac OS X, and for Linux; most Linux distributions already include it in the standard repositories), you can use the very simple DOT file, say calls.dot:
    Code:
    digraph {
      A -> B;
      A -> C;
      A -> C;
      B -> C;
      B -> D;
      C -> E;
      C -> E;
      E -> D;
    }
    If you run dot -Tgif calls.dot > graph.gif , you get a pretty nice graph:
    Name:  calls3.gif
Views: 93
Size:  6.7 KB
    You can use a very simple syntax to add attributes to the nodes and/or edges. For example,
    Code:
      "A" [ shape=hexagon ];
    If the node names look like numbers, or contain spaces or special characters, put them in double quotes.

    I personally prefer the dot SVG format (dot -Tsvg) or a zoomable screen window (dot -Tx11 in Linux), but neato also produces nice output (neato -Tsvg or neato -Tx11).

    If you want to find all the call chains, i.e.
    Code:
    When function A is called:
        A B C E D
        A B D
        A C E D
    
    When function B is called:
        B C E D
        B D
    
    When function C is called:
        C E D
    
    When function D is called:
        D
    
    When function E is called:
        E D
    for your example dataset, you need to construct a data structure containing the calls for each function, and then traverse each possible path through the structures. For example:
    Code:
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <stdio.h>
    
    #ifndef   CHAIN_MAX
    #define   CHAIN_MAX  64
    #endif
    
    typedef struct {
        char    *name;
        int      calls_max;
        int      calls;
        int     *call;
    } func;
    
    static int      functions_max = 0;
    static int      functions = 0;
    static func    *function = NULL;
    
    int add_call(const int func_index, const int call_index)
    {
        if (func_index < 0 || func_index >= functions ||
            call_index < 0 || call_index >= functions) {
            fprintf(stderr, "Invalid function index.\n");
            errno = EINVAL;
            return -1;
    
        } else
        if (func_index == call_index) {
            fprintf(stderr, "Function '%s' is recursive.\n", function[func_index].name);
            errno = EINVAL;
            return -1;
    
        } else {
            func *const f = function + func_index;
    
            if (f->calls >= f->calls_max) {
                const int max = (f->calls | 15) + 17;
                void     *tmp;
    
                tmp = realloc(f->call, (size_t)max * sizeof *(f->call));
                if (!tmp) {
                    fprintf(stderr, "Out of memory: too many calls in function '%s'.\n", f->name);
                    errno = ENOMEM;
                    return -1;
                }
    
                f->calls_max = max;
                f->call = tmp;
            }
    
            f->call[f->calls++] = call_index;
    
            return 0;
        }
    }
    
    int function_index(const char *const string, const int n)
    {
        int i;
    
        if (!string || n < 1) {
            fprintf(stderr, "Invalid (empty) function name.\n");
            errno = EINVAL;
            return -1;
        }
    
        for (i = 0; i < functions; i++)
            if (!strncmp(string, function[i].name, n))
                return i;
    
        if (functions >= functions_max) {
            const int max = (functions | 15) + 17;
            void     *tmp;
    
            tmp = realloc(function, (size_t)max * sizeof *function);
            if (!tmp) {
                fprintf(stderr, "Out of memory: too many functions.\n");
                errno = ENOMEM;
                return -1;
            }
    
            functions_max = max;
            function = tmp;
        }
    
        function[functions].name = malloc(n + 1);
        if (!function[functions].name) {
            fprintf(stderr, "Out of memory: function name too long.\n");
            errno = ENOMEM;
            return -1;
        }
    
        memcpy(function[functions].name, string, n);
        function[functions].name[n] = '\0';
        function[functions].calls_max = 0;
        function[functions].calls = 0;
        function[functions].call = NULL;
    
        return functions++;
    }
    
    int do_chain(const int function_index,
                 int *const chain, const int index, const int length,
                 int (*printfunc)(const int *const, const int))
    {
        if (index >= length) {
            fprintf(stderr, "Call chain is too deep.\n");
            return -1;
    
        } else
        if (function_index < 0 || function_index >= functions) {
            fprintf(stderr, "Invalid function index in call chain.\n");
            return -1;
    
        } else {
            func *const f = function + function_index;
            chain[index] = function_index;
    
            if (f->calls > 0) {
                int i, retval;
                for (i = 0; i < f->calls; i++)
                    if ((retval = do_chain(f->call[i], chain, index + 1, length, printfunc)))
                        return retval;
                return 0;
            } else
                return printfunc(chain, index + 1);
        }
    }
    
    int call_chains(const int function_index, 
                    int *const chain, const int length,
                    int (*printfunc)(const int *const, const int))
    {
        if (!chain || length < 2) {
            fprintf(stderr, "Invalid call chain buffer (or buffer length).\n");
            return -1;
        }
    
        if (function_index < 0 || function_index >= functions) {
            fprintf(stderr, "Invalid function index.\n");
            return -1;
        }
    
        return do_chain(function_index, chain, 0, length, printfunc);
    }
    
    int print_chain(const int *const chain, const int length)
    {
        if (length > 0) {
            int i;
            fputs(function[chain[0]].name, stdout);
            for (i = 1; i < length; i++) {
                fputc(' ', stdout);
                fputs(function[chain[i]].name, stdout);
            }
            fputc('\n', stdout);
        }
        return 0;
    }
    
    int main(void)
    {
        int     chain[CHAIN_MAX];
        char    buffer[1024], *line;
        int     start1, end1, start2, end2;
        int     caller, callee;
    
        while ((line = fgets(buffer, sizeof buffer, stdin))) {
            start1 = end1 = start2 = end2 = -1;
            (void)sscanf(line, " %n%*s%n %n%*s%n", &start1, &end1, &start2, &end2);
    
            if (end2 > start2 && end1 > start1 && start2 > end1) {
                caller = function_index(line + start1, end1 - start1);
                callee = function_index(line + start2, end2 - start2);
                if (caller < 0 || callee < 0)
                    return EXIT_FAILURE;
                if (add_call(caller, callee))
                    return EXIT_FAILURE;
    
            } else {
                line[strcspn(line, "\t\n\v\f\r ")] = '\0';
                if (line[0])
                    fprintf(stderr, "Skipping invalid line '%s'.\n", line);
            }
        }
    
        for (caller = 0; caller < functions; caller++)
            if (call_chains(caller, chain, CHAIN_MAX, print_chain))
                return EXIT_FAILURE;
    
        fprintf(stderr, "No errors.\n");
    
        return EXIT_SUCCESS;
    }
    The above program expects the function names, caller and callee, in standard input; each pair on each line. The implementation is rude and crude; only detects recursion when CHAIN_MAX call chain depth is exceeded, so you'll probably want to fine tune it quite a bit.

    This one just prints out each call chain, one chain per line, and all possible call chains, but you could easily modify it to instead save each (unique?) call chain to an array -- say keyed by the initial function -- and use that data structure in your GUI.

    The way I refer to the functions is pretty useful, IMHO: each unique function is indexed by an integer, 0 to functions-1, inclusive, so the call chain is just an array of integers. (Plus the call chain length.) (If two call chains have the same length, and memcmp(chain1, chain2, (size_t)len * sizeof (int)) is zero, then the call chains are identical; otherwise they differ).

    Questions?

  3. #3
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    Wow - thank you for taking the time for that response, very detailed. I am actually quite surprised about the amount of code there in the example you gave, but I see you are handling all the validation also, thanks again, I need to review this and see about any questions. BTW I love that graph option.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  4. #4
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    I am just waiting for the graph to finish generating, downloaded GraphViz, I am interested to see this.. there are over 9000 rows in the data.

    However, I do not feel it is the right visual for what I am trying to acheive - It would be a complementary one though.

    If I clarify slightly:
    In the sample dataset each of the function names in the index column are to be treated as distinct entry points, they are all the modules that exist in the body of code and the powers that be wish to choose any one and map its processing possibilites - There is a module called control which is actually the root/main one and may call any of the others. That one, Control, would show everything if selected as starting point.
    As I wait for this to generate i am wondering if GraphViz will represent that. I have just given it a whirl, I am not looking at the help or example stuff right now.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  5. #5
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    Oh my word, it is like some kind of Fractal! It is very condensed and only a few labels can be seen, I need to get some zooming or something.

    The only problem with this is that If I wanted to isolate one function - and the routes that can occur as a result of using it - then the Graph tool does not work as I understand it right now - I need all of the other function names in my data, and no processing to take place when the root call is not the user selection, ie If i select function A then I need to see all the paths that might include function E as a call - But I dont want to see the paths generated from function E on its own, just as part of the cause and effect of A being the starting point, in terms of the visual output
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  6. #6
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    I feel this must be do-able with just a short recursion routine, I just have not been able to nail it, I have got close, but not complete. I tried various iterative ideas but It just keeps saying 'recursion' to me.

    Here's a part of the pic that Graphviz produced anyway, Its like something from the matrix, or war of the worlds

    Name:  matrix.PNG
Views: 82
Size:  48.0 KB
    Last edited by rogster001; 08-05-2014 at 10:33 AM.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  7. #7
    C++ Witch laserlight's Avatar
    Join Date
    Oct 2003
    Location
    Singapore
    Posts
    21,720
    Quote Originally Posted by rogster001
    I feel this must be do-able with just a short recursion routine, I just have not been able to nail it, I have got close, but not complete. I tried various iterative ideas but It just keeps saying 'recursion' to me.
    Since there is no recursion (and no mutual recursion too, I presume), you are essentially trying to list all paths in a directed acyclic graph. It turns out that there's a breadth of information on this online

    That said, a cursory glance through that breadth of information says that perhaps you should try your hand with a variant of depth first search before you start digging too deeply online. Basically, you find a leaf, list the path, backtrack, find another leaf, list the path, backtrack, if no leaves left on this branch, backtrack further up the tree, find another leaf, etc, until the entire graph has been explored.
    rogster001 and Sebastiani like this.
    C + C++ Compiler: MinGW port of GCC
    Version Control System: Bazaar

    Look up a C++ Reference and learn How To Ask Questions The Smart Way

  8. #8
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    perhaps you should try your hand with a variant of depth first search
    Of course! Thats what has been nagging at me and I havent realised,- Its a path finding problem - with the goal as routes with no further nodes- Ive had tunnel vision on this.

    I am also wondering about how best to represent the output. If it were done like, dendrite style then I dont think that is as useful in terms of what is in the code, as it requires a further level of interpretation by the reader.
    Lets say: A, B, C is possible as is A. B, C, D, E. As is A,D,C,E
    Then if this was shown with branching from a node then you would have to be told to interpret that A can go direct to D, thats where the multiple lines in Graphviz come in though i suppose, you could colour them, have different shapes as suggested etc.

    I think a different diagram for Eg A to B and onwards, or A to C and onwards is my preference

    Thanks
    Last edited by rogster001; 08-05-2014 at 11:04 AM.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  9. #9
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425

    output

    I see it more like this:

    Code:
    possible callee / call pairs	
      A -> B
      A -> C
      A -> C
      B -> C
      B -> D
      C -> E
      C -> E
      E -> D
      
    When function A is called
    
    	Calling (A) next thing that happens is:  B C E D
    	Return to callPos in (C) next thing that happens is E D
    	Return to callPos ...(B)  D
    	(A) C E D
    	(C) E D
    	(A) C E D
    	(C) E D
    Is that right, just for everything that happens after flow passes into A??
    Last edited by rogster001; 08-05-2014 at 12:18 PM.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  10. #10
    Registered User
    Join Date
    Oct 2011
    Posts
    847
    Apologies for not replying earlier; been busy elsewhere.

    Quote Originally Posted by rogster001 View Post
    there are over 9000 rows in the data.
    Ah, that makes a big difference. I once graphed all the links on a fairly large website, and even if printed on an A1 (in sections), it was an utter mess.

    In this case, I recommend splitting the output into per-function graphs, with only the initial callees listed.

    Quote Originally Posted by rogster001 View Post
    In the sample dataset each of the function names in the index column are to be treated as distinct entry points, they are all the modules that exist in the body of code and the powers that be wish to choose any one and map its processing possibilites - There is a module called control which is actually the root/main one and may call any of the others. That one, Control, would show everything if selected as starting point.
    Ah, so you also came to the same conclusion -- graph per function is the way to go. Also, dot is not the best option for this; better try neato, too.

    I rarely write C code for stuff like this; I prefer to use awk for data mangling.

    The following will only be useful to you if you have or install some Unix-like tools -- bash, awk and sort, in addition to dot and neato from Graphviz. Assuming you use Windows, you could use either Cygwin, or win-bash plus gawk and coreutils from GnuWin32. I don't know which are "best", since I don't use Windows myself.

    The following awk script, process.awk, takes one or more text files containing caller callee -formatted lines using any newline encoding, and generates one function.dot and one function.calls file per function. The latter is sorted by call chain depth, with the first field on each line being the length (number of functions) in the chain. Each possible unique chain is listed.

    Apologies for listing non-C sources in the C forum!

    process.awk:
    Code:
    #!/usr/bin/gawk -f
    BEGIN {
        # Accept any newline convention, and consume all leading
        # and trailing whitespace.
        RS = "[\t\v\f ]*[\r\n][\t\n\v\f\r ]*"
    
        # Any whitespace will do as a field separator.
        FS = "[\t\v\f ]+"
    
        # Multidimensional array separator is a pipe character.
        SUBSEP = "|"
    
        # Array of function names.
        fns = split("", fn)
    
        # 2D array of function calls: [caller][callee] = number of calls.
        split("", calls)
    
        # Array of functions called by other functions.
        split("", callee)
    
        # Depth of each function call chain.
        split("", depth)
    
        # URL prefix is "url"; -v url="prefix"
        # URL suffix is "ext"; -v ext="suffix"
    }
    
    (NF >= 2) {
        if ($1 != $2) {
            ++calls[$1][$2]
            ++callee[$2]
        } else
            printf "Ignoring recursive function '%s'.\n", $1
    
        if (!($1 in fn))
            fn[$1] = ++fns
    
        if (!($2 in fn))
            fn[$2] = ++fns
    }
    
    function chain(root, chainstr, funcname, n) {
        if (length(calls[funcname]) > 0) {
            if (maxdepth > 0 && n >= maxdepth)
                printf "%s exceeds maximum call chain length\n", root > "/dev/stderr"
            else {
                for (f in calls[funcname])
                    chain(root, chainstr " " f, f, n + 1)
            }
        } else {
            if (depth[root] < n)
                depth[root] = n
            printf "%d %s\n", n, chainstr > (root ".calls")
        }
    }
    
    END {
        printf "Have %.0f functions.\n", fns
    
        printf "Computing all function call chains:\n"
        for (f in fn)
            depth[f] = 1
        for (f in fn)
            chain(f, f, f, 1)
        printf "Sorting.. "
        for (f in fn)
            system("sort " f ".calls -rgb > X && mv -f X " f ".calls")
        printf "done.\n"
        
        printf "Generating .dot files:\n"
        for (f in fn) {
            dotfile = f ".dot"
            printf "\t%s:\n", dotfile
    
            printf "digraph {\n" > dotfile
            printf "\trankdir = \"TB\";\n" > dotfile
    
            # Center function is a box.
            printf "\t\"%s\" [ shape=box, label=\"%s\", fillcolor=\"#99ccff\", style=\"filled\", URL=\"%s\" ];\n", f, f, url f ext > dotfile
    
            # Functions called by the center function.
            if (f in calls)
                if (length(calls[f]) > 0)
                    for (c in calls[f]) {
                        printf "\t\"%s\" [ shape=oval, label=\"%s (%d)\", URL=\"%s\" ];\n", c, c, depth[c], url c ext > dotfile
                        printf "\t\"%s\" -> \"%s\" [ color=\"#6699CC\" ];\n", f, c > dotfile
                    }
    
            # Functions calling the center function:
            for (c in fn)
                if (c in calls) 
                    if (length(calls[c]) > 0)
                        if (f in calls[c]) {
                            printf "\t\"%s\" [ shape=oval, label=\"%s\", URL=\"%s\" ];\n", c, c, url c ext > dotfile
                            printf "\t\"%s\" -> \"%s\" [ color=\"#CC6666\" ];\n", c, f > dotfile
                        }
    
            printf "}\n" > dotfile
            close(dotfile)
        }
        printf "Done.\n"
    }
    Using the example data set, the above script generates ten files:
    Code:
    A.calls:
        5 A B C E D
        4 A C E D
        3 A B D
    
    B.calls:
        4 B C E D
        2 B D
    
    C.calls:
        3 C E D
    
    D.calls:
        1 D
    
    E.calls:
        2 E D
    
    A.dot:
        digraph {
        	rankdir = "TB";
        	"A" [ shape=box, label="A", fillcolor="#99ccff", style="filled", URL="A.html" ];
        	"B" [ shape=oval, label="B (4)", URL="B.html" ];
        	"A" -> "B" [ color="#6699CC" ];
        	"C" [ shape=oval, label="C (3)", URL="C.html" ];
        	"A" -> "C" [ color="#6699CC" ];
        }
    
    B.dot:
        digraph {
        	rankdir = "TB";
        	"B" [ shape=box, label="B", fillcolor="#99ccff", style="filled", URL="B.html" ];
        	"C" [ shape=oval, label="C (3)", URL="C.html" ];
        	"B" -> "C" [ color="#6699CC" ];
        	"D" [ shape=oval, label="D (1)", URL="D.html" ];
        	"B" -> "D" [ color="#6699CC" ];
        	"A" [ shape=oval, label="A", URL="A.html" ];
        	"A" -> "B" [ color="#CC6666" ];
        }
    
    C.dot:
        digraph {
        	rankdir = "TB";
        	"C" [ shape=box, label="C", fillcolor="#99ccff", style="filled", URL="C.html" ];
        	"E" [ shape=oval, label="E (2)", URL="E.html" ];
        	"C" -> "E" [ color="#6699CC" ];
        	"A" [ shape=oval, label="A", URL="A.html" ];
        	"A" -> "C" [ color="#CC6666" ];
        	"B" [ shape=oval, label="B", URL="B.html" ];
        	"B" -> "C" [ color="#CC6666" ];
        }
    
    D.dot:
        digraph {
        	rankdir = "TB";
        	"D" [ shape=box, label="D", fillcolor="#99ccff", style="filled", URL="D.html" ];
        	"B" [ shape=oval, label="B", URL="B.html" ];
        	"B" -> "D" [ color="#CC6666" ];
        	"E" [ shape=oval, label="E", URL="E.html" ];
        	"E" -> "D" [ color="#CC6666" ];
        }
    
    E.dot:
        digraph {
        	rankdir = "TB";
        	"E" [ shape=box, label="E", fillcolor="#99ccff", style="filled", URL="E.html" ];
        	"D" [ shape=oval, label="D (1)", URL="D.html" ];
        	"E" -> "D" [ color="#6699CC" ];
        	"C" [ shape=oval, label="C", URL="C.html" ];
        	"C" -> "E" [ color="#CC6666" ];
        }
    For example, the A.calls file tells us there are three unique call chains possible when you call function A: five-function long A→B→C→E→D, four-function long A→C→E→D, and three-function long A→B→D.

    By itself, the above script is not that useful. However, combine it with say a Bash script generate.bash:
    Code:
    #!/bin/bash
    # Use dot unless DOT is set.
    [ -n "$DOT" ] || DOT="dot"
    
    # Use maxdepth 8 unless MAXDEPTH is set.
    [ -n "$MAXDEPTH" ] || MAXDEPTH=8
    
    # Remove all generated files.
    rm -f *.dot *.calls *.html
    
    # Generate dot files.
    gawk -v ext=".html" -v url="" -v maxdepth=$MAXDEPTH -f process.awk "$@" || exit $?
    
    # Generate HTML files.
    for FILE in *.dot ; do
        FUNC="${FILE%.*}"
        HTML="${FILE%.*}.html"
    
        printf 'Generating %s\n' "$HTML"
    
        rm -f "$HTML"
        cat >"$HTML" <<END
    <!DOCTYPE html>
    <html>
     <head>
      <title> $FUNC() call graph </title>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <style type="text/css">
       p {
        padding: 0 0 0 0;
        border: 0 none;
        margin: 0.1em 0.5em 1.1em 0.5em;
        outline: 0 none;
       }
       p.index {
        text-align: center;
       }
       p.svg {
        text-align: center;
        padding: 0 0 0 0;
        border: 0 none;
        margin: 0 0 0 0;
        outline: 0 none;
       }
       svg {
        width: 50em;
        height: 30em;
        padding: 0 0 0 0;
        border: 0 none;
        margin: 0 0 0 0;
        outline: 0 none;
       }
      </style>
     </head>
     <body>
      <p class="index"><a href="index.html">Index</a></p>
    END
        gawk 'BEGIN { RS="[\t\v\f ]*[\r\n][\t\n\v\f\r ]*" ; FS="[\t\v\f ]+" }
              NF > 2 {
                  printf "<p class=\"calls\">%s", $2
                  for (i = 3; i <= NF; i++)
                      printf " &rarr; <a href=\"%s.html\">%s</a>", $i, $i
                  printf "</p>\n"
              }
             ' "$FUNC.calls" >> "$HTML"
    
        SVG="$($DOT -Tsvg "$FILE" | tr -s '\t\n\v\f\r ' '      ')"
        SVG="<svg viewBox=${SVG##* [Vv]iew[Bb]ox=}"
    
        printf '<p class="svg">%s</p>\n' "$SVG" >> "$HTML"
    
        printf '</body>\n</html>\n\n' >> "$HTML"
    done
    
    printf 'Generating index.html\n'
    cat > index.html <<END
    <!DOCTYPE html>
    <html>
     <head>
      <title> Function call graphs </title>
      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
      <style type="text/css">
       p {
        padding: 0 0 0 0;
        border: 0 none;
        margin: 0.1em 0.5em 0.1em 0.5em;
        outline: 0 none;
       }
       p.index {
        text-align: center;
        margin: 0.1em 1.5em 0.1em 0.5em;
       }
      </style>
     </head>
     <body>
      <p class="index">
    END
    for FILE in *.calls ; do
        FUNC="${FILE%.*}"
        printf '   <br><a href="%s.html">%s</a>\n' "$FUNC" "$FUNC" >> index.html
    done
    printf '  </p>' >> index.html
    cat >> index.html <<END
     </body>
    </html>
    END
    If you run the Bash script file, supplying it with either the caller callee function name pairs on standard input, or the file name or names containing that data, and you get one HTML 5 file per function, with the call chains starting at that function, and an SVG graph of the immediate functions calling this function, and immediate functions this function calls.

    (I still don't know what you need, only what you've told us you intend to do or were wondering if you could do; the above is what I am guessing might satisfy the purpose behind all this. For example, it might be much better to show the complete call graph each function participates in.)

    All total, it's just over 200 lines of code, and did not take long at all to write. (The Bash stuff is cryptic, I admit: I write quite a bit of short bash scripts, and the initial versions of a script I write tend to become quite a mess. Usually I rewrite the script for readability before I send it onwards, but this time I didn't bother -- I do not think many of the readers of this thread will ever even try it out, so no reason to spend any extra time on making it readable.)

    You could certainly use awk or python or any scripting language to generate the HTML files; bash was just closest at hand for me. The next step, would be to rewrite both into clean versions, perhaps with a little templating for the HTML file generation, and to clean up the HTML files -- or at least add styles to make it not so darn ugly.

    After that, I'd include this into the actual code project, under a new Makefile target, so that the structure documentation could be built (into a separate directory) easily/automatically. (I assume you already have machinery in place to generate the function call chains from the original sources.)

    The above does not output a beautiful set of pages, but that's only because I'm lazy. With a little effort, you can definitely do professional-looking documentation; one trick I do is to take a pregenerated HTML file, then edit it until it looks nice, then backport the HTML and CSS changes back to the generator. Regenerating the HTML files usually turns up some corner cases, so I redo the ugliest/most problematic case. Seems to save a lot of time.

    Also, I personally would not write an application to explore program documentation or abstract data structures or call chains; I'd always try to generate it into HTML pages (or man pages for library functions).

    Browsers are dedicated applications for such stuff, so why reinvent the wheel?

    Questions?

  11. #11
    Registered User
    Join Date
    Oct 2011
    Posts
    847
    Just out of fun, I wrote another awk script, this time one that draws one .dot graph per function, with all function chains the function participates in. The example data produces the following five .dot files (A.dot through E.dot),
    Name:  example.gif
Views: 64
Size:  13.4 KB
    Note that neato produces different layouts; here, dot tries to keep the top-to-bottom orientation for the edges, if possible.

    Apologies for yet again including non-C example code in the C forum, but I guess the point here is to use tools that match best the task at hand. The corresponding C code is much more complex.

    Code:
    #!/usr/bin/awk -f
    BEGIN {
        # Accept any newline convention, and consume all leading
        # and trailing whitespace.
        RS = "[\t\v\f ]*[\r\n][\t\n\v\f\r ]*"
    
        # Any whitespace will do as a field separator.
        FS = "[\t\v\f ]+"
    
        # Multidimensional array separator is a pipe character.
        SUBSEP = "|"
    
        # Call chain separator is a comma.
        SEP = ","
    
        # Link URL prefix and suffix are "linkprefix" and "linksuffix".
    
        # Edge colors
        srcedge = fixcolor(srcedge, "6666CC")
        dstedge = fixcolor(dstedge, "CC6666")
        theedge = fixcolor(theedge, "333333")
    
        # Node colors
        srcnode = fixcolor(srcnode, "333399")
        dstnode = fixcolor(dstnode, "993333")
        thenode = fixcolor(thenode, "000000")
    
        # Default maximum depth to 256 calls.
        if (maxdepth < 1)
            maxdepth = 256
    
        # Call matrix: unique caller-callee pairs.
        pairs = 0
        # Call matrix: calls[caller][callee] = count
        split("", calls)
        # Call matrix: called[callee][caller] = count
        split("", called)
    
        # Array containing all caller functions.
        split("", caller)
    
        # Array containing all callees.
        split("", callee)
    
        # Array containing all function names.
        names = split("", name)
    
        # Hexadecimal conversion tables.
        for (i = 0; i <= 255; i++) {
            component[sprintf("%02x", i)] = i
            component[sprintf("%02X", i)] = i
        }
        for (i = 0; i <= 15; i++) {
            component[sprintf("%01x", i)] = i * 17
            component[sprintf("%01X", i)] = i * 17
        }
    
        printf "Reading data .. " > "/dev/stderr"
    }
    
    NF >= 2 {
        # Add caller to function names array, unless already added.
        if (!($1 in name))
            name[$1] = ++names
    
        # Add callee to function names array, unless already added.
        if (!($2 in name))
            name[$2] = ++names
    
        # Add caller to caller array.
        caller[$1]
    
        # Add callee to callee array.
        callee[$2]
    
        # Add this call to the call matrix.
        called[$2][$1]
        if (++calls[$1][$2] == 1)
            ++pairs
    }
    
    function fixcolor(rgb, defaultrgb) {
        sub(/[^A-Fa-f0-9]/, "", rgb)
        if (length(rgb) == 3 || length(rgb) == 6)
            return rgb
        return defaultrgb
    }
    
    function callchains(root, currchain, funcname, currdepth) {
        # Update maximum call chain depth for the root function.
        if (depth[root] < currdepth)
            depth[root] = currdepth
    
        if (funcname in caller) {
            # Funcname calls other functions.
            if (currdepth >= maxdepth) {
                printf "Function call chain starting at '%s' is too long!\n", root > "/dev/stderr"
                return
            } else
                for (f in calls[funcname])
                    callchains(root, currchain f SEP, f, currdepth + 1)
    
        } else {
            # Add this chain, unless already listed.
            if (!(currchain in chain))
                chain[currchain] = ++chains
        }
    }
    
    END {
        printf "%d functions, %d unique caller-callee pairs.\n", names, pairs > "/dev/stderr"
    
        # Call chain depths
        split("", depth)
        for (f in name)
            depth[f] = 1
    
        printf "Generating all call chains .. " > "/dev/stderr"
        chains = split("", chain)
        for (f in name)
            callchains(f, SEP f SEP, f, 1)
        printf "%d unique chains.\n", chains > "/dev/stderr"
    
        maxdepth = 0
        for (f in depth)
            if (depth[f] > maxdepth)
                maxdepth = depth[f]
        printf "Maximum call chain depth is %d.\n", maxdepth > "/dev/stderr"
    
        printf "Generating .dot files:\n" > "/dev/stderr"
        for (f in name) {
            dotfile = f ".dot"
    
            printf "digraph {\n", f > dotfile
            printf "\trankdir=\"TB\";\n" > dotfile
            printf "\troot=\"%s\";\n", f > dotfile
            printf "\tfixedsize=false;\n" > dotfile
            printf "\tmargin=0;\n" > dotfile
            printf "\t\"%s\" [ shape=box color=\"#%s\" URL=\"%s\" width=0 height=0 ];\n", 
                   f, thenode, (linkprefix f linksuffix) > dotfile
    
            # edges leading to current function
            split("", before)
            split("", isbefore)
            # edges following the current function
            split("", after)
            split("", isafter)
    
            # Find all edges in chains containing f
            for (c in chain)
                if (index(c, SEP f SEP) > 0) {
                    n = split(c, list, SEP)
    
                    for (i = 2; i < n - 1; i++)
                        if (list[i] == f)
                            break
                        else {
                            before[ list[i] ][ list[i+1] ]
                            isbefore[ list[i] ]
                        }
    
                    for (; i < n - 1; i++) {
                        after[ list[i] ][ list[i+1] ]
                        isafter[ list[i+1] ]
                    }
                }
    
            for (f1 in isbefore)
                printf "\t\"%s\" [ shape=plaintext color=\"#%s\" URL=\"%s\" width=0 height=0 ];\n",
                       f1, srcnode, (linkprefix f1 linksuffix) > dotfile
    
            for (f2 in isafter)
                printf "\t\"%s\" [ shape=plaintext color=\"#%s\" URL=\"%s\" width=0 height=0 ];\n",
                       f2, dstnode, (linkprefix f2 linksuffix) > dotfile
    
            for (f1 in before)
                for (f2 in before[f1])
                    printf "\t\"%s\" -> \"%s\" [ color=\"#%s\" ];\n", f1, f2, srcedge > dotfile
    
            for (f1 in after)
                for (f2 in after[f1])
                    printf "\t\"%s\" -> \"%s\" [ color=\"#%s\" ];\n", f1, f2, dstedge > dotfile
    
            printf "}\n" > dotfile
    
            close(dotfile)
            printf "\t%s\n", dotfile > "/dev/stderr"
        }
        printf "done.\n" > "/dev/stderr"
    }
    If you generate SVG files using
    Code:
    awk -v linksuffix=.svg -f script.awk data.file
    bash -c 'for F in *.dot ; do dot -Tsvg "$F" > "${F%.*}.svg" ; done'
    you can open the individual SVG files in a browser, and click on the function names to switch to the graph centering on that function.

    To see how the neato graphs differ, just use
    Code:
    bash -c 'for F in *.dot ; do neato -Tsvg "$F" > "${F%.*}.svg" ; done'
    to regenerate the SVG files.
    rogster001 likes this.

  12. #12
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    wow - thanks, Just saw this response, ive been playing a bit in the graphViz too, thanks a lot for taking the time to provide all that work. Ill have a good read through and see whats what, - Your suggestion of Unix/bash/awk is actually perfect because that is the original environment that hosted the programming language in question - And a lot of the clients are still using it as the backend so I have access to dev environments where I can try out some of your ideas.

    I'll try and post something that more exactly describes the goal here - as you mention its not entirely clear so far.

    ps - i do a fair bit of scripting too so those parts of your replies are not lost on me, thanks again.
    Last edited by rogster001; 08-10-2014 at 04:30 AM.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  13. #13
    zub
    zub is offline
    Registered User zub's Avatar
    Join Date
    May 2014
    Location
    Russia
    Posts
    96
    Code:
    #include <stdio.h>
    
    
    const char* names[6] = {
        "A",
        "B",
        "C",
        "D",
        "E",
        "main"
    };
    
    
    unsigned int calls[5][5] = {
        { 0, 1, 2, 0, 0 },
        { 0, 0, 1, 1, 0 },
        { 0, 0, 0, 0, 2 },
        { 0, 0, 0, 0, 0 },
        { 0, 0, 0, 1, 0 }
    };
    
    
    void function(const unsigned int caller, const unsigned int i)
    {
        printf("%s -> %s\n", names[caller], names[i]);
        unsigned int j;
        for( j = 0; j < 5; ++j ) {
            if( calls[i][j] != 0 ) {
                calls[i][j] -= 1;
                function(i, j);
            }
        }
    }
    
    
    int main(void)
    {
        function(5, 0);
        return 0;
    }
    Our goals are clear, tasks are defined! Let's work, comrades! -- Nikita Khrushchev

  14. #14
    Registered User rogster001's Avatar
    Join Date
    Aug 2006
    Location
    Liverpool UK
    Posts
    1,425
    Thanks zub...That looks quite exciting.. I have not compiled/tested - I will tomorrow though, but on a quick glance the additional work appears to be data handling..

    @Nominal Animal - Still to go through your latest post fully but i did see this:

    Code:
    A.calls:
        5 A B C E D
        4 A C E D
        3 A B D
     
    B.calls:
        4 B C E D
        2 B D
     
    C.calls:
        3 C E D
     
    D.calls:
        1 D
     
    E.calls:
        2 E D
    It is symptomatic of the fact that i need to reclarify the objective, but as an interim point see below:

    The data for the actual work is retrieved from a query that will only return function names that actually contain a function call in their definition.

    Thus the function 'D' from the test data will never be indexable, and so the output line above: D Calls 1: D is not valid, as far as offering it in the output that is.

    That is exactly the kind of thing i am struggling with - Its that endpoint/output/return idea, is giving me difficulty.
    Last edited by rogster001; 08-10-2014 at 08:47 AM.
    Thought for the day:
    "Are you sure your sanity chip is fully screwed in sir?" (Kryten)
    FLTK: "The most fun you can have with your clothes on."

    Stroustrup:
    "If I had thought of it and had some marketing sense every computer and just about any gadget would have had a little 'C++ Inside' sticker on it'"

  15. #15
    Registered User
    Join Date
    Oct 2011
    Posts
    847
    Quote Originally Posted by rogster001 View Post
    Thus the function 'D' from the test data will never be indexable, and so the output line above: D Calls 1: D is not valid, as far as offering it in the output that is.
    No, you misunderstand the format of the .calls files.

    The example dataset I used for all of the above is
    Code:
    A B
    A C
    A C
    B C
    B D
    C E
    E D
    As you can see, D is only referenced as a target. The .calls files, generated by the awk script automatically from the above data, contain the tail parts of the call chains the function participates in. To allow for all files to be processed easily, the tails begin with the function itself: this way you can combine all the files into one, and not lose any information.

    Because D does not call any functions, D.calls only contains 1 D , as per the definition. Perhaps .tails would have made for a better file extension?

    Quote Originally Posted by rogster001 View Post
    That is exactly the kind of thing i am struggling with - Its that endpoint/output/return idea, is giving me difficulty.
    A simple list of all caller-callee function pairs is not sufficient to produce tracing-style output or graphs. You need more information than that.

    For example, if we can assume the caller-callee pairs are in the order they exist in the function, you could use the record node type to describe the target function box in the graphs shown in my post #11. The B.dot would become
    Code:
    digraph {
    	rankdir="TB";
    	root="B";
    	fixedsize=false;
    	margin=0;
    	"B" [ shape=record label="{<B>B|<B1>|<B2>}" color="#000000" URL="B.svg" width=0 height=0 ];
    	"A" [ shape=plaintext color="#333399" URL="A.svg" width=0 height=0 ];
    	"C" [ shape=plaintext color="#993333" URL="C.svg" width=0 height=0 ];
    	"D" [ shape=plaintext color="#993333" URL="D.svg" width=0 height=0 ];
    	"E" [ shape=plaintext color="#993333" URL="E.svg" width=0 height=0 ];
    	"A" -> "B" [ color="#6666CC" ];
    	"B":"B1" -> "C" [ color="#66CC66" ];
    	"B":"B2" -> "D" [ color="#66CC66" ];
    	"C" -> "E" [ color="#CC6666" ];
    	"E" -> "D" [ color="#CC6666" ];
    }
    and the corresponding B graph
    Name:  Bordered.gif
Views: 45
Size:  3.5 KB
    I also colored the immediate function calls green, to make it easier to see which calls the function itself makes.

    If you had a third column identifying each call (say, a line number), you could include that in the above graph trivially. Or, you could just put the target function name (callees, C and D) there, to make it easier to see the functions B calls at a glance.

    If your tools can produce even a rough AST -- conditionals, function calls, and return/exit/end-function conditions would suffice --, you could expand the boxes containing the function B above (for each function graph, of course), to a single box with a graph containing the structure of B. (Graphviz supports "clusters"; you can have an entire subgraph, or even a hierarchy of such subgraphs, as a node-like entity.)

    The identifiers (empty boxes within B in the above graph) could be the conditional expression used (for conditionals), and/or the relevant line number in the sources. (If the source code is available at a stable URL in the intranet, then you could link to those.)

    In this post, I'm assuming the purpose here is to make it easier to grok a complex code base with rather arbitrary limitations. I prefer the call graphs in #11 because it tells you the position of the function at hand within the overall flow. However, it does not tell anything about the function internals, which I guess are also important to you; therefore I'm exploring the possibility of opening that up better.

    Another thing to consider is that in many cases, functions are not all created equal. You have internal or helper functions, exported or global functions, and handlers. If each function was labeled such, you could designate them such in the graphs. And for example for handlers, you could trim the preceding call chain to not include any functions that explicitly call the handler -- such calls are rare, and their intent is to trigger the handler as if the runtime environment had done it, and therefore any call chain that ends up explicitly calling a handler is not really relevant to any call chain that originates from a handler. Or you could simply designate different function types with colors or node shapes.

    Again, it all depends on what your goal is, and what kind of tools you have at hand..

Page 1 of 2 12 LastLast
Popular pages Recent additions subscribe to a feed

Similar Threads

  1. Question about Arrays and program flow
    By David.Mary in forum C Programming
    Replies: 12
    Last Post: 06-27-2011, 05:23 PM
  2. Program flow
    By Ducky in forum C++ Programming
    Replies: 5
    Last Post: 02-11-2010, 11:04 AM
  3. Program flow
    By Ducky in forum C++ Programming
    Replies: 4
    Last Post: 04-20-2009, 08:27 AM
  4. Simple C Program with Flow Statements
    By rory-uk in forum C Programming
    Replies: 16
    Last Post: 02-06-2008, 12:12 PM
  5. Program flow
    By Ducky in forum C++ Programming
    Replies: 22
    Last Post: 01-09-2008, 10:18 AM

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21