• Question: How can I make a tag graph visualization?

    bsugar asked on January 25, 2018 19:49
    190 views | 2 answers | #15614


    Recently I have been asked a few questions about how the tag graph was implemented. So I figured it might be good to have a public space to answer them!

    Some preliminary questions:

    1. If two tags belong to the same node, do they have an edge between them?
    2. Are the different colors for different types of node, such as questions, notes, research notes, etc.?

    Please add any questions of your own and I'll hop back on to answer!

    EDIT: I'll be posting more as soon as we can figure out where to place some files so they are publicly accessible.



    6 Comments

    <3<3<3



    1) If two tags belong to the same node, do they have an edge between them?

    The tags don't belong to nodes; the nodes are the tags themselves. Two "tag nodes", as it were, have an edge between them when they occur on the same page of the plots website. I believe that goes for any page, be it a research note or a wiki page. Take the following page (whoa, meta) for example. Here you see the tags are:

    Screen_Shot_2018-01-25_at_2.54.08_PM.png

    The nodes are:

    website, design, tags

    The undirected edges are:

    website <-> design
    website <-> tags
    design <-> tags
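
    As a minimal sketch of that rule (not the code used for the actual graph), every pair of tags on one page becomes one undirected edge. The function name `page_edges` is made up for illustration:

```python
from itertools import combinations


def page_edges(tags):
    """Return every undirected edge between tags that share a page."""
    # Sort and de-duplicate so each pair has a single canonical (a, b) form.
    return list(combinations(sorted(set(tags)), 2))


edges = page_edges(["website", "design", "tags"])
for a, b in edges:
    print(f"{a} <-> {b}")
```

    Three tags give three edges; a page with n distinct tags gives n * (n - 1) / 2 of them.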

    2) Are the different colors for different types of node, such as questions, notes, research notes, etc.?

    The colors relate to the "community" the nodes (tags) belong to. Take a look at this image @cfastie posted:

    tagboolean.jpg



    Follow up: @sagarpreet, in exploring export options for a different project, I just discovered that you can extract the visual attributes (i.e. color, size, positions) by exporting to one of the file formats that support this (see matrix image here). Open the .gephi file, then go to File --> Export --> Graph File. When you choose a supported file format, the "Options" button should become clickable. Click on it, and make sure to check the boxes for any attributes you'd like.

    Personally, I'm interested in web visualizations but not in figuring out how to implement algorithms for things like community detection and calculating node sizes, so I think this is a great way to quickly translate the static visualizations from Gephi into something dynamic. The other plus is that I'd prefer to see changes I make update in real time, without having to re-run the program.

    Again, for all others, we'll get some files up real soon!



    @bsugar added some really excellent research on finding associated tags that we can incorporate into the API planning. Just copying in here to keep as a reference as this moves forward. I'll also link into the long issue where this has been worked on: https://github.com/publiclab/plots2/issues/1502

    ...I think the problem there would be that some tags belong to both categories. There are two pieces to the tag graph: the calculation of tag co-occurrence (and subsequent removal of co-occurrences below a given threshold), and then the community detection. You might be able to get away with just doing co-occurrences. That code would be easy enough to transfer from Python to Ruby to run every so often. Now that there is a web dev (hi by the way!) you could also just implement the JavaScript version of tagoverflow and replace the data.
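
    To illustrate just the thresholding step from the first piece, here is a rough sketch. It assumes the edge weights already live in a dict keyed by tag pair; the helper name `drop_weak_edges`, the sample weights, and the cutoff of 2 are all made up, and community detection is not covered here:

```python
def drop_weak_edges(edge_weights, threshold):
    """Keep only the edges whose weight is at least the threshold."""
    return {edge: w for edge, w in edge_weights.items() if w >= threshold}


# Hypothetical weights: how often each pair of tags shares a page.
edge_weights = {
    ("design", "website"): 2,
    ("design", "tags"): 1,
    ("gephi", "tags"): 1,
}

strong_edges = drop_weak_edges(edge_weights, threshold=2)
print(strong_edges)
```

    The same filter works whether the weight is a raw co-occurrence count or a ratio; only the threshold value changes.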

    Later adding:

    Okay, so translating the co-occurrences to Ruby will require a bit more code than I thought it would (see second paragraph). Unfortunately, I can't find any co-occurrence libraries in Ruby (you might have better luck).

    However, there are a bunch of recommendation engines available in Ruby, which I guess makes sense given Ruby's popularity as a back end for websites, which inevitably include e-commerce. Recommendify made sense to me right away. Instead of users purchasing products, you could treat each post as a user who "purchases" tags. There's also Recommendable, but it was more opaque to me.

    If you want to use the method that was employed to make the graph above minus the making of the graph, you'd have to translate this code and run it in the Ruby version of these commands. You do not need the export2graphml function.

    Essentially, you want a file that looks like this. It won't be hard to calculate the counts of the tags individually; the trickier part is counting the co-occurrences. But once you have those figures, calculating the observed to expected ratio is easy and detailed here
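
    For anyone following along, both kinds of counts can be gathered in a single pass over the pages. This is only a sketch under the assumption that each page is available as a plain list of tag names; `count_tags` and the sample pages are hypothetical, not the original script:

```python
from collections import Counter
from itertools import combinations


def count_tags(pages):
    """Count single-tag occurrences and pairwise co-occurrences across pages."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in pages:
        # De-duplicate and sort so each pair is always counted as (a, b), a < b.
        unique = sorted(set(tags))
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts


# Made-up sample data: one list of tags per page.
pages = [
    ["website", "design", "tags"],
    ["website", "design"],
    ["tags", "gephi"],
]

tag_counts, pair_counts = count_tags(pages)
print(tag_counts)
print(pair_counts)
```

    Sorting before pairing is the design choice that matters: without it, ("design", "website") and ("website", "design") would be counted as two different edges.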



    @bsugar I had a quick question: would it be OK to only collect /some/ of the related tags of each tag, i.e. limit the number of edges that we look up for each tag record? As of recently we have an optimized Tag.related(tagname) method that returns the 5 tags most used with the given tag.

    In this code, I was able to collect the 5 most-related tags for each of the top 250 tags, and it runs in about 8-11 seconds on the production site:

    https://gist.github.com/jywarren/07f598cca34bdc2f8042236b83f02b10

    I'm wondering if we could just reformat that to be the correct JSON format and then hand it off to tagoverflow?



    Sorry @warren! Looks like this may have been addressed in the GitHub conversation. However, for those who come after: given that the goals are probably satisfied by an approximation, I don't see why it wouldn't suffice.

    I think the downside is the one that I mentioned in my comment. The edge weights are created using something called the observed vs. expected odds ratio:

    oe_ratio = (all_questions_count * tag_count_AB) / (tag_count_A * tag_count_B)
    Pulling from the github comment
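
    As a quick worked example of that formula, here it is as a Python function. The variable names follow the formula above; the sample numbers are invented, and the zero check is my own addition:

```python
def oe_ratio(all_questions_count, tag_count_a, tag_count_b, tag_count_ab):
    """Observed vs. expected ratio for the co-occurrence of tags A and B."""
    if tag_count_a == 0 or tag_count_b == 0:
        raise ValueError("both tags must occur at least once")
    return (all_questions_count * tag_count_ab) / (tag_count_a * tag_count_b)


# Invented numbers: 100 pages, A on 20 of them, B on 10, both on 5.
ratio = oe_ratio(all_questions_count=100, tag_count_a=20, tag_count_b=10, tag_count_ab=5)
print(ratio)  # 2.5
```

    A ratio above 1 means the two tags appear together more often than their individual frequencies would suggest; below 1 means less often.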

    This method is one way to take care of the issue where an edge or node may be important but of low usage. For example, at a store 100 people might have an 85% probability of buying coffee and cream, but five of those people always purchase coffee, cream, and eggs. So I definitely want to keep 5 cartons of eggs in stock.

    So, will five work? I think so. Yes. But I think what will technically happen is that you won't always know to keep "the eggs" (specific tag) in "stock" (on the graph), as it were, since you've presumed that you only want the top five associated "products" (tags).
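
    Here is a toy version of the "eggs" problem, with invented numbers. It assumes "top five" means ranking by raw co-occurrence count, which may not be exactly how Tag.related ranks things:

```python
TOTAL_PAGES = 1000   # invented totals for illustration
COFFEE_COUNT = 100   # pages tagged "coffee"

# neighbor tag -> (pages with that tag, pages with both "coffee" and that tag)
neighbors = {
    "cream": (90, 85),
    "sugar": (200, 40),
    "milk": (300, 50),
    "tea": (400, 30),
    "bread": (500, 45),
    "eggs": (5, 5),
}


def top_k(scores, k):
    """Return the k keys with the highest scores, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Rank the neighbors by raw co-occurrence count.
by_count = {tag: both for tag, (_, both) in neighbors.items()}

# Rank the neighbors by observed vs. expected ratio.
by_oe = {
    tag: (TOTAL_PAGES * both) / (COFFEE_COUNT * count)
    for tag, (count, both) in neighbors.items()
}

top_by_count = top_k(by_count, 5)
top_by_oe = top_k(by_oe, 5)
print("top 5 by count:", top_by_count)
print("top 5 by o/e ratio:", top_by_oe)
```

    In this made-up data "eggs" falls out of the top five by count even though it has the highest observed vs. expected ratio, which is the trade-off described above.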


