Public Lab Research note


GSoC proposal: Mapknitter ORB Descriptor (w/ auto-stitching, pattern training, and live video support) and LDI revamp (major UI enhancements)

by rexagod | March 11, 2019 15:08 | 323 views | 33 comments | #18515

Read more: i.publiclab.org/n/18515


About me

Name : Pranshu Srivastava

Email : rexagod@gmail.com

Github : https://github.com/rexagod

Proposal: Includes MapKnitter UI and Microscope live stitching, auto-stitch in MapKnitter (magnetic attraction)

Student year: Sophomore (second-year student)

Field of study: Computer Science Engineering, B.Tech.

Affiliation : Jaypee Institute of Information Technology, India

Location : Noida, India

Project description

Abstract

Employing machine learning techniques to incorporate an ORB descriptor (w/ auto-stitching, live video support) to enable better pattern matching, along with some major LDI UI (and API) enhancements.

Aims to solve

Both of the projects listed below are described in detail in the Implementation section.

Oriented FAST and Rotated BRIEF (ORB) Descriptor

  • Currently, the user has to manually "stitch" similar-looking images together to form a larger one. We aim to automate this cumbersome process in the following sequence.
    • First, using some advanced machine learning techniques, mostly "Oriented FAST and Rotated BRIEF" (ORB) detection, offered by the JSFeat (JavaScript Computer Vision) library, the algorithm will estimate how similar any two images passed to it are, showing the corresponding outputs (keypoints, matches, good matches, Gaussian blur and grayscale intensity) for each pair to the user. Also, a "training process" will be initiated every x seconds to recognize and discover more keypoints in the image, leaving less room for failed attempts at each iteration.
    • Second, on the basis of the above details, the user can choose to "stitch" the images, and the "auto-stitcher" ToolbarAction will then stitch them together automatically (using the Auto-Stitcher module, detailed below).
    • Lastly, for better accuracy, the user can intervene and make small adjustments if they wish to obtain precise results.
  • Public Lab will also showcase a live video ORB matcher that learns from an image passed to it and goes on to find that particular pattern in the live video feed! At each step, the most efficient params will be considered, so as to maintain the maximum render FPS possible. This can be applied to various areas of interest. For example, a particular minute organism can be detected if the video feed from the microscope is supplied to this module and compared against an image of it (downloaded from the net, or uploaded from the user's local machine), or a prominent object in space can be detected using a satellite's live feed. The possibilities are endless!

Mapknitter UI enhancements

Currently, the Leaflet.DistortableImage repository serves as the supporting image distortion library of the Mapknitter repository, which makes it an essential part of it. We'll be implementing some major core changes as described in the planning issue here, which should benefit the user through better accessibility and interactivity. Work on this is already in progress and can be checked out here.

Implementation

Oriented FAST and Rotated BRIEF (ORB) Descriptor

Dividing this into subparts,

a) Microscope live-tracking: We can modify the algorithm to train on a single image rather than depending on the live video feed to train itself. In short, this will allow the user to provide an image (pattern) as input, and the tracker will then attempt to find that pattern in the feed every x seconds in a loop (currently set to 2s). On finding matches, the tracker will count the "good matches" and, if they are above a certain threshold, display "matched".

Screenshot from 2019-03-09 19-36-13

Live-tracking in action! (do let me know if the .gif is taking way too long to load)

Peek-2019-03-09-18-10.gif

A few points of interest above: no matter the orientation, ORB will recognize an image pattern even if it is only partially visible. Also, notice how it confirms a match even when some portion of the coaster is cut off.

b) ORB Algorithm (currently in beta, click here to check it out!): We can enhance the current ORB algorithm in the following ways to better fit Public Lab's use. A simplified version of the currently proposed ORB structure, built on @warren's suggestions, is depicted below. Note that this is based on the review here, and addresses the queries in that particular comment. The actual implementation will, in all practicality, be much more detailed, but will still draw on the snippet below.

function ORBMatcher() {
  this.findPoints = findPoints;
  this.findMatchedPoints = findMatchedPoints;
  this.projectPointsInto = projectPointsInto;
  this.renderer = renderer;
  return this;
}

function findPoints(img) { //extracts [RGBA] info about every pixel in an image passed to it
  //img is an `Image()` instance that has finished loading
  var canvas = document.createElement("canvas");  //notice we aren't appending this anywhere (offscreen), so there is nothing to remove afterwards
  canvas.width = img.width;   //size the canvas to the image, otherwise it defaults to 300x150
  canvas.height = img.height;
  //TC is the 2d context of the temporary canvas, used to extract [RGBA] data for each pixel of image
  var TC = canvas.getContext("2d");
  TC.drawImage(img,0,0,img.width,img.height); //faster than putImageData
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  //imgData.data contains `0-255` "clamped" values in a Uint8ClampedArray (refer MDN)
  //for eg., if the image is of the resolution 640x480, this array will have 1228800
  //elements (307200x4), where all are "clamped" (restricted) to have a value
  //between 0-255, and every set of 4 elements represents the R,G,B, and A values
  //for every single pixel, i.e., 307200 pixels
  var imgData = TC.getImageData(0, 0, img.width, img.height);
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  //finally, we can extract the data about every single pixel, which also removes
  //the need for having a `Stateful` module (saving space as well) since we
  //now already have everything we need
  return imgData;
}

// construct correspondences for inliers
// var correspondences_obj;
// for (var i = 0; i < count; ++i) {
//  var m = matches[i]; //`match_t()` construct:176
//  var s_kp = screen_corners[m.screen_idx]; //`keypoint_t()` construct:175
//  var p_kp = pattern_corners[m.pattern_lev][m.pattern_idx];
//  pattern_xy[i] = { "x": p_kp.x, "y": p_kp.y }; //==>X (img1points)
//  screen_xy[i] = { "x": s_kp.x, "y": s_kp.y };  //==>Y (img2points)
// }
// correspondences_obj = {"X":pattern_xy,"Y":screen_xy};
function findMatchedPoints(X,Y) {
  var good_matches_locatorX=[];
  var good_matches_locatorY=[];
  var strong_inliers=0;
  var xk=0;
  var yk=0;
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  //assuming we have already constructed correspondences from imageData as
  //described above and X and Y hold the `pattern_idx`(image1points) and
  //`screen_idx`(image2points) of the matches array for every
  //pixel respectively for each pixel in { "x": p_kp.x, "y": p_kp.y }
  //and { "x": s_kp.x, "y": s_kp.y } formats for X and Y respectively (please refer to the image above)
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  //find respective homographic transforms for both of the models (X and Y)
  //run ransac homographic motion model helper on motion kernel to find
  //*only* the strong inliers (good matches) excluding away all the others
  //(works on static models as well)
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  var count = X.length; //number of correspondences passed in
  var num_model_points = 4; //point pairs needed to estimate a homography (value assumed)
  var reproj_threshold = 3; //max reprojection error in px for an inlier (value assumed)
  var mm_kernel = new jsfeat.motion_model.homography2d();
  var homo3x3 = new jsfeat.matrix_t(3, 3, jsfeat.F32C1_t);
  var ransac_param = new jsfeat.ransac_params_t(num_model_points,reproj_threshold, 0.5, 0.99);
  var match_mask = new jsfeat.matrix_t(500, 1, jsfeat.U8C1_t); //assumes at most 500 correspondences
  //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  if (jsfeat.motion_estimator.ransac //runs once for static models and iterates if motion is detected
   (ransac_param,
    mm_kernel,
    X,
    Y,
    count, //total number of correspondences, not the (still zero) inlier count
    homo3x3,
    match_mask))
  {
     for (var i = 0; i < count; ++i) {
         if (match_mask.data[i]) {
           //copy the inlier first, since X and Y are compacted in place below
           good_matches_locatorX[xk++]= {x: X[i].x, y: X[i].y};
           good_matches_locatorY[yk++]= {x: Y[i].x, y: Y[i].y};
           X[strong_inliers].x = X[i].x;
           X[strong_inliers].y = X[i].y;
           Y[strong_inliers].x = Y[i].x;
           Y[strong_inliers].y = Y[i].y;
           console.log(X[strong_inliers]); //==>{x: 359, y: 48},i.e., a matched point in X space
           console.log(Y[strong_inliers]); //==>{x: 65, y: 309},i.e., the corresponding point in Y space
           strong_inliers++;
         }
     }
     mm_kernel.run(X, Y, homo3x3, strong_inliers); //run kernel directly with inliers only
  }
  var matched_points = {
    "good_matches_X": good_matches_locatorX,
    "good_matches_Y": good_matches_locatorY,
    "matches_count": strong_inliers
  };
  return matched_points; //store location of all good matches in both models (images), X and Y, as well as their count
}

function projectPointsInto(matched_points,canvas,match_thres_percentage=30,expected_good_matches=5) {
  var good_matches = matched_points.matches_count; //from above fn.
  if (num_matches) { //GLOBAL set by the `match_pattern` utility fn.
   if (good_matches >= num_matches * match_thres_percentage/100) { //we have a match!
    renderer.lines(canvas, matches, num_matches);  //picks X/Y data from matches and num_matches which are GLOBALS
   }
   if (good_matches >= expected_good_matches) { //in case of a solid match when results are as we expected or even better
    renderer.box(canvas);
    //we can stop iterations here completely to avoid performance issues
   }
  }
}

var renderer = { //gets hoisted
  lines: function(canvas, matches, num_matches) {
   render_matches(canvas, matches, num_matches);
  },
  box: function(canvas) {
   render_pattern_shape(canvas);
  }
}

//declarations and calls

var matcher = new ORBMatcher();

var imageA = new Image();
imageA.src = "imageA.jpg";
var imageB = new Image();
imageB.src = "imageB.jpg";

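//NOTE: findPoints should only be called once both images have loaded (e.g., from their `onload` handlers)
var imageDataA = matcher.findPoints(imageA);
var imageDataB = matcher.findPoints(imageB);
//construct correspondences X_ and Y_ (see the commented block above `findMatchedPoints`)
var matcherData = matcher.findMatchedPoints(X_,Y_);
var canvas = document.querySelector("#deploy-canvas").getContext("2d");
matcher.projectPointsInto(matcherData,canvas);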
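The homography estimated in findMatchedPoints (homo3x3) is what relates a point in one image to its position in the other. Below is a minimal sketch of applying such a matrix, assuming it is available as a flat, row-major array of 9 numbers that maps points of the first image (X) into the second (Y). The helpers projectPoint and projectCorners are hypothetical names for illustration and are not part of JSFeat.

```javascript
// Hypothetical helper: maps a point through a 3x3 homography given as a
// flat, row-major array of 9 numbers (assumed layout).
function projectPoint(H, pt) {
  var w = H[6] * pt.x + H[7] * pt.y + H[8]; // projective scale factor
  return {
    x: (H[0] * pt.x + H[1] * pt.y + H[2]) / w,
    y: (H[3] * pt.x + H[4] * pt.y + H[5]) / w
  };
}

// Hypothetical helper: projects the four corners of a width x height image,
// in the order top-left, top-right, bottom-right, bottom-left.
function projectCorners(H, width, height) {
  return [
    { x: 0, y: 0 },
    { x: width, y: 0 },
    { x: width, y: height },
    { x: 0, y: height }
  ].map(function (corner) { return projectPoint(H, corner); });
}

// Example: a pure translation by (10, 20) applied to a 640x480 image.
var translate = [1, 0, 10, 0, 1, 20, 0, 0, 1];
console.log(projectCorners(translate, 640, 480));
```

Keeping the projection separate from any rendering code means the same function could serve both the line/box drawing and the positioning of one image relative to another.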

mnpyr

  • Large data files solution: For large images in the 10-15 MB range, we can downsample the image data matrix into a smaller one, while carefully preserving most of the inliers. JSFeat implements the Gaussian Pyramid method of downsampling a large image to a smaller one in its pyrdown API, i.e., jsfeat.imgproc.pyrdown(source:matrix_t, dest:matrix_t);. Here we can give the destination matrix fewer rows and columns, which will allow us to process things faster while not sacrificing the important inliers in the image, as is evident from the final (smallest) image in the image pyramid above.
  • Proposed change in the search algorithm: Instead of starting from the first row and column, I believe a better approach would be starting from the center, i.e., if the image is of the resolution 640x480, we can start our search from the imgData's pixel info around column 320, row 240, and move outwards in a circular (vignette) fashion. This approach is based on the observation that most images have their meaningful data, or the largest inlier clusters, at the center rather than at the borders. To stop the execution, refer to the projectPointsInto fn., in which we can do the same in case good_matches >= expected_good_matches, where expected_good_matches is a custom threshold that can be adjusted to reduce performance issues as well.
  • Determining the "confidence" of each point match structure: The "confidence" for each point is currently being calculated in the match_pattern() util. fn. and can be returned to the getMatches fn. (refer snippet above). Note that the "confidence" we are talking about here corresponds to the standard set by the match_threshold and ranges between 0-127. Whether a match is above or below the match_threshold is currently a property of every single point, so if two points were, say, to be in a pair, both of their best_dist values (confidence factors) would be the same.
  • Completely remove all the video instances since we won't be working with any video formats whatsoever inside the LDI UI.
  • Add support for training process to occur b/w images rather than a video and image.
  • Calibrations and adjustments below can be employed to maximize ORB coverage while reducing outliers at the same time. But what exactly are these and why do we need them?
    • Gaussian blurring: Reduces noise (outliers) by blurring the modified input img_u8 (8-bit conversion). Screenshot from 2019-03-09 20-18-41
    • Grayscale method: Converting every pixel to a single intensity value between 0 (black) and 255 (white) for much better performance and evaluation. Screenshot from 2019-03-09 20-31-14
    • Match threshold application: Specify matching standards (how high/low should the ORB expect the match intensity to be in order for it to be a good match). Screenshot from 2019-03-09 20-37-39
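The center-outward search order proposed in the list above can be sketched as follows. This is only an illustration: centerOutOrder and rgbaOffset are hypothetical helpers, and sorting every pixel by its squared distance from the center is one simple way to get the "vignette" ordering, not necessarily the most efficient one.

```javascript
// Hypothetical helper: returns pixel coordinates ordered from the image
// center outwards (nearest to the center first).
function centerOutOrder(width, height) {
  var cx = (width - 1) / 2;
  var cy = (height - 1) / 2;
  var pixels = [];
  for (var y = 0; y < height; y++) {
    for (var x = 0; x < width; x++) {
      var dx = x - cx;
      var dy = y - cy;
      pixels.push({ x: x, y: y, d2: dx * dx + dy * dy }); // squared distance
    }
  }
  pixels.sort(function (a, b) { return a.d2 - b.d2; });
  return pixels;
}

// Offset of a pixel's R value inside an RGBA Uint8ClampedArray
// (4 array elements per pixel, rows stored one after another).
function rgbaOffset(x, y, width) {
  return (y * width + x) * 4;
}

var order = centerOutOrder(5, 5);
console.log(order[0]); // the center pixel of a 5x5 image
```

A search loop would walk this list and break out as soon as good_matches >= expected_good_matches, so the border pixels are only visited when the center does not yield enough matches.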

The idea is to separate the image into two parts: the background and the foreground. Select an initial threshold value, typically the mean 8-bit value of the original image, and divide the original image into two portions: pixel values that are less than or equal to the threshold are background, and pixel values greater than the threshold are foreground.
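The mean-based split described above can be sketched in a few lines. This is an illustration only, assuming the input is an array of 8-bit grayscale values; meanThreshold is a hypothetical helper, not a JSFeat function.

```javascript
// Hypothetical helper: splits 8-bit grayscale values into background (0)
// and foreground (1) around the mean intensity of the image.
function meanThreshold(gray) {
  var sum = 0;
  for (var i = 0; i < gray.length; i++) {
    sum += gray[i];
  }
  var threshold = sum / gray.length; // initial threshold: the mean value
  var mask = new Uint8Array(gray.length);
  for (var j = 0; j < gray.length; j++) {
    // values <= threshold are background, values above it are foreground
    mask[j] = gray[j] > threshold ? 1 : 0;
  }
  return { threshold: threshold, mask: mask };
}

var result = meanThreshold(new Uint8Array([10, 20, 200, 250]));
console.log(result.threshold, Array.from(result.mask));
```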

Based on this, we need to consider two more thresholds, which are available under the YAPE06 module, originally designed by CVLab, Switzerland.

  • Eigen threshold application: Specify how similar two pixels should be in order for the ORB to perceive them as "lying in the same cluster". Screenshot from 2019-03-09 23-58-04
  • Laplacian threshold application: Defines the rate of change of intensity of the pixels that should be perceived by the ORB as "noisy". Screenshot from 2019-03-09 23-55-57
  • Finally, we can modularize this technique into a custom PL module that accepts two images and runs the algorithm against them. Refer (d) to look over the current proposed steps we can take after this.

c) Auto-Stitcher module: The auto-stitcher module will accept two images, the two most recently clicked images (as of now), and then proceed to "stitch" those together as per the instructions passed on to it. We can, as an initial implementation, add a functionality to this module that allows the user to attach an image in any of the four major directions, after which the user can make small changes to those images if need be. Since it wasn't merged at the time of writing this, I didn't implement this using the "multiple-image" selection feature, but I am in favor of shifting from the "last two clicked images" to selecting any two images at the user's convenience and passing those down to the Stitcher, thus improving the module's overall accessibility. An undo function can be implemented as well using the recent addTool API of Leaflet.DistortableImage, which adds an undo button to the menu bar every time the user stitches two images and removes it upon undoing the stitch action. We can display a soft popup here to notify the user of the stitch, since alert would be a bit "obstructing". I also believe that we should hold off on implementing the AS module until we have successfully abstracted away our polylines one, and then build the AS on top of it.

// an instance that "traces" the last two latest-clicked images
var Tracer = L.DistortableImage.Tracer = L.DistortableImage.Tracer || {};
initialize: ...
//previous image (constructing param being {}) is available in this scope
overlay.Tracer = Tracer;
Tracer = overlay._corners;
//current image that is clicked is available in this scope
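As a rough illustration of attaching one image to another in one of the four major directions, the sketch below computes where the second image's corners would go. It assumes an undistorted, axis-aligned first image, a second image of the same size, plain {lat, lng} objects, and the corner order [top-left, top-right, bottom-left, bottom-right]. The adjacentCorners helper is hypothetical and not part of Leaflet.DistortableImage.

```javascript
// Hypothetical helper: given the corners of image A, returns corners that
// place a same-sized image B flush against A in the given direction.
// Assumed corner order: [top-left, top-right, bottom-left, bottom-right].
function adjacentCorners(corners, direction) {
  var dLat = corners[0].lat - corners[2].lat; // height: top minus bottom
  var dLng = corners[1].lng - corners[0].lng; // width: right minus left
  var shifts = {
    right: { lat: 0, lng: dLng },
    left: { lat: 0, lng: -dLng },
    top: { lat: dLat, lng: 0 },
    bottom: { lat: -dLat, lng: 0 }
  };
  var shift = shifts[direction];
  if (!shift) {
    throw new Error("unknown direction: " + direction);
  }
  return corners.map(function (c) {
    return { lat: c.lat + shift.lat, lng: c.lng + shift.lng };
  });
}

var cornersA = [
  { lat: 10, lng: 0 }, { lat: 10, lng: 4 },
  { lat: 8, lng: 0 }, { lat: 8, lng: 4 }
];
console.log(adjacentCorners(cornersA, "right"));
```

For distorted or rotated images this simple shift would not be enough, which is where the ORB matches could supply the actual offset.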

carbon

autostitcher--road.gif

d) Integrating (b) and (c): We can incorporate the ORB and Stitcher modules to work together in real-time in the MK UI. Whenever the user selects two images for comparison, the images are first passed down to the ORB module that displays all the relevant information about that particular pair of images in a modal. After this the user can select whatever pair fits best and choose to "stitch" them together in a user-defined orientation (right, left, top, bottom) with respect to the original image using the Auto-Stitcher.

modal

That being said, I am well familiar with Leaflet's polylines and think that replacing JSFeat's inbuilt methods with Leaflet's (@warren's suggestion) shouldn't be a problem at all, since we already have the indices of all the good matches. This brings me to another very important aspect of this library: the conversion from {x,y} image space to LatLng objects. This can be achieved by a Scale function that takes in the {x,y} coordinates of the matched points and encapsulates them as points. What I'm proposing here is the extensive use of the following "scaling" formula, which can be used to convert the image space points to latitudes and longitudes, so that we can pinpoint the desired location on the image using the Leaflet library and use those coordinates in a ton of different Leaflet APIs.

Scaling

Converting image coordinates to LatLng object properties: For the "scaled" width we can use x_image_to_lat = {|obj2.x-obj1.x|/image_width} * imagespace_x. Similarly, for the "scaled" height we can use y_image_to_lng = {|obj3.y-obj1.y|/image_height} * imagespace_y.
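One way to read the scaling idea is as a linear interpolation between the geographic corners of the overlay. The sketch below is a minimal version under that assumption: the overlay is axis-aligned and undistorted, and its top-left and bottom-right corners are known as plain {lat, lng} objects. The helper imagePointToLatLng and its parameter names are hypothetical. Note that the sketch maps x to longitude and y to latitude, the usual map convention; that choice is an assumption of the sketch.

```javascript
// Hypothetical helper: linearly maps an image-space point to a {lat, lng}
// pair, assuming an axis-aligned (undistorted) overlay whose top-left pixel
// sits at `northWest` and whose bottom-right pixel sits at `southEast`.
function imagePointToLatLng(pt, imageSize, northWest, southEast) {
  var fx = pt.x / imageSize.width;  // 0 at the left edge, 1 at the right
  var fy = pt.y / imageSize.height; // 0 at the top edge, 1 at the bottom
  return {
    lat: northWest.lat + fy * (southEast.lat - northWest.lat),
    lng: northWest.lng + fx * (southEast.lng - northWest.lng)
  };
}

var latLng = imagePointToLatLng(
  { x: 320, y: 240 },
  { width: 640, height: 480 },
  { lat: 40, lng: -74 },
  { lat: 39, lng: -73 }
);
console.log(latLng); // midpoint of the overlay
```

The resulting object has the same lat/lng shape that Leaflet accepts for its LatLng arguments, so matched points could be handed to polylines or markers after conversion.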

Mapknitter UI enhancements

This is currently in progress, and most of the points in the projects and compatibility section have been or are being resolved in the PRs which I've submitted so far. I aim to incorporate the appropriate modifications into my remaining open PRs as per @warren's suggestions and get them merged speedily. You can check out the follow-ups on the original planning issue as well.

Also, based on @warren's review, I think we can actually encapsulate all the "matching functions" in one module, while constructing the UI (AutoStitcher+Lines+Boxes) in a different one. This would also make the testing part more convenient, and improve the reusability of the components as well.

Timeline

  • 27th May to 23rd June: Introduce and integrate two new PL modules (w/ blog posts on both dates)
    • Week 1 and 2: ORB and Auto-Stitcher modules (w/ FTOs on each weekend)
      • Starting off with the most exciting aspects of my proposal, the ORB and Auto-Stitcher modules will be implemented from scratch and solidified along the way (details above), while keeping in mind to make these effective yet simplistic, so that newer contributors who find this interesting can easily contribute to this. I will also be fully documenting both of these modules to make the code as readable and easily understandable as possible, especially the Oriented FAST and Rotated BRIEF (ORB) module. That being said, I, under the guidance of my mentor, will execute appropriate measures to refine these modules so that they perfectly fit in with Public Lab's coding standards, practices, and community expectations.
    • Week 3 and 4: Integrate modularized ORB+AS and Microscope web-based live demo (w/ PR reviews on each weekend)
      • Although I've proposed a way to integrate ORB+AS into LDI above, I am and will be open to newer and better ideas that might be suggested by my mentor during my GSoC period. As is discernible, this will be a more work-focused period, and throughout this duration I will work on a modularized integration in order to maintain the readability of the codebase.
      • The Microscope web-based demo will implement an abstract flavor of the ORB, different from its native or LDI implementation. For this purpose, I'll be customizing the ORB almost from the bottom up, changing its current behavior of depending on the user to train it from the video feed whenever prompted. Instead, it will train itself from a specific image supplied to it, so that it can find the desired patterns in the live feed, as is demonstrated in the .gif above. It may also be worth noting that this will be a PL standalone module, unless suggested otherwise.
  • 24th June to 28th June: Evaluations
  • 29th June to 21st July: Testing and Debugging Phase (w/ blog posts on both dates)
    • Week 1 and 2: General Debugging at LDI's #87 (w/ FTOs on each weekend)
      • This period will consist of different approaches to resolve the issues (todos, lower priority, and leftovers) over at LDI #87. These bugs have been in the system for a while now, and need to be eradicated as soon as possible to build a neat codebase that attracts more contributions and leaves fewer dangling errors in the future.
    • Week 3: Unit Tests w/ Jasmine (w/ PR reviews on each weekend)
      • Initially proposing the opportunity to write extensive tests for the ORB, AS, and the Microscopic live-feed modules, but willing to write exhaustive unit tests for different potential LDI modular fragments, however my mentor sees fit.
  • 22nd July to 26th July: Evaluations
  • 27th July to 25th August: LDI UI revamp (w/ blog posts on both dates)
    • Week 1 and 2: LDI #126 (w/ FTOs on each weekend)
      • During this period I will be looking forward to completely solving the remaining planning issues, i.e., finishing off the projects section and implementing the assorted utility functions, a step closer to the next major release of LDI!
    • Week 3 and 4: Uncommon issues (w/ PR reviews on each weekend)
      • I will finish off my GSoC journey by working on some of the uncommon issues, such as cross-browser testing for the Toolbar module by generating heap snapshots across various combinations of OS and browser versions. I'm currently considering different ways to implement this, but this method looks the most promising and solid. Other than this, I will work on implementing a couple of unique approaches that'll further guarantee the end-to-end functionality of PL's components and will also make it easier for newcomers to quickly participate! I'll also look over the menu placement and image deletion (and specific ordering) issues, which have been showing unexpected behaviour for a while now.

Needs

Only the guidance of my mentor, everything else that I require is either online, or on my local.

Contributions

At the time of writing this down, my contributions have been as follows.

Screenshot from 2019-03-10 02-41-12

  • Pull Requests (53 total, 46 closed, 7 open): Navigate
  • Issues -- FTOs (17 issues): Navigate
  • Issues -- Overall (31 issues): Navigate
  • Comments (on 148 issues): Navigate
  • Review Requests (19 PRs): Navigate

Experience

Comfortable with: C, CPP, JS, SH.

Achievements

  • Mentored at KWoC organized by KOSS, IIT Kharagpur, accessible here (2018).
  • JS Core at GDG JIIT Noida (2019) and JIITODC Noida (2019).
  • Secured 1st rank at DevFest organized by GDG JIIT Noida (Google Developer Groups, now Developer Student Clubs) (2017).
  • Secured 1st rank at Website Designing Competition (WDC) organized by GDG JIIT Noida (2017).

Teamwork

  • Interned towards the end of my first year as a Front-end developer at Profformance Technologies Pvt. Ltd. for their product, Xane, a React chatbot for HR purposes. The team wasn't large, about 25 people or so, and after a couple of months I switched to a work-from-home option to manage it with my studies, while visiting occasionally to attend meetings and other events. Throughout my internship here, I was introduced to the corporate etiquette and productive practices that helped me realize the importance of time and priority management. I adopted these techniques in my daily routine to stay motivated and focused on my project, as is discernible here too.
  • I am a member of the JavaScript core team at Google DSC JIIT. I've been teaching JS (Vanilla, React, and Node.js) to my peers for some time now, which has helped me stay updated with the latest API changes as well, while coordinating with the rest of the team throughout.
  • I also mentor a lot of enthusiastic students about OSS through JODC. Working collaboratively with a lot of local tech societies like DSC-BVP, I give workshops and talks to bridge the gap between newcomers and development and to spread awareness of FOSS. I am a teaching assistant here as well.

Passion

Screenshot from 2019-03-10 02-37-11

To start this off, I cannot stress how joyous I am when I realize that my code is actually improving the very surroundings that we live in! It's really an amazing feeling to "give back" to nature in such a unique way, which has become my passion and should be everyone else's too! Having said that, what interested me most about Public Lab's projects was the substantial, caring, and progressive community, which actively helped me clear away my doubts and learn new and interesting things throughout my journey with Public Lab, which I very much plan to extend for the many years to come!

Audience

My project will directly help all users (students, researchers, etc.) working in all kinds of fields that relate to studying, examining and experimenting with maps in many different ways possible, as described in various sections above. Also, it will help them recognize patterns that might often be "overlooked" by the human eye and thus extract credible information from live-video feeds, ranging from microscopic experiments to satellite broadcasts to cost-effective visual sensors for the visually impaired.

The best thing about this is that all of it is open-sourced and free, so people from all over the world, no matter what their current social or financial status is, can use this to meet their needs, and even make it better!

Commitment

I do understand that this requires serious commitment, and since the day I joined Public Lab I've structured every single day to revolve around only two priorities, i.e., PL and college, actually in that order (I like it here!), which has enabled me to make about 185 contributions since mid-December! Also, I'd like to mention that I will, as I have done previously, continue to actively interact with the newer contributors and provide insight and any help that I can regarding their PRs and issues during (and after) my GSoC period. Hence, I firmly believe that I will deliver my assignments with commitment and promptness, given that I get selected!

Thank you!


30 Comments

This is really great, thank you @rexagod!!! A very powerful and interesting proposal.

I'm interested in how the process breaks down into specific functions that are re-usable -- what are the useful subcomponents of this project! Such as - could it have such sub functions as:

  • lib.getPoints(image)
  • lib.findPointMatches(pointsArray)
  • lib.findPosRelativeToImage(imgA, imgB)
  • lib.findPointsInImg(pointsArray, img)

These may not be exactly right, but trying to think like this to give us a set of tools that could be reconfigured in different ways, such as to run on video frames, to turn a video stream into a large image, to move Leaflet image overlays around, things like that.

There might also be listeners, like lib.onFindMatch(fn) which would then run function(matchedPoints) or something... all these would be great too because they'd be testable!

What do you think of breaking down the architecture a little bit like this? Thank you! This is off to a great start!



Awesome start to the proposal 😃 !!!



Hey @warren @sagarpreet! I'm really glad that you liked the proposal! 😃

Sorry for the delay, the past couple of days have been very tedious due to college stuff! But that's all done with and I've compiled a list of the functions/data structures used in this project below. @warren, should I add proper documentation regarding the same in the proposal above? I thought that I'd be doing that in the code itself in the first couple of weeks, but I'd be happy to anyway! 👍

Fn./Structure name: Description

  • matrix_t: Basic structure to store data in terms of rows/columns, or image width/height, along with the respective signatures (usually jsfeat.U8_t | jsfeat.C1_t -- single channel unsigned char)
  • match_t: Records learned variables (screen index, pattern index, pattern level, distance) in a structure
  • demo_opt: Controls user-defined options along with critical training
  • train_pattern: Maintains original aspect ratio, applies grayscale and resamples, prepares preview canvas with half (if not overridden) the original dimensions, preallocates corners and descriptors, detects keypoints in a loop as defined by num_train_levels, and finally stores their coordinates after adjusting them to the preview canvas size (determined earlier)
  • demo_app: Performs final steps such as line colors, stroke styles, number of descriptors to define, and displaying info through the stat object
  • tick: Major rendering fn. (applies filters, renders patterns and matches back to the canvas, and supplies info to the #log element)

Rest are util fns. that majorly make use of the JSFeat APIs to perform primitive objectives as is indicated by their definitions. These are,

  • detect_keypoints: Uses yape06.detect method to perform detections
  • find_transform: Calculate line geometry between keypoints in both canvases and re-estimate the same if motion change is detected in the canvas
  • match_pattern: Performs naive brute-force matching. Each on screen point is compared to all pattern points to find the closest match.
  • render_matches: Draws a match_mask(connecting lines) depending on the params passed down to it.
  • render_pattern_shape: Draws a box around the most probable area, with the highest number of keypoints.
        ctx.lineTo(shape_pts[1].x, shape_pts[1].y);
        ctx.lineTo(shape_pts[2].x, shape_pts[2].y);
        ctx.lineTo(shape_pts[3].x, shape_pts[3].y);
        ctx.lineTo(shape_pts[0].x, shape_pts[0].y);
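The naive brute-force matching that match_pattern is described as performing can be sketched roughly as below. This is an illustration under assumptions, not the JSFeat sample code: descriptors are taken to be equal-length Uint8Array bit strings compared by Hamming distance, and the names popcount8, hamming and bruteForceMatch are hypothetical.

```javascript
// Counts the set bits in a single byte.
function popcount8(b) {
  var n = 0;
  while (b) {
    n += b & 1;
    b >>= 1;
  }
  return n;
}

// Hamming distance between two equal-length binary descriptors.
function hamming(a, b) {
  var d = 0;
  for (var i = 0; i < a.length; i++) {
    d += popcount8(a[i] ^ b[i]);
  }
  return d;
}

// Hypothetical helper: for each screen descriptor, find the closest pattern
// descriptor and keep the pair only if its distance is below matchThreshold.
function bruteForceMatch(screenDescs, patternDescs, matchThreshold) {
  var matches = [];
  screenDescs.forEach(function (s, screenIdx) {
    var best = { screenIdx: screenIdx, patternIdx: -1, distance: Infinity };
    patternDescs.forEach(function (p, patternIdx) {
      var d = hamming(s, p);
      if (d < best.distance) {
        best.patternIdx = patternIdx;
        best.distance = d;
      }
    });
    if (best.distance < matchThreshold) {
      matches.push(best);
    }
  });
  return matches;
}

var screen = [new Uint8Array([0x0f, 0x00]), new Uint8Array([0xff, 0xff])];
var pattern = [new Uint8Array([0x0e, 0x00]), new Uint8Array([0x00, 0xf0])];
console.log(bruteForceMatch(screen, pattern, 3));
```

Returning plain objects with indices and a distance keeps this step free of any display code, so it can be unit tested on its own.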

Also, I do believe that it'll surely be a good practice to abstract away as many methods as we can, which will definitely help us in the future with custom functionalities and testing requirements! Thanks!



@warren I'm also currently working on an unorthodox technique that'll help us soon incorporate specific image ordering functionality.

ordering

I'm taking this over to GitHub now, and will soon share updates and discuss my approach regarding the same there! Thanks!



Just going through this and it'll take me a little time, but thank you! As to copying it into the proposal, that's OK, we can see it all in the comments. You're right that it'd eventually hopefully make it into the README and such, so no big deal right now. Thanks!



OK, hi @rexagod - just skimmed this but actually I think there may need to be a lot more separation between the UI (say, drawing boxes, lines, things like that) and the underlying code that drives it. My question about getPoints, findPointMatches and the other methods was to look at what functions would be useful for general-purpose uses, without yet mixing in any display code. I want to know how you'd pass in 2 images, and how we'd format the identified points that were handed back out? How we'd "save" a point (in what format) that we want to reference again later? And how to (most importantly) return the location of a point in Image B but using the coordinate space of Image A. (i.e. what point in Image A does a given matched point in Image B correspond to?)

Finally, maybe as important as that last, can we return the four corner points of Image B in Image A's coordinate space? So that we can position Image B relative to Image A?

Does this make sense? These don't yet get into how the info would be displayed, and I think it's important to develop a strong set of function descriptors before actually writing any display code!



And... THANK YOU!



@rexagod this is really cool! Really excited to start to see some of these ideas implemented and feel free to make suggestions on my multiple image select PR if you see early opportunities for structuring the code in a way that would help you implement this in the future.



Hey @sashadev-sky! I'm glad you like it! 😃 Apologies for not being able to chime in on the multiple image PR, though I've been meaning to for a while now, and definitely will! I've nonetheless been following it and it looks very promising! 👍

@warren Please refer to the text/images below and do let me know if there's anything else I can help with!

pixels

If we assume the canvas to be, say, 640x480, ORB will compare each of its 307200 pixels (stored in the matches array above) against every pixel in the original pattern. Visualize this as the bigger canvas (let's call it the screen canvas) having its bottom left corner at the origin, with the measurable units on both axes being multiples of a pixel. Similarly, let's call the smaller (inset) canvas the pattern canvas. That being said, let's get down to the questions.
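To make the pixel arithmetic above concrete, here is a minimal sketch of how a 640x480 frame maps onto the flat RGBA array that the Canvas `getImageData()` call returns. The helper names `pixelOffset` and `readPixel` are hypothetical and used only for illustration; they are not part of JSFeat or of the code in this thread.

```javascript
// Sketch only: `pixelOffset` and `readPixel` are hypothetical helpers.
// A flat RGBA array stores 4 consecutive entries (R, G, B, A) per pixel, row by row.
function pixelOffset(x, y, width) {
  return (y * width + x) * 4;
}

function readPixel(data, x, y, width) {
  var o = pixelOffset(x, y, width);
  return { r: data[o], g: data[o + 1], b: data[o + 2], a: data[o + 3] };
}

var width = 640, height = 480;
var pixelCount = width * height;                   // 307200 pixels
var data = new Uint8ClampedArray(pixelCount * 4);  // 1228800 entries, all 0 initially

// Values written to a Uint8ClampedArray are clamped to the 0-255 range.
data[pixelOffset(10, 2, width)] = 300;
var px = readPixel(data, 10, 2, width);            // px.r is 255
```

The same offset formula works for any image size, so it can be reused when only a sub-region of the image needs to be inspected.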

I want to know how you'd pass in 2 images, and how we'd format the identified points that were handed back out?

The pattern_xy (matched points in the smaller canvas) and screen_xy (corresponding matched points in the larger canvas) arrays are globals, and thus can be accessed from wherever we want once they have been assigned values.

How we'd "save" a point (in what format) that we want to reference again later?

Similar to above, the points are already saved in the two arrays which can be traversed in a level-order fashion to get all the identified points in the canvas. As far as the format for storing these points is concerned, pattern_xy and screen_xy are object arrays that store the xy coordinates of the identified points. Please refer to the snippet below.

// construct correspondences
for (var i = 0; i < count; ++i) {
    var m = matches[i];
    var s_kp = screen_corners[m.screen_idx];
    var p_kp = pattern_corners[m.pattern_lev][m.pattern_idx];
    // identified points in the smaller canvas
    pattern_xy[i] = { "x": p_kp.x, "y": p_kp.y };
    // identified points in the larger canvas
    screen_xy[i] = { "x": s_kp.x, "y": s_kp.y };
}

And how to (most importantly) return the location of a point in Image B but using the coordinate space of Image A. (i.e. what point in Image A does a given matched point in Image B correspond to)?

The xy sets on the left represent matched points in the pattern canvas pattern_xy and those on the right represent the corresponding matched points in the screen canvas screen_xy.

Screenshot from 2019-03-16 21-21-07

If any pixel (on the whole canvas) has a good match data, we send that into those arrays.

if (match_mask.data[i]) {
    pattern_xy[good_cnt].x = pattern_xy[i].x;
    pattern_xy[good_cnt].y = pattern_xy[i].y;
    screen_xy[good_cnt].x = screen_xy[i].x;
    screen_xy[good_cnt].y = screen_xy[i].y;
    good_cnt++;
}

Finally adjust the corresponding "scale" for the main canvas.

// fix the coordinates due to scale level
for (i = 0; i < corners_num; ++i) {
    lev_corners[i].x *= 1. / sc;
    lev_corners[i].y *= 1. / sc;
}

Finally, maybe as important as that last, can we return the four corner points of Image B in Image A's coordinate space? So that we can position Image B relative to Image A?

The shape_pts array stores exactly these coordinates. They are derived from homo3x3.data, the data of the 3x3 homography matrix that maps points from the pattern image into the screen image, and are rendered out as below.

var shape_pts = tCorners(homo3x3.data, pattern_preview.cols * 2, pattern_preview.rows * 2);
ctx.moveTo(shape_pts[0].x, shape_pts[0].y);
ctx.lineTo(shape_pts[1].x, shape_pts[1].y);
ctx.lineTo(shape_pts[2].x, shape_pts[2].y);
ctx.lineTo(shape_pts[3].x, shape_pts[3].y);
ctx.lineTo(shape_pts[0].x, shape_pts[0].y);

We can also adjust the scale in the larger canvas (as above) to display the "matched" box in the smaller one too.
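The body of tCorners is not shown in this thread, so the following is only a sketch of what such a helper could look like, assuming it pushes the four corners of a w x h image through the homography stored as a flat, row-major array of 9 numbers. The names `projectPoint` and `projectCorners` are hypothetical.

```javascript
// Sketch only: `projectPoint` and `projectCorners` are hypothetical names.
// H is a flat, row-major 3x3 homography (9 numbers), assumed to match homo3x3.data.
function projectPoint(H, x, y) {
  var px = H[0] * x + H[1] * y + H[2];
  var py = H[3] * x + H[4] * y + H[5];
  var z  = H[6] * x + H[7] * y + H[8];
  // divide by the homogeneous coordinate to get back to 2D
  return { x: px / z, y: py / z };
}

function projectCorners(H, w, h) {
  // top-left, top-right, bottom-right, bottom-left
  return [
    projectPoint(H, 0, 0),
    projectPoint(H, w, 0),
    projectPoint(H, w, h),
    projectPoint(H, 0, h)
  ];
}

// Example: a pure translation by (100, 50) moves every corner by that amount.
var shift = [1, 0, 100,
             0, 1, 50,
             0, 0, 1];
var corners = projectCorners(shift, 640, 480);
```

Because the function returns plain {x, y} objects and touches no canvas, the same output could be handed to a Canvas renderer or to Leaflet.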

Thank you!!



A well detailed and well structured proposal. I really appreciate the depth in which you have discussed things. A few suggestions: we are looking for supportive folks, so make sure to (1) do at least one PR review each week, and (2) make some FTO issues each week to involve new contributors in your projects. People will love to be part of big projects. It is a great feeling to help others.

Please take out a day from timeline from each month at end of each phase of SoC fellowship to write about what you learnt and did in the period. Earlier it was not compulsory. But let's make it compulsory. By this mentors will be able to assess your progress. Don't forget to mention the FTOs which you created during this period. Also mention the PR reviews which you did.

We are planning video calls each month during the SoC program. So, it will be great to see you all.

It will be great if you can review each other's proposals and give some suggestions.

Also, it would be great if you'd like to (optionally) hold a conference at your university and explain Public Lab: what we do here, our programs, and our workflow. This will be a great help to Public Lab.

These are small activities which can lead to a better and more supportive community at PL. All participants in all programs are encouraged to write these in their proposals.

Thanks and best of luck.



Hey @bansal_sidharth2996! Thank you so much for responding to my proposal. I will surely see to it that these suggestions are followed in the future just as I've done them in the past.

Please take out a day from timeline from each month at end of each phase of SoC fellowship to write about what you learnt and did in the period. Earlier it was not compulsory. But let's make it compulsory. By this mentors will be able to assess your progress.

Had this in mind for a while (I've been delaying the idea of having a blog for some time now and this would be a great start! 😃). I'm currently choosing between a more "exposed" blogging system like Medium and something a bit more "abstract" like Dan's overreacted.io, or maybe I should stick to publishing my monthly progress directly to publiclab.org? What do you think?

Don't forget to mention the FTOs which you created during this period. Also mention the PR reviews which you did.

A bit unsure about what period (the current or the SoC one) you are referring to, and should I mention this in my monthly SoC post or my proposal? My FTOs and PRs (w/ reviews) till now are listed in the "Contributions" section above. Also, as far as the SoC period is concerned, as mentioned above, I'll absolutely continue to open up interesting FTOs and invite newcomers to PL projects as much as possible!

We are planning video calls each month during the SoC program. So, it will be great to see you all. It will be great if you can review each other's proposals and give some suggestions.

Looking forward to it! I'll check on others' proposals too and see if I can help them in any way possible!

Also, it would be great if you'd like to (optionally) hold a conference at your university and explain Public Lab: what we do here, our programs, and our workflow. This will be a great help to Public Lab.

I'd definitely like to organize something like this ASAP at my uni as well! As a matter of fact, I've wanted to do something like this from the very moment I read this post and the amazing response it generated. We do organize meetups and such, and other than that I'm a technical assistant at the aforementioned tech societies, so organizing something like this shouldn't be cumbersome at all.

All participants in all programs are encouraged to write these in their proposals.

Write what out exactly? Sorry, but did I miss on including something in my proposal? I am unable to figure this out, can you please clarify?

Thanks again! Really appreciate it! 👍



maybe I should stick to publishing my monthly progress directly to publiclab.org?

Blog post should be posted on Publiclab.org

what period (the current or the SoC one)

Blog work has to be done after you are selected for GSoC, at the start and end of each coding/working period. Does this make sense?

did I miss on including something in my proposal?

Add some time for blogs, PR reviews, and FTOs in your timeline (about 3-4 days). Thanks



Hi @rexagod, this looks really good...the way that you explained things is just awesome!!! 👍 😃



@icode365 Thanks! I'm glad you liked it!! 😃

@bansal_sidharth2996 Thank you so much for the review! I've made the requested changes above and duly noted your suggestions! Can you please have a look? Much appreciated! 👍

@warren I've tried to answer your query in my comment (below sashadev-sky's comment) above. Can you please check it out and provide your feedback on the same? Many thanks!



@rexagod amazing proposal. Completely love how you have tackled the whole thing in a great flow! Congratulations!



@rexagod Thanks for sharing your interesting proposal. I have a question not related to the code but to the example pictures/gifs described above. It looks like you're stitching the same picture? If this is the case, shouldn't it be a null result? Also, the road is aligned but portions of the road (below the intersection) are not matched? Should the non-match areas also determine when images are/are not stitched? Great coding BTW. Thank You



Hey there @MaggPi!

That gif only shows the working of the AS module, and none of the images were actually wired to/utilized any ORB algorithm in that particular example. Let me explain.

The gif in question uses two images to demonstrate only the automatic stitching ability of the AS module. In that particular example, I've explicitly passed the coordinates where the second image is supposed to be stitched to the first one. Of course, we could obtain those coordinates using ORB, but for now ORB isn't that advanced and may give some messed up or unwanted results every once in a while, so depending on it completely might not be a good solution yet. Instead, as is visible in section (d), I'm thinking of displaying a modal that shows how well both images compare, leaving it to the user to decide whether to stitch them in a left/right/top/down manner, and then having AS take those coordinates and stitch the images in the desired orientation. Does that make sense?

Also, thank you so much for showing your interest in my proposal! Appreciate it! 😃



OK, thanks @rexagod. Building on my questions in https://publiclab.org/notes/rexagod/03-11-2019/gsoc-proposal-mapknitter-orb-descriptor-w-auto-stitching-pattern-training-and-live-video-support-and-ldi-revamp-major-ui-enhancements#c22147, I think we may want to do a few things -- one, I think some of the variable names could be made more human readable -- Let's imagine our ideal abstract library we'll call matcher and think of what functions it will have:

var matcher = new OrbMatcher();

// these would output coordinates in the space of 
var points1 = matcher.findPoints(img1);
var points2 = matcher.findPoints(img2);

// this would need clear examples and docs for the format returned, and what coordinate space they're in.
// also, assuming it's stateless, would we need to pass in the original images too, so that we have extra context about the original images, or is there enough information in `points1` and `points2`?
var matches = matcher.findMatchedPoints(points1, points2);

// Alternatively, we could decide that this library should be stateful, and should store the input images:

var matcher = new StatefulMatcher();

matcher.addImages(img1, img2);

// not sure about this, but maybe:
var points0 = matcher.getPoints(0);
var points1 = matcher.getPoints(1);

// In either case, we could then use:
matcher.projectPointsInto(points0, targetCoordSpaceImg); // pass in the image whose coord space we want to use

// or, how would we do something like this?:
matcher.projectPointsIntoCanvasSpace(points0);

// for some of the display functions the underlying library offers, like the render_pattern_matches you mentioned above, we might separate that as much as we can into a render layer of the library:
matcher.renderer.someFunctionName()

// on the other hand, if we may be displaying lots of the UI in Leaflet, we don't really want to write such specific code into the library, so maybe we'd just want to make it possible to fetch the coordinates out which allow us to use Leaflet to draw such UI elements as boxes, lines, etc?

Note that most of the initial methods above are in image x,y space, not canvas space; they are abstracted to be only in the coordinate spaces of the 2 input images. So, perhaps we would think of conversion to canvas space as a separate set of methods in the display code?

This essentially shifts the mental model from points in pattern_xy and canvas_xy to points in img1Points and img2Points. And the data returned via the matcher library's public API, instead of a canvas_idx index, would be in the coordinate space of img1. By default, we probably wouldn't even show the canvas onscreen.

The idea of abstracting the UI code -- esp. if it's written in Leaflet UI code -- might need a separate interfacing library, that builds on the fundamental features in the matcher, but is called something like LeafletImageStitcher (riffing on the name you proposed in your code snippet above) -- that you could pass a whole Leaflet.DistortableImageOverlay object to, and exposing methods like LeafletImageMatcher.matchAndAlignImageToImage(distortableImg1, distortableImg1); (a bit wordy, but see how this would be a very Leaflet-oriented library, as opposed to the underlying Matcher library which would be usable for things apart from Leaflet).

Actually, before trying to implement an auto-stitcher, I think it's totally worthwhile to attempt the more minimal "draw lines between matched points" UI. This would be a simpler overall system, and could be built upon to achieve auto-stitching. It would also help to keep the complexity of the full auto-stitch function contained, rather than premising the design of the whole library around the final end-to-end auto-stitch.

One example of this is that the Leaflet UI probably would not make use of the Canvas API to draw boxes, lines, etc -- Leaflet has its own polyline API. So, that's a good reason to abstract away all the render methods so that in a Canvas environment (like microscope, say) we can use those, but in the Leaflet environment, we'd use Leaflet polylines, most likely. And in Leaflet, we need to think in display coordinate space, not canvas coordinate space, so the question of what our "native" coordinate space is is really a question for the renderer module, not the underlying matcher library. What do you think of this?

Just a note -- what about the "confidence" of ORB match? does that mean we need the ORB match to be an object with properties, like matches = {confidence: 56, pairs: [ {x1: 252, y1: 61, x2: 116, y2: 6} ]}, for example? (or would confidence be a property of each pair?) That might mean we could return this from matcher.getMatches(), you know?

One more thing - thinking of speed and optimization, esp for gigantic images of 5-10mb -- in the match generator function, you note that it's brute force. Is there any way to randomize the order it searches, and to stop searching once it's found a certain number of high-quality matches? We could default to returning all matches, but being able to request just a few may be useful for some applications where speed is a factor. Similarly, we may want to eventually make it possible to downscale the images as an initial step where we're faced with extremely large input images. But we could just leave a comment inline in the code (and an open issue) to remind ourselves, as this is an optimization, not a core function, so we could think about it later.

For the UI, what do you think about a Stitch + undo workflow where the lib can be made to auto-stitch one image against another, but there's an 'undo' function generated which can be bound to an Undo UI button? This could be displayed in a notice above, perhaps, where it says "Keep this stitch?" or something.

Thanks for your detailed responses @rexagod. I know you have the underlying systems figured out well. Most of my input here is attempting to develop a highly readable and universal vocabulary of public methods that will make this library quick to learn, and to preserve conceptual abstractions which will allow us to use it in very flexible environments (like canvas vs. leaflet). I hope this input makes sense and apologies if I've missed a few things from your writing above... i'm trying to get through a lot of proposals this week and have to keep catching up, so please forgive me!

Again, great work, thanks!!!



And @maggpi - this kind of abstraction described above would be totally useful for your own proposals as well -- I'll leave related comments on your proposal!



Oh! And also such modularity will let us test subparts of the lib independently!



Hello @warren! Here's my response regarding the code section in your comment above. I've implemented these keeping the image space in mind, rather than the canvas one.

Location

function ORBMatcher() {
  this.findPoints = findPoints;
  this.findMatchedPoints = findMatchedPoints;
  this.projectPointsInto = projectPointsInto;
  this.renderer = renderer;
  return this;
}

function findPoints(img) { //extracts [RGBA] info about every pixel in an image passed to it
    //img is a fully loaded `Image()` instance
    var canvas = document.createElement("canvas");  //notice we aren't appending this anywhere (offscreen)
    canvas.width = img.width;   //size the canvas to the image, otherwise it stays at the default 300x150
    canvas.height = img.height;
    //TC is the 2D context of the temporary canvas, used to extract [RGBA] data for each pixel of image
    var TC = canvas.getContext("2d");
    TC.drawImage(img, 0, 0, img.width, img.height);
    //no cleanup needed: the canvas was never attached to the DOM
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //imgData.data contains `0-255` "clamped" values in a Uint8ClampedArray (refer MDN)
    //for eg., if the image is of the resolution 640x480, this array will have 1228800
    //elements (307200x4), where all are "clamped" (rounded off) to have a value
    //between 0-255, and every set of 4 elements represents the R,G,B, and A values
    //for every single pixel, i.e., 307200 pixels
    var imgData = TC.getImageData(0, 0, img.width, img.height);
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //finally, we can extract the data about every single pixel, which also removes
    //the need for having a `Stateful` module (saving space as well) since we
    //now already have everything we need
    return imgData;
}

// construct correspondences for inliers
// var correspondences_obj;
// for (var i = 0; i < count; ++i) {
//  var m = matches[i]; //`match_t()` construct:176
//  var s_kp = screen_corners[m.screen_idx]; //`keypoint_t()` construct:175
//  var p_kp = pattern_corners[m.pattern_lev][m.pattern_idx];
//  pattern_xy[i] = { "x": p_kp.x, "y": p_kp.y }; //==>X (img1points)
//  screen_xy[i] = { "x": s_kp.x, "y": s_kp.y };  //==>Y (img2points)
// }
// correspondences_obj = {"X":pattern_xy,"Y":screen_xy};
function findMatchedPoints(X,Y) {
    var good_matches_locatorX=[];
    var good_matches_locatorY=[];
    var strong_inliers=0;
    var xk=0;
    var yk=0;
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //assuming we have already constructed correspondences from imageData as
    //described above and X and Y hold the `pattern_idx`(image1points) and
    //`screen_idx`(image2points) of the matches array for every
    //pixel respectively for each pixel in { "x": p_kp.x, "y": p_kp.y }
    //and { "x": s_kp.x, "y": s_kp.y } formats for X and Y respectively (please refer to the image above)
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    //find respective homographic transforms for both of the models (X and Y)
    //run ransac homographic motion model helper on motion kernel to find
    //*only* the strong inliers (good matches) excluding away all the others
    //(works on static models as well)
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    var num_model_points = 4; //a homography needs at least 4 point pairs
    var reproj_threshold = 3; //max. reprojection error (in px) for a pair to count as an inlier
    var count = X.length; //number of candidate correspondences
    var mm_kernel = new jsfeat.motion_model.homography2d();
    var homo3x3 = new jsfeat.matrix_t(3, 3, jsfeat.F32C1_t);
    var ransac_param = new jsfeat.ransac_params_t(num_model_points, reproj_threshold, 0.5, 0.99);
    var match_mask = new jsfeat.matrix_t(500, 1, jsfeat.U8C1_t); //assumes count <= 500
    //~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    if (jsfeat.motion_estimator.ransac( //estimates the homography and flags the inliers in match_mask
        ransac_param,
        mm_kernel,
        X,
        Y,
        count,
        homo3x3,
        match_mask))
    {
        for (var i = 0; i < count; ++i) {
            if (match_mask.data[i]) {
                //store copies first, since the in-place compaction below overwrites earlier entries
                good_matches_locatorX[xk++] = { "x": X[i].x, "y": X[i].y }; //==>e.g. {x: 359, y: 48}, i.e., a matched point in X space
                good_matches_locatorY[yk++] = { "x": Y[i].x, "y": Y[i].y }; //==>e.g. {x: 65, y: 309}, i.e., the corresponding point in Y space
                X[strong_inliers].x = X[i].x;
                X[strong_inliers].y = X[i].y;
                Y[strong_inliers].x = Y[i].x;
                Y[strong_inliers].y = Y[i].y;
                strong_inliers++;
            }
        }
        mm_kernel.run(X, Y, homo3x3, strong_inliers); //run kernel directly with inliers only
    }
    var matched_points = {
        "good_matches_X": good_matches_locatorX,
        "good_matches_Y": good_matches_locatorY,
        "matches_count": strong_inliers
    };
    return matched_points; //store location of all good matches in both models (images), X and Y, as well as their count
}

function projectPointsInto(matched_points, canvas, match_thres_percentage=30, expected_good_matches=5) {
    var good_matches = matched_points.matches_count; //from above fn.
    if (num_matches) { //GLOBAL, set by the `match_pattern` utility fn.
        if (good_matches >= num_matches * match_thres_percentage / 100) { //we have a match!
            renderer.lines(canvas, matches, num_matches); //picks X/Y data from matches and num_matches which are GLOBALS
        }
        if (good_matches >= expected_good_matches) { //a solid match: results are as we expected or even better
            renderer.box(canvas);
            //we can stop iterating here completely to save on performance
        }
    }
}

var renderer = { //gets hoisted
  lines: function(canvas, matches, num_matches) {
   render_matches(canvas, matches, num_matches);
  },
    box: function(canvas) {
   render_pattern_shape(canvas);
  }
}

//declarations and calls

var matcher = new ORBMatcher();

var imageA = new Image();
imageA.src = "imageA.jpg";
var imageB = new Image();
imageB.src = "imageB.jpg";

var imageDataA = matcher.findPoints(imageA);
var imageDataB = matcher.findPoints(imageB);
//construct correspondences
var matcherData = matcher.findMatchedPoints(X_,Y_);
var canvas = document.querySelector("#deploy-canvas").getContext("2d");
matcher.projectPointsInto(matcherData,canvas);

My responses to the non-code section of your comment are as follows.

(...)in the Leaflet environment, we'd use Leaflet polylines, most likely. And in Leaflet, we need to think in display coordinate space, not canvas coordinate space, so the question of what our "native" coordinate space is is really a question for the renderer module, not the underlying matcher library.

Definitely! I am familiar with Leaflet's polylines and think that replacing JSFeat's inbuilt methods with Leaflet's shouldn't be a problem at all, since we already have the indices of all the good matches. This brings me to another very important aspect of this library: the conversion of {x,y} image space points to LatLng objects. This can be achieved by a Scale function that takes in the {x,y} coordinates of the matched points and converts them to map points. What I'm proposing here is the extensive use of the following "scaling" formula, which converts image space points to latitudes and longitudes, making it easy to pinpoint the desired location on the image using the Leaflet library and to use those coordinates in a ton of different Leaflet APIs.

Please do let me know if you feel this formula requires any further clarification or explanation.

WhatsApp Image 2019-04-09 at 05 43 03

WhatsApp Image 2019-04-09 at 05 42 43
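As a rough illustration of the image space to LatLng idea, here is a minimal sketch of a linear scaling. It is not necessarily the formula shown in the images above: it assumes an unrotated, undistorted image whose north, south, west, and east bounds are known, and it ignores the map projection, so it is only reasonable for small areas. The function name `pixelToLatLng` and the shape of `bounds` are hypothetical.

```javascript
// Sketch only: `pixelToLatLng` and the `bounds` object are hypothetical.
// Assumes an axis-aligned image covering the rectangle described by `bounds`.
function pixelToLatLng(pt, imgWidth, imgHeight, bounds) {
  // x grows to the east
  var lng = bounds.west + (pt.x / imgWidth) * (bounds.east - bounds.west);
  // image y grows downward, while latitude grows upward
  var lat = bounds.north - (pt.y / imgHeight) * (bounds.north - bounds.south);
  return { lat: lat, lng: lng };
}

var bounds = { north: 10, south: 0, west: 20, east: 40 };
var topLeft = pixelToLatLng({ x: 0, y: 0 }, 640, 480, bounds);     // north-west corner
var center = pixelToLatLng({ x: 320, y: 240 }, 640, 480, bounds);  // middle of the image
```

The returned plain object has the same lat/lng fields Leaflet accepts wherever a LatLng is expected, so it could be passed on to polylines or markers. A distorted image would need its four corner coordinates and a perspective transform instead.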

I also believe that we should hold off on implementing the AS module until we have successfully abstracted away the polylines one, and then build the AS on top of it.

(...) that you could pass a whole Leaflet.DistortableImageOverlay object to, and exposing methods like LeafletImageMatcher.matchAndAlignImageToImage(distortableImg1, distortableImg1) (a bit wordy, but see how this would be a very Leaflet-oriented library, as opposed to the underlying Matcher library which would be usable for things apart from Leaflet).

Absolutely! I think we can actually encapsulate all the "matching functions" in one module, while constructing the UI (AutoStitcher+Lines+Boxes) in a different one. This would also make the testing part more convenient, and improve the reusability of the components as well! We can definitely brainstorm this approach in a new issue!

what about the "confidence" of ORB match? does that mean we need the ORB match to be an object with properties, like matches = {confidence: 56, pairs: [ {x1: 252, y1: 61, x2: 116, y2: 6} ]}, for example? (or would confidence be a property of each pair?) That might mean we could return this from matcher.getMatches(), you know?

The "confidence" for each point is currently calculated in the match_pattern() util. fn. and can be returned from the getMatches fn. discussed above. Note that the "confidence" we are talking about here is measured against the match_threshold and ranges between 0-127. Whether a point is above or below the match_threshold is currently stored as a property of every single object, so if two points were, say, to be in a pair, both of their best_dist values (confidence factors) would be the same.
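For reference, here is a minimal sketch of how the output of findMatchedPoints could be repackaged into the {confidence, pairs} shape suggested earlier in the thread. Note that it uses a single aggregate confidence (the percentage of candidate matches that survived as inliers), which is an assumption made for illustration and differs from the per-point best_dist value described above. The name `toMatchObject` is hypothetical.

```javascript
// Sketch only: `toMatchObject` is a hypothetical helper.
// "confidence" here is simply the inlier percentage, which is an assumption,
// not the per-point best_dist value computed by match_pattern().
function toMatchObject(matchedPoints, totalMatches) {
  var pairs = [];
  for (var i = 0; i < matchedPoints.matches_count; i++) {
    var a = matchedPoints.good_matches_X[i];
    var b = matchedPoints.good_matches_Y[i];
    pairs.push({ x1: a.x, y1: a.y, x2: b.x, y2: b.y });
  }
  var confidence = totalMatches > 0
    ? Math.round(100 * pairs.length / totalMatches)
    : 0;
  return { confidence: confidence, pairs: pairs };
}

// One inlier out of four candidate matches.
var result = toMatchObject({
  good_matches_X: [{ x: 359, y: 48 }],
  good_matches_Y: [{ x: 65, y: 309 }],
  matches_count: 1
}, 4);
```

If a per-pair confidence turns out to be more useful, each entry of `pairs` could carry its own field instead of the single top-level value.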

Similarly, we may want to eventually make it possible to downscale the images as an initial step where we're faced with extremely large input images.

pyrdown

JSFeat implements the Gaussian Pyramid method of downsampling a large image to a smaller one in its pyrdown API, i.e., jsfeat.imgproc.pyrdown(source:matrix_t, dest:matrix_t);. Here we can set the destination matrix to have fewer rows and columns, which will allow us to process things faster while not sacrificing the important inliers in the image, as is evident from the final (smallest) image in the image pyramid below.

mnpyr
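To show what a single pyramid step does to the data, here is a simplified stand-in that halves a grayscale image by averaging each 2x2 block. It does not call jsfeat.imgproc.pyrdown and is not claimed to reproduce its exact filter; the function name `halve` is hypothetical.

```javascript
// Sketch only: `halve` is a hypothetical helper, a simplified stand-in for one pyramid step.
// src is a flat grayscale array (one value per pixel, row by row).
function halve(src, width, height) {
  var w2 = width >> 1, h2 = height >> 1;
  var dst = new Uint8ClampedArray(w2 * h2);
  for (var y = 0; y < h2; y++) {
    for (var x = 0; x < w2; x++) {
      var i = (2 * y) * width + 2 * x; // top-left pixel of the 2x2 block
      dst[y * w2 + x] = (src[i] + src[i + 1] + src[i + width] + src[i + width + 1]) / 4;
    }
  }
  return { data: dst, width: w2, height: h2 };
}

// A 4x4 image becomes a 2x2 image.
var src = new Uint8ClampedArray([
  0,   4,   8,   12,
  0,   4,   8,   12,
  100, 100, 200, 200,
  100, 100, 200, 200
]);
var small = halve(src, 4, 4);
```

Applying the step repeatedly yields the pyramid levels; each level has a quarter of the pixels of the previous one, which is where the speed-up comes from.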

Is there any way to randomize the order it searches, and to stop searching once it's found a certain number of high-quality matches?

Absolutely! Instead of starting from the first row and column, I believe a better approach would be to start from the middle, i.e., if the image has a resolution of 640x480, we can start our search from the imgData's pixel info at row, col of [240][320] and move outwards in a circular (vignette) fashion. This approach is based on the assumption that most images have their meaningful data, i.e., the highest count of inlier clusters, at the center rather than at the borders. To stop the execution, refer to the projectPointsInto fn. above: we can stop as soon as good_matches >= expected_good_matches, where expected_good_matches is a custom threshold that can be adjusted to reduce performance issues as well!
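The center-out search with early stopping could be sketched roughly as follows. This is an illustration under assumptions, not the JSFeat matcher: `centerOutOrder`, `searchUntil`, and the `isGoodMatch` callback are hypothetical names, and in practice the same ordering could be applied to keypoints rather than to raw pixel positions.

```javascript
// Sketch only: `centerOutOrder`, `searchUntil` and `isGoodMatch` are hypothetical.
// Lists every position of a width x height grid, nearest to the center first.
function centerOutOrder(width, height) {
  var cx = (width - 1) / 2, cy = (height - 1) / 2;
  var order = [];
  for (var y = 0; y < height; y++) {
    for (var x = 0; x < width; x++) {
      order.push({ x: x, y: y, d: (x - cx) * (x - cx) + (y - cy) * (y - cy) });
    }
  }
  order.sort(function (a, b) { return a.d - b.d; });
  return order;
}

// Visits positions center-out and stops once enough good matches were found.
function searchUntil(width, height, isGoodMatch, expectedGoodMatches) {
  var found = [], visited = 0;
  var order = centerOutOrder(width, height);
  for (var i = 0; i < order.length && found.length < expectedGoodMatches; i++) {
    visited++;
    if (isGoodMatch(order[i].x, order[i].y)) {
      found.push({ x: order[i].x, y: order[i].y });
    }
  }
  return { found: found, visited: visited };
}

// Toy predicate: positions on the diagonal count as "good matches".
var res = searchUntil(5, 5, function (x, y) { return x === y; }, 2);
```

In this toy run the search ends after visiting only a handful of the 25 positions, because two diagonal positions lie close to the center. Passing a large `expectedGoodMatches` falls back to visiting everything, which corresponds to the "return all matches" default.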

(...)there's an 'undo' function generated which can be bound to an Undo UI button? This could be displayed in a notice above, perhaps, where it says "Keep this stitch?" or something.

This can be easily implemented using the recent addTool API of Leaflet.DistortableImage: an undo button is added to the menu bar every time the user stitches two images, and is removed upon undoing the stitch action. We can display a soft popup here to notify the user of the stitch, since an alert would be a bit "obstructing".

@warren, no rush, and I do understand that you're really busy these days but can you please review this at the earliest so I can update my proposal accordingly and submit my final draft over to the GSoC website?

Thank you so much for reviewing this!!



Hi @rexagod - this looks really good and I'd love to see it incorporated. Thanks! But be aware, i don't think Google reads the proposals, and we treat the version here as the official version, so don't sweat copying into the PDF too much! This is just good detail to have as we evaluate all the proposals. Thanks a TON for your response!



Thank you for the quick response, @warren! I just noticed that I somehow left "What I'm proposing here is, __ ." as it is, i.e., blank in my comment above (in the first question after the code section), but I've fixed that now, and wanted to notify you about the same.

Thanks again!! 😃

Edit: Also added an image at the beginning for better visualisation of the image space. Do let me know if it's unclear or not detailed enough in any way!



Hi, please upload your proposal at the Google Summer of Code website at the earliest. Please ignore this comment if already done.



Thanks for the reminder, @bansal_sidharth2996! I'll upload the proposal ASAP!



Just an update, I've incorporated the changes suggested by @warren above into the proposal. Thanks!



@rexagod this is an amazing proposal. great job!

+1 to @warren's inclination towards modularity. You're on the right track, especially with the code samples in your recent comments. I would encourage you to go even further in this direction. A useful exercise might be to write the code for extracting features, matching points, and computing the homography without any reference to browser APIs or data structures. As far as I can tell, the computer-vision code should be environment-agnostic. You might think about writing it outside of the browser with a nodejs harness which feeds in image arrays and checks that the values are as expected. Ideally, your code shouldn't need any special rendering code at all: if it outputs data in the format that Leaflet.DistortableImage can understand (e.g. image + the coordinates of the warped corners, in a common coordinate space), then you may be able to use Leaflet.DistortableImage to handle the rendering altogether.

One thing I don't see covered in this proposal is how the uploaded images will be aligned to the base map once they have been automatically stitched together. Is this within the scope of your proposal? Or will users manually align the stitched image conglomerate against the base map?

Separately, I wonder about the performance of these computer vision operations in the browser. Do you have a sense of how computing keypoints / matches / homography perform in the browser? These operations are all extremely compute-intensive (especially on the GPU), and I have found that JS libraries for numerical computation are not always as robust or performant as their server-side counterparts (e.g. OpenCV). Have we considered the tradeoffs of doing these computations in the browser vs. on the server?

great job again -- you've covered the problem so thoroughly, from feature detection to matching (using RANSAC!) to computing the homography. I'm excited to see this project develop.



Hey, @justinmanley! Thank you so much for looking at my proposal. I did glance at your comment a bit earlier, but I was quite occupied with a ton of college activities, and even though I could have written back right then, I really wanted to take my time replying to this brilliant and insightful feedback of yours.

First off, I'd like to confirm (agreeing with @warren) that any functionality we believe could be better implemented in a modularized fashion will definitely get extracted into its own module. For example, ORB stats will reside in one module and the ORB UI renderer that works on those stats in another. The main aims are improved testability, easier bisection to locate bugs, and, obviously, better reusability in the future.

Ideally, your code shouldn't need any special rendering code at all: if it outputs data in the format that Leaflet.DistortableImage can understand (e.g. image + the coordinates of the warped corners, in a common coordinate space), then you may be able to use Leaflet.DistortableImage to handle the rendering altogether.

Exactly! I suppose the rendering part, as suggested by @warren, could be completely done using the Leaflet APIs themselves, while ORB.js, or AS.js, would resort to a headless approach and only pass the useful statistical data into the UI space where I can utilize that data to perform general manipulations, as per the user's requirements.

One thing I don't see covered in this proposal is how the uploaded images will be aligned to the base map once they have been automatically stitched together.

That's probably because I was initially unclear about the plans regarding the EXIF functionality, but now that it's really close to getting merged, here is what I have in mind. Whenever AS rubber-sheets and stitches two different images, we assume that those two images actually form a bigger overlay of the same area (and thus have close EXIF coordinates). Aligning the larger image according to any one of those two (based on their EXIF metadata) would then automatically align it to the other one too!

Separately, I wonder about the performance of these computer vision operations in the browser. Do you have a sense of how computing keypoints / matches / homography perform in the browser? These operations are all extremely compute-intensive (especially on the GPU), and I have found that JS libraries for numerical computation are not always as robust or performant as their server-side counterparts (e.g. OpenCV). Have we considered the tradeoffs of doing these computations in the browser vs. on the server?

Okay so I profiled the ORB implementation in my demo above for ~7 minutes. Below are the results.

GPU Evaluation: The overall site used negligible resources after the first minute. As is visible from the gif below, the amount of GPU memory needed to process the website (~14 MB, check out the top-left corner of the gif) could easily be provided by older-generation chipsets, let alone newer graphics cards. Below that is the GPU memory usage, which was actually needed only for a very short amount of time (~1 sec in the active span of 1 minute, check out the "summary" tab).

eval

Screenshot (48)

Current Performance Bottlenecks: Most of our performance issues were caused by JS scripting events (Timer Fired and Animation Frame Fired, to be precise). These can be improved by using requestAnimationFrame instead of setInterval and setTimeout. Those timer calls were the only reason the red triangles appeared in my performance evaluation, meaning we should be good to go once the methods that use them are refactored.

The image below depicts the Timer Fired event (performance critical, notice the red triangle in the corner) that was fired by a setTimeout at orb.js:26, which can be fixed easily by moving the tick and demoapp renders inside a requestAnimationFrame callback.
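A minimal sketch of that kind of refactor, assuming the per-frame work can be wrapped in a single function. `startRenderLoop` and `renderFrame` are hypothetical names for illustration and are not taken from orb.js; the `schedule` parameter exists only so the loop can be exercised outside a browser.

```javascript
// Hypothetical render loop: `renderFrame` stands in for the demo's per-frame
// tick/render work. `schedule` defaults to the browser's requestAnimationFrame
// and can be swapped for a fake scheduler in tests.
function startRenderLoop(
  renderFrame,
  schedule = (callback) => globalThis.requestAnimationFrame(callback)
) {
  let running = true;

  function onFrame(timestamp) {
    if (!running) return;
    renderFrame(timestamp);
    // Queue the next frame only after this one has finished, so frames
    // cannot pile up the way fixed-interval timer callbacks can.
    schedule(onFrame);
  }

  schedule(onFrame);

  // The returned function stops the loop before the next frame runs.
  return function stop() {
    running = false;
  };
}
```

In a browser this would be started with something like `const stop = startRenderLoop(drawOneFrame)`, where `drawOneFrame` is whatever function currently runs inside the timer. Since requestAnimationFrame is synchronized with the display refresh and is paused in background tabs, the loop also stops doing work when the page is not visible.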

Screenshot (51) Screenshot (52)

The one above was easy, but it was the one below that took the prize (check the number of red triangles). I do want to point out that, as of now, ORB does not take up that many resources due to its specificity: unlike OpenCV or any other library, it (a) is tailored for this one particular use and (b) will be greatly reduced in terms of events and complexity in the future for even better performance. Considering the current stable state of this module, removing all these red flags should give a substantial performance boost, since scripting, which currently makes up the better part of the summary chart, would be cut down drastically. We just might be able to shift this to the client side, thus reducing the load on our servers! This would ultimately imply that we "cut" support for all browsers that do not support the requestAnimationFrame method, which shouldn't cause any noticeable reduction in the user count.

Screenshot (49) Screenshot (53)

I'd like to hear your thoughts and insight on any/all of this, and again, thank you so much for the kind words and energy, really appreciate it!


Referencing this here as well: https://github.com/publiclab/Leaflet.DistortableImage/issues/110#issuecomment-488421626


aligning the larger image according to any one of those two (based on their EXIF metadata) would automatically align to the other one too

Cool! It sounds like you've given this some thought. I suspect that this question will still be worth investigating, since EXIF data may be imprecise and/or noisy (e.g. I assume that the location in the EXIF data for images taken with a cellphone will only be as precise as the phone's GPS, which is often accurate to within 5-10 m). Using the EXIF data from all of the stitched images together may help improve overall accuracy. There's a lot that can be done here -- lots of ways to combine EXIF + image data across images + layers to improve overall accuracy. It's a good thing to be thinking about now, but I think you're correct to defer further investigation until later + look for a solution which is "good enough" (rather than perfect).
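One simple way to picture "using the EXIF data from all of the stitched images together" is averaging. The sketch below assumes each stitched image has already been converted into an estimate of the same reference point of the mosaic (for example, its EXIF position corrected by its known offset within the mosaic); that conversion step is not shown. `averagePosition` is a hypothetical name, and plain averaging of latitude and longitude is only reasonable for points that are close together and away from the antimeridian.

```javascript
// Hypothetical helper: combine several noisy { lat, lng } estimates of the
// same ground point into one by taking the arithmetic mean.
// Assumes the points are close together and do not straddle the antimeridian.
function averagePosition(estimates) {
  if (estimates.length === 0) {
    throw new RangeError('need at least one estimate');
  }
  const sum = estimates.reduce(
    (acc, p) => ({ lat: acc.lat + p.lat, lng: acc.lng + p.lng }),
    { lat: 0, lng: 0 }
  );
  return {
    lat: sum.lat / estimates.length,
    lng: sum.lng / estimates.length,
  };
}
```

If the GPS errors of the individual images are independent, the error of the mean shrinks roughly with the square root of the number of images, which is why even this naive combination can beat aligning to a single image's EXIF position.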

Thanks very much for the fabulous performance analysis. You resolved almost all of my questions about client-side image-stitching performance! I have only a few remaining questions, all related: (1) what is the range of image sizes typically uploaded to MapKnitter? (2) what does in-browser performance look like for images at the 50th and 90th size percentiles? If performance is good at the 50th percentile, but poor at the 90th percentile, it might be worth adding a warning to the UI when users try to auto-stitch large images. Image size is relevant because I would expect that stitching performance might scale with O(mn²) or worse for m × n images. No need to do a detailed analysis now, but it would be useful to understand how image size affects performance in order to understand how different MapKnitter users will experience this feature.
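A minimal sketch of how the percentile question and the UI warning could fit together, assuming a list of pixel counts for uploaded images is available. The function names, the sample numbers, and the idea of using a pixel-count threshold are all assumptions for illustration; no real MapKnitter upload statistics are used here.

```javascript
// Nearest-rank percentile of a list of numbers, for p in (0, 100].
function percentile(values, p) {
  if (values.length === 0) {
    throw new RangeError('values must not be empty');
  }
  if (!(p > 0 && p <= 100)) {
    throw new RangeError('p must be in (0, 100]');
  }
  const sorted = [...values].sort((a, b) => a - b);
  // Multiply before dividing to keep the rank computation in whole numbers.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical UI guard: warn when an image has more pixels than a threshold,
// e.g. the largest size at which stitching was measured to stay responsive.
function shouldWarnBeforeStitching(width, height, maxPixels) {
  return width * height > maxPixels;
}

// Made-up pixel counts (in pixels), purely to show the calls.
const samplePixelCounts = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10].map((n) => n * 1e6);
const medianSize = percentile(samplePixelCounts, 50);
const largeSize = percentile(samplePixelCounts, 90);
```

Timing the stitcher on images near `medianSize` and `largeSize` would answer question (2) above, and whichever size first feels sluggish could serve as the `maxPixels` threshold for the warning.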

Thanks again for the great performance investigation. Your analysis is thorough, and the improvements you suggest are relevant and actionable! Awesome work :)
