Local Feature Matching using Harris Corners and SIFT

Class: CS 6476 Computer Vision
Instructor: James Hays
Date: Fall 2015
Language: MATLAB
TA'ed: No
Code: N/A

That's a lot of points - 1000 initial interest points on Mount Rushmore.

This project was concerned with implementing an algorithm that recognizes similarities between two images by finding interesting features in each and matching those features accurately. A sufficiently high matching accuracy then implies that the two images depict the same thing. The algorithm driving this process had three parts:

  1. Find Interesting Points in Each Image
  2. Devise a Descriptor for These Points
  3. Match Descriptors of Points in Both Images.

Part 1: Finding Interest(ing) Points - Harris Corner Detector

What is an interesting point? For our purposes, we define "interesting" as being recognizable and somewhat unique, at least locally. In other words, an interesting point is easily distinguishable from its immediate surroundings - if we were to observe the image with a jeweler's glass, we want to find an area we can easily recognize. Computationally, we achieve this by finding areas with strong gradients in multiple directions around a single pixel, a characteristic most obviously seen in corners.

A word about gradients. In an image, gradients are used to describe a change in intensity or color. A straight edge, like a wall, will show a significant gradient in a single direction, normal to the edge, denoting the change in intensity between the wall and its background in the image. Corners will exhibit gradients that radiate outward, starting normal to one of the constituent edges of the corner and sweeping around to the other.
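
To make the edge-versus-corner distinction concrete: the gradients in a small neighborhood can be gathered into a second-moment (structure) matrix whose eigenvalues separate the cases - both small for a flat patch, one large and one small for an edge, both large for a corner. The sketch below is illustrative only; Ix, Iy, r, c and the threshold are placeholder names rather than my project code, and the Harris response used below captures the same information through the matrix's determinant and trace, avoiding an explicit eigendecomposition.

        %Illustrative sketch: edge vs. corner via the structure matrix eigenvalues.
        %Assumes Ix, Iy are gradient images and (r, c) is the pixel being examined.
        win = -2:2;                                  %5x5 neighborhood offsets
        gx = Ix(r+win, c+win);
        gy = Iy(r+win, c+win);
        M = [sum(gx(:).^2)      sum(gx(:).*gy(:));   %second-moment (structure) matrix
             sum(gx(:).*gy(:))  sum(gy(:).^2)];
        lam = eig(M);                                %eigenvalues of the symmetric 2x2
        %flat patch: both eigenvalues small; edge: one large, one small;
        %corner: both large, i.e. strong gradients in multiple directions
        cornerThresh = 1e-2;                         %illustrative threshold value
        isCorner = min(lam) > cornerThresh;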

For my implementation of the corner detection, I followed a multi-step recipe.

  1. Filter the image with a derivative of Gaussian in X and Y, revealing edges in each direction. I implemented both the Harris computational stencil and the derivative-of-Gaussian method for building the filter kernel, and ended up using the derivative of Gaussian because its results were stronger. I also tried several ways of collapsing the image's channels before convolving - keeping the colors independent, averaging the three channels (Intensity in the HSI model), or taking the mean of the extremal values in each channel (Lightness in the HSL model) - and ultimately chose the Lightness model, which improved match accuracy by 2-3%, all other elements of the algorithm being equal (a short sketch of these channel reductions follows the figures for this step). For the convolutions I used my myImFilter implementation from the first project, specifically its FFT path, since I understood it and it already had some built-in utilities for managing different channels.
    Notre Dame Image 1

    		%Building Edge Images
    		    sig = 1;
    		    sz = (sig*4)+1;                          %5x5 kernel for sig = 1
    		    dgfilter = fspecial('gaussian',[sz, sz], sig);
    		    [dgx,dgy] = gradient(dgfilter);          %derivative-of-Gaussian kernels in x and y
    		    %gradient of image in x dir
    		    Ix = myImFilter(image, dgx, filterMethod);
    		    %gradient of image in y dir
    		    Iy = myImFilter(image, dgy, filterMethod);
    		
    		
    1) Derivative of Gaussian in X Convolution | Derivative of Gaussian in Y Convolution
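
    Step 1 mentions collapsing the color channels via HSI Intensity or HSL Lightness without showing that computation, so here is a minimal sketch of both reductions, assuming img is an RGB image with values in [0,1] (the names are illustrative, not from my code):

        %Illustrative sketch: the channel reductions described in step 1.
        Ichan = mean(img, 3);             %HSI Intensity: average of R,G,B
        maxC  = max(img, [], 3);          %per-pixel max over the channels
        minC  = min(img, [], 3);          %per-pixel min over the channels
        Lchan = (maxC + minC) / 2;        %HSL Lightness: mean of the extremes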
  2. I built the components of the Cornerness function - the three distinct entries of the outer product of the two edge images, each convolved with a weighting function (another, larger Gaussian) - and then combined these components into the Cornerness response, using an alphaR value of .06 as suggested in the text.
    		%Building Cornerness Image
    		    sig = 2;
    		    alphaR = .06;                       %empirical constant, as suggested in the text
    		    %gaussian local weight - resize for convolution
    		    w = fspecial('gaussian', feature_width+1, sig);
    		    %structure-matrix entries, smoothed by the weighting Gaussian
    		    Ix2 = conv2(Ix(:,:,1).^2, w, 'same');
    		    Iy2 = conv2(Iy(:,:,1).^2, w, 'same');
    		    Ixy = conv2(Ix(:,:,1).*Iy(:,:,1), w, 'same');
    		    %Harris cornerness: det(M) - alphaR * trace(M)^2
    		    Hs = (Ix2.*Iy2 - Ixy.^2) - alphaR*(Ix2 + Iy2).^2;
    		
    2) Gaussian Convolved X*X edges | Gaussian Convolved Y*Y edges
    2) Gaussian Convolved X*Y images | Solution to the "Cornerness" function
  3. I then thresholded the image. I first built a mosaic-like image in which each small neighborhood around a particularly bright spot in the Cornerness image takes on a uniform intensity, and then iteratively adjusted a threshold on these patches (with an adaptive step size, in the spirit of adaptive time stepping) until I found the desired number of points in each image. I had to take care not to keep points too close to the edge of the image, since that would compromise the descriptor building in Part 2, which requires a block of, in my case, 16x16 pixels around each point.
    		%Thresholding with Adaptive Timestep
    		%build boundary of no interest points around edges, for feature assignment
    		tmpHs = zeros(size(Hs));
    		tmpHs(bounds:(picH-bounds),bounds:(picW-bounds),:) = Hs(bounds:(picH-bounds),bounds:(picW-bounds),:);
    		%ordfilt2 takes the orderth value of each sorted ones(fwp1) window; with
    		%orderth set to the window size fwp1^2 (fwp1 = feature_width+1) this is the local max
    		Hs_prime = ordfilt2(tmpHs, orderth, ones(fwp1));       %2X as fast as colfilt on this data
    		lNumPts = .999 * numPtsDesired;                        %accept within .1% of the desired count
    		hNumPts = 1.001 * numPtsDesired;
    		maxIters = numPtsDesired * 10;
    		threshDir = 0;
    		lRate = .01;
    		%thresh holds an initial guess here; numPts and iters start at 0
    		while(((numPts < lNumPts)||(numPts > hNumPts)) && (iters < maxIters))
    		    %i say, adaptive time step, my good man!
    		    if(numPts < lNumPts)
    		        thresh = thresh - lRate;
    		        if(threshDir == 1)  %if last time it went up, then it's bouncing, decrease lRate
    		            lRate = lRate * .51;
    		        elseif (threshDir == -1)    %last time went down too, increase lRate
    		            lRate = lRate * 1.9;
    		        end
    		        threshDir = -1;
    		    elseif (numPts > hNumPts)
    		        %thresh = thresh * 1.001;
    		        thresh = thresh + lRate;
    		        if(threshDir == -1)  %if last time it went down, then it's bouncing, decrease lRate
    		            lRate = lRate * .51;
    		        elseif (threshDir == 1)    %last time went up too, increase lRate
    		            lRate = lRate * 1.9;
    		        end
    		        threshDir = 1;
    		    end
    		    %finds actual corner point in patch that exceeds threshold
    		    HsNew = (Hs_prime > thresh ) & (Hs_prime == tmpHs);			
    		    numPts = sum(sum(HsNew));
    		    iters = iters+1;
    		end
    		
    3) Neighborhoods used to Threshold Points | Interest Points -> Max Points in the Neighborhoods
The same sequence of results for Notre Dame Image 2 can be seen below, under "Other Images of My Results".

Part 2: Describing These Points - Building a SIFT-Inspired Descriptor

How can we clearly describe the interest points we have accumulated in each image? Since our interesting points are distinguished locally by their gradients, we use the orientation of those gradients to describe them. We also look at the orientation of the gradients in a neighborhood around each point, extending our definition of "interesting" from Part 1 (where a point is interesting if we can distinguish it within a small neighborhood) to include the neighborhood itself - that is, using gradients to distinguish a small neighborhood from other, similarly sized neighborhoods in the image.

The algorithm to accomplish this consists of convolving the image with directional filters to derive the gradients in each of 8 directions, then examining a patch around each interest point, weighted by a Gaussian centered on the point, and accumulating the weighted gradients for each direction into each of 16 spatial bins around the point - a grand total of 128 values in the descriptor for each interest point (a short sketch of this 128-value layout follows the code at the end of this list). In more detail:

  1. Find the gradients of the image in each of 8 directions (0, 45, 90, 135, 180, 225, 270 and 315 degrees). I again used a derivative of Gaussian for this filter, finding the derivatives in the X and Y directions and then building each rotated filter as a linear combination of the two, weighted by the cosine and sine of the desired angle. Again I used my myImFilter/FFT implementation from the first project, since in some ways it was more resilient and easier to use than conv2, although I also made use of conv2 in certain circumstances.
    		%Filtering Image in each of 8 directions - using a small 5x5 stencil
    		sig = 1;
    		sz = (sig*4)+1;
    		dgfilter = fspecial('gaussian',[sz, sz], sig);
    		[dgx,dgy] = gradient(dgfilter);
    		step = pi/4.0;
    		stop = 2*pi - step;                %8 angles: 0, 45, ..., 315 degrees
    		idx = 1;
    		imgGrad = cell(1,8);               %one gradient image per orientation
    		for theta = 0 : step : stop
    		       %the derivative along any orientation theta is a linear combination
    		       %of the x and y derivative-of-Gaussian kernels
    		       dgTheta = cos(theta)*dgx + sin(theta)*dgy;
    		       imgGrad{idx} = myImFilter(image, dgTheta, 2);
    		       idx = idx + 1;
    		end
    		

    Orientation Gradients of Notre Dame Picture 1 for Descriptor Building

    0 degrees | 45 degrees | 90 degrees | 135 degrees
    180 degrees | 225 degrees | 270 degrees | 315 degrees
  2. Build orientation histograms for each direction. For each of the 8 directions and every interest point, I weight the pixels in the point's neighborhood with a Gaussian that extends beyond the bounds of the neighborhood, partition the neighborhood into a 4x4 grid of bins (each containing, in my case, 16 pixels), and accumulate the weighted gradients into their respective bins. I actually interpolated around each point, starting with a neighborhood that was 1 pixel larger, although this probably had very little impact on the final result. Since I iterate through the point neighborhoods one direction gradient at a time, I have to build the feature vector in slices.
    		%Building one orientation slice of the histograms for all interest points
    		%(imgGrad here is a single orientation's gradient image from the previous step)
    		fwHf = feature_width * .5;
    		%weighting window
    		w = fspecial('gaussian', feature_width+1, fwHf);
    		%for interpolating gradients from odd-sized region around interest point to even sized region
    		movAvgBlk = ones(2,2) *.25;
    		%each blkSz x blkSz block collapses to a single cell (for blkSz = 4) holding the block's sum
    		blkSumFunc = @(block_struct) sum(sum(block_struct.data)) * ones(size(block_struct.data)*.25);
    		fIdxVec = (1:8:128);     	%idx's of the first orientation's values; slice d is offset by (d-1)
    		for pIdx = 1:numPts
    		    %interpolates a matrix, by finding midway values between each cell (ctr of 4 cells)
    		    %conv2 here is building a moving average (interpolation) of each 2x2 block in the gradient 
    		    %block around each point, which is 17x17, so that result is 16x16
    		    %full gives res 1 pxl border, since filter is movAvgBlk 2x2
    		    imTmp = conv2(imgGrad(rowSt(pIdx):rowEnd(pIdx),colSt(pIdx):colEnd(pIdx)).*w, movAvgBlk, 'full');   
    		    imgGradBlock = imTmp(2:(end-1),2:(end-1));
    		    %this gives the sum of all the imgGradBlock values in each of the 4x4 sub-windows 
    		    %of the imgGradBlock - blkSz is 1/4 feature width
    		    blkSum = blockproc(imgGradBlock,[blkSz blkSz], blkSumFunc);
    		    features(pIdx,fIdxVec) = blkSum(:);            
    		end
    		
  3. Normalize the results. I normalize every feature vector, cap the value of any dimension at .2, and raise the capped vector to the .23 power, which boosts and sorta "smears out" very small values, exaggerating small variances - I found this helped differentiation. I then normalize once more; these renormalized vectors are the final descriptors.
    		%Normalize each complete vector
    		fCapVal = .2;
    		for pIdx = 1:numPts
    		    blkSum = features(pIdx,:);
    		    blkSum = blkSum/norm(blkSum);       	%normalized
    		    blkSum = min(fCapVal,blkSum);         	%cap values at .2
    		    blkSum = blkSum.^.23;       			%raise to root power - spreads out values < .4
    		    blkSum = blkSum/norm(blkSum);       	%normalized again
    		    features(pIdx,:) = blkSum;        
    		end
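
    To make the slice bookkeeping above concrete: the finished descriptor is a 4x4 grid of spatial bins times 8 orientations, flattened so the 8 orientation values of each spatial bin sit together - which is why fIdxVec = 1:8:128 addresses one orientation slice, with later slices offset by one. A minimal sketch of that layout (hist4x4x8 is a placeholder for the accumulated, weighted histograms, not a variable from my code):

        %Illustrative sketch: flattening a 4x4x8 histogram stack into the 128-vector,
        %so orientation o of spatial bin b lands at index (b-1)*8 + o.
        hist4x4x8 = rand(4, 4, 8);               %placeholder for accumulated histograms
        desc = zeros(1, 128);
        for o = 1:8
            slice = hist4x4x8(:,:,o);            %4x4 spatial bins for this orientation
            desc((1:8:128) + (o-1)) = slice(:);  %same indexing as fIdxVec, shifted per slice
        end
        desc = desc / norm(desc);                %then normalized as in the code above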
    		

Part 3: Matchmaking - Matching Points by Minimizing Feature Distance

To draw correspondences between the two images, we need to match the interesting points by finding those in each image whose descriptors are most alike. In algorithmic terms this is a constrained assignment problem, which I approached with knapsack-style dynamic programming: we want pairs of features that are as similar as possible (minimizing the distance between their descriptors), while also requiring that each best match be clearly better than the second-best match for the same point (the nearest-neighbor distance ratio, or NNDR, test). This means that sometimes the nearest feature is not the best feature to match to.
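
Before walking through my implementation, here is what the plain NNDR test looks like in isolation. This is a minimal sketch under assumed names (features2, f1 and ratioThresh are illustrative); my actual code below layers a distance threshold and collision resolution on top of this idea.

        %Illustrative sketch: plain NNDR test for one image-1 descriptor f1 (1 x 128)
        %against all image-2 descriptors features2 (N x 128).
        diffs = bsxfun(@minus, features2, f1);   %one difference row per candidate
        dists = sqrt(sum(diffs.^2, 2));          %L2 distance to each candidate
        [sortedD, order] = sort(dists, 'ascend');
        nndr = sortedD(1) / sortedD(2);          %best distance over second-best
        ratioThresh = 0.8;                       %illustrative value, not from my code
        if nndr < ratioThresh                    %distinctive enough to keep
            matchIdx2 = order(1);                %index of the matched image-2 point
        end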

This algorithm was also involved:

  1. Build a table of sorted feature distances. I iterated through each pair of points, computed the norm of the descriptor difference (well, specifically the L2.1 norm rather than the L2 - believe it or not, it made a difference of a few percent in accuracy on both Notre Dame and Rushmore), and recorded these distances, sorted. I also recorded, for each pair, the ratio of its distance to that of the next-nearest neighbor.
    		%Build a table that holds all points sorted by distance to all neighbors.
    		%(normType = 2.1 here - the L2.1 norm; numDistCols = 4 columns per image-1
    		%point; colIdx and distValsIdx start at 1; optDistRatVal = .45)
    		for idx1 = 1:num_features1
    		    tmpDists = zeros(num_features2,1);
    		    idx2Vec = zeros(num_features2,1);
    		    tmpIdx = 1;
    		    for idx2 = 1:num_features2
    		        tmpDists(tmpIdx,1) = norm(features1(idx1,:) - features2(idx2,:),normType);  %Lp distance, p = normType (2.1)
    		        idx2Vec(tmpIdx,1) = idx2;
    		        tmpIdx = tmpIdx+1;
    		    end
    		    %sorted by increasing distance
    		    [tmpSortedDists,sorted_idxs] = sort(tmpDists,'ascend');
    		    sortedIdx2Vec = idx2Vec(sorted_idxs,:);        
    		    ratIdx = [1:(num_features2-1)];                 %indices of closest distances
    		    ratIdx1 = ratIdx+1;                             %indices of next-closest distances
    		    tmpDistRats = tmpSortedDists(ratIdx)./tmpSortedDists(ratIdx1);
    		    ratioDists = abs(tmpDistRats - optDistRatVal);  %minimize the difference of this ratio and .45
    		
    		    ratioDists(num_features2) = 999;                %sentinel: the last entry has no next-nearest neighbor
    		    
    		    distEnd = (distValsIdx+num_features2-1);
    		    idx1Col = repmat(idx1,num_features2,1);
    		    %structure holding distance value-sorted vals
    		    distVals(distValsIdx:distEnd, colIdx) = idx1Col;
    		    distVals(distValsIdx:distEnd,(colIdx+1)) = sortedIdx2Vec;
    		    distVals(distValsIdx:distEnd,(colIdx+2)) = tmpSortedDists;
    		    distVals(distValsIdx:distEnd,(colIdx+3)) = ratioDists;
    		
    		    colIdx = colIdx + numDistCols;
    		end    %for each idx1
    		
  2. Knapsack! (sorta). Next, I decide on feature pairs. This is a dynamic programming problem with a few tricky characteristics. Since I drove my distance-minimization loop primarily with the points from Image 1, a point from Image 2 can at this stage be mapped to multiple points in Image 1. I have to make sure I don't multi-assign points from Image 2, and I also want to make sure an Image 2 point isn't stuck with a less-than-ideal candidate just because I derived the distances from the point of view of the Image 1 points. In other words, every point should end up mapped to its nearest available neighbor, not just the points from Image 1. To accomplish this, I tabulate, for each Image 2 point, the index of the Image 1 point it matches, along with their distance, the NNDR, and the match index of any existing match. That way, if I get a collision - an Image 2 point comes up as the closest point to two different points in Image 1 - I can pick the better match for it and remap the orphaned Image 1 point to its next-best match, if one is available. This was a bit tricky to debug.
    		%matchedIDX2 has one row per image-2 point: [matched idx1, dist, confidence, matchIdx of that match]
    		matchedIDX2 = zeros(num_features2,4);
    		for idx1Loop = 1:num_features1
    		    idx1 = idx1Loop;
    		    oldMatchIdx = matchIdx;
    		    compIdx = 1;                                        %computing row for this idx
    		    found = 0;
    		    while ((compIdx <= maxRows) && (found == 0))
    		        distValsIdx = ((idx1-1) * 4) + 1;               %col idx in distVals table for idx1
    		        idx2 = distVals(compIdx,distValsIdx+1);         %current idx2 being compared
    		        dist12 = distVals(compIdx,distValsIdx+2);       %distance between 1 and 2
    		        distRat12 = distVals(compIdx,distValsIdx+3);    %dist ratio between 1 and 2 and 1 and next 2   
    		        confVal = (1 - 2*distRat12);
    		        if(dist12 < optDistVal)                 		%if the closest match's distance is below the distance threshold
    		            if((matchedIDX2(idx2,1) == 0))% && (1 - 2*distRat12) > minConf)          %idx2 not been matched yet
    		                matchedIDX2(idx2,1) = idx1;
    		                matchedIDX2(idx2,2) = dist12;
    		                matchedIDX2(idx2,3) = confVal;  
    		                matchedIDX2(idx2,4) = matchIdx;
    		                found = 1;
    		                matches(matchIdx,1) = idx1;             %img1 idx
    		                matches(matchIdx,2) = idx2;     		%img2 idx
    		                matchDists(matchIdx,1) = dist12;  		%dist
    		                confidences(matchIdx) = confVal;
    		            else                                		%has been matched already - keep match with smallest distance
    		                if(matchedIDX2(idx2,2) > dist12)    	%this match is closer than old match, find new values for old match
    		                    matchedIDX2(idx2,1) = idx1;
    		                    matchedIDX2(idx2,2) = dist12;
    		                    matchedIDX2(idx2,3) = confVal;  
    		                    matchedIDX2(idx2,4) = matchIdx;
    		                    found = 1;
    		                    matches(matchIdx,1) = idx1;         %img1 idx
    		                    matches(matchIdx,2) = idx2;     	%img2 idx
    		                    matchDists(matchIdx,1) = dist12;  	%dist
    		                    confidences(matchIdx) = confVal;                              
    		                    compIdx = compIdx + 1;           	%look at next row in table for old idx1 this replaced
    		                    idx1 = matchedIDX2(idx2,1);
    		                    oldMatchIdx = matchIdx;
    		                    matchIdx = matchedIDX2(idx2,4);
    		                else                                	%this match is further than old match, find new values for this match
    		                    compIdx = compIdx + 1;          	%look at next row in table
    		                end
    		            end%if idx2 hasn't been matched yet
    		        else
    		            compIdx = maxRows+1;
    		        end %if dist < distThresh            
    		    end %while not found or compIdx < 10 ( compIdx is row idx in distVals)
    		    matchIdx = oldMatchIdx;
    		    if found == 1
    		        matchIdx = matchIdx + 1; 
    		    end       
    		end%for idx1
    		
  3. Threshold all matches by how "confident" I am in them. As a confidence measure I used a variant of the ratio test, treating (1 - 2 * NNDR) as the probability of a match being good. I then discard any matches with a confidence below .3.
    		%Sort by confidence and discard the slackers.
    		[confidences, ind] = sort(confidences, 'descend');
    		matches = matches(ind,:); 
    		idxToLose = any(confidences < minConfidence,2);    %minConfidence = .3
    		confidences(idxToLose,:)=[];
    		matches(idxToLose,:)=[];
    		

Results

Notre Dame: 102 correct matches and 10 misses. Not too shabby.

I got pretty good results with both Notre Dame and Mount Rushmore. Starting from 1000 interest points, the Notre Dame pair gave 102 correct matches and 10 misses, while Rushmore gave 88 correct matches and 14 misses, a 13.73% error rate. Initially I got nothing correct on the Gaudi pair; once I made sure both images were of similar sizes I did better, although the correspondence evaluator still didn't accept any of my matches. When I instead rescaled the image algorithmically before finding the interest points and building the feature vectors, the evaluator did recognize my results, as can be seen below.

Rushmore: 88 correct matches and 14 misses. Not as good as Notre Dame, but still not bad.

Gaudi is ok when resized (algorithmically, so both images are fairly close in size), although the evaluator couldn't recognize it.

Other Images of My Results

Part 1 - Interest Points on Notre Dame Picture 2

1) Derivative of Gaussian in X Convolution | Derivative of Gaussian in Y Convolution
2) Gaussian Convolved X*X edges | Gaussian Convolved Y*Y edges
2) Gaussian Convolved X*Y images | Solution to the "Cornerness" function
3) Neighborhoods used to Threshold Points | Interest Points -> Max Points in the Neighborhoods

Part 2 - Orientation Gradients of Notre Dame Picture 2 for Descriptor Building

0 degrees | 45 degrees | 90 degrees | 135 degrees
180 degrees | 225 degrees | 270 degrees | 315 degrees