Thursday, January 4, 2018

Finding likes between two arrays

In a recent interview I was asked to write a function to solve the following problem:

You have a SQL query and two arrays; find the like elements between the two arrays.

(I assume a few of you have your own opinions on how to solve this, and that is excellent; please enlighten me.)

The first thought that popped into my head was that this was not a coding interview, the second was that I really needed a laptop to work through this, and the third was: doesn't PowerShell have a cmdlet that will do this for me?

Back at home, with a laptop at hand, I decided to investigate the PowerShell angle. I am assuming a small dataset.

First, let's build a couple of arrays to validate with:

$array1 = "Elmer", "hunter", "Bugs", "rabbit", "Tweety", "bird", "Sylvester", "feline"
$array2 = "Eddie", "investigator", "Roger", "rabbit", "Jessica", "rabbit", "Judge", "doom", "tweety", "bird"

I did add some variation for fun ('Tweety' vs. 'tweety', and 'rabbit' appearing twice in $array2). Why might this be an important skill?
Because I might need to mine a bunch of data or logs to investigate a pattern.

At this point I have arbitrarily imposed an additional requirement on myself: I want to see all of the matched data items, not just the matched values. Equivalents vs. equals, if you will.
Why? Again, in an investigation I will probably want to use a fuzzy match instead of a literal match, and then work through the resulting set again (most likely both with eyes and with code).
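To make "equivalents vs. equals" concrete, here is a small sketch of how PowerShell's comparison operators treat the sample values. The `rab*` pattern is a hypothetical fuzzy match added for illustration only; note that `-like` treats its right-hand operand as the pattern.

```powershell
# -eq and -like compare strings case-insensitively by default;
# the c-prefixed forms (-ceq, -clike) are case-sensitive.
"Tweety" -eq "tweety"      # True
"Tweety" -ceq "tweety"     # False
"Tweety" -like "tweety"    # True: with no wildcards, -like behaves like -eq
"Tweety" -clike "tweety"   # False

# -like also accepts wildcards (*, ?, [...]) in its right-hand operand,
# which is what makes it usable for a fuzzy match ("rab*" is hypothetical).
"rabbit" -like "rab*"      # True
"rabbit" -eq "rab*"        # False
```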

As for PowerShell doing this for me, I discovered Compare-Object:

$compared = Compare-Object -ReferenceObject $array1 -DifferenceObject $array2 -IncludeEqual

But the output is not very intuitive to work with, and it hides the detail I want: there is no fuzzy match, and each matched pair is collapsed into a single == row.

PS C:\Users\Brian> $compared

InputObject  SideIndicator
-----------  -------------
rabbit       ==
Tweety       ==
bird         ==
Eddie        =>
investigator =>
Roger        =>
Jessica      =>
rabbit       =>
Judge        =>
doom         =>
Elmer        <=
hunter       <=
Bugs         <=
Sylvester    <=
feline       <=
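If only the common values are wanted, `Compare-Object` can drop the one-sided rows itself. A minimal sketch using its `-ExcludeDifferent` switch; it still reports one value per matched pair (here rabbit, Tweety and bird), so the detail the author is after (both spellings, both rabbit entries) is still missing.

```powershell
$array1 = "Elmer", "hunter", "Bugs", "rabbit", "Tweety", "bird", "Sylvester", "feline"
$array2 = "Eddie", "investigator", "Roger", "rabbit", "Jessica", "rabbit", "Judge", "doom", "tweety", "bird"

# -ExcludeDifferent is combined with -IncludeEqual so that only the == rows
# come back; -ExpandProperty unwraps InputObject into plain strings.
$common = Compare-Object -ReferenceObject $array1 -DifferenceObject $array2 -IncludeEqual -ExcludeDifferent |
    Select-Object -ExpandProperty InputObject

$common
```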

I actually ended up going back to my original idea:

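$likes = @()

# Compare every element of $array1 against every element of $array2.
# -like is case-insensitive and treats $element2 as a wildcard pattern.
foreach ($element1 in $array1){
    foreach ($element2 in $array2){
        if ($element1 -like $element2){
            # Keep both sides of each match for later analysis.
            $likes += $element1
            $likes += $element2
        }
    }
}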

While it is not efficient (every element is compared with every other, and each += copies the array) and would surely struggle with large data sets, it does give me the results that I wanted for further analysis:

PS C:\Users\Brian> $likes

It is easy to see from this output where I might want to go next: counts, deeper analysis, trends, etc. It all depends on the details in the records used.
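As one example of the "counts" step, a minimal sketch using `Group-Object`. The `$likes` values are what the nested loop collects from the two sample arrays when traced by hand (each match contributes both sides); they are hard-coded here so the snippet stands alone.

```powershell
# What the nested -like loop collects from the sample arrays:
# one rabbit in $array1 matches two in $array2, so it appears four times.
$likes = "rabbit", "rabbit", "rabbit", "rabbit", "Tweety", "tweety", "bird", "bird"

# Group-Object is case-insensitive by default, so "Tweety" and "tweety"
# share one group; add -CaseSensitive to count them separately.
$counts = $likes | Group-Object | Select-Object Name, Count
$counts
```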

If you have different ideas that meet my requirements, please share.
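On the efficiency point, one possible alternative (a swapped-in technique, not the approach above) indexes `$array2` in a hashtable so each `$array1` element is looked up once instead of being compared with every element. It is a sketch under one important assumption: exact, case-insensitive matches are enough. Unlike `-like`, a hashtable lookup ignores wildcards, so it gives up the fuzzy match.

```powershell
$array1 = "Elmer", "hunter", "Bugs", "rabbit", "Tweety", "bird", "Sylvester", "feline"
$array2 = "Eddie", "investigator", "Roger", "rabbit", "Jessica", "rabbit", "Judge", "doom", "tweety", "bird"

# Index $array2 once. PowerShell's @{} hashtables compare string keys
# case-insensitively, so 'Tweety' and 'tweety' resolve to the same key.
$index = @{}
foreach ($element2 in $array2) {
    if (-not $index.ContainsKey($element2)) {
        $index[$element2] = [System.Collections.Generic.List[string]]::new()
    }
    $index[$element2].Add($element2)   # keep every occurrence, e.g. both rabbits
}

# Walk $array1 once; a List avoids the array copy that += performs.
$likes = [System.Collections.Generic.List[string]]::new()
foreach ($element1 in $array1) {
    if ($index.ContainsKey($element1)) {
        foreach ($element2 in $index[$element1]) {
            $likes.Add($element1)
            $likes.Add($element2)
        }
    }
}

$likes
```

For exact matches this yields the same pairs, in the same order, as the nested loop; whether the lost wildcard support is acceptable depends on the investigation.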


Guenther Schmitz said...

My approach would be:
foreach($elem in $array2) { if($array1 -contains $elem) { echo $elem } }

BrianEh said...

Thanks, Guenther.
However, that gives me a different result:

PS C:\Users\Brian> foreach($elem in $array2) { if($array1 -contains $elem) { echo $elem } }

It does provide matches, but does it provide all matches, and will it consider 'Tweety' a match for 'tweety'?

(I am asking here, not telling)

Guenther Schmitz said...

That's true; it only prints out the elements of $array2, not all matches.

It does match "Tweety" to "tweety" and prints out "tweety" as it is in $array2 (which is the $elem variable I check against $array1).
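A small sketch illustrating the exchange above: Guenther's loop, collected into a variable, returns only `$array2` values (one per element found in `$array1`), while the nested `-like` loop keeps both sides of each pair. `-contains` is case-insensitive; `-ccontains` is its case-sensitive counterpart.

```powershell
$array1 = "Elmer", "hunter", "Bugs", "rabbit", "Tweety", "bird", "Sylvester", "feline"
$array2 = "Eddie", "investigator", "Roger", "rabbit", "Jessica", "rabbit", "Judge", "doom", "tweety", "bird"

# Guenther's approach, captured instead of echoed:
# keep each $array2 element that $array1 contains.
$found = foreach ($elem in $array2) { if ($array1 -contains $elem) { $elem } }
$found   # rabbit, rabbit, tweety, bird (one per line)

# -contains ignores case; -ccontains does not.
$array1 -contains "tweety"    # True
$array1 -ccontains "tweety"   # False
```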

Guenther Schmitz said...

I think I did not get the question right. My approach only prints out elements of one of the arrays.