Thursday, January 4, 2018

Finding likes between two arrays

In a recent interview I was asked to write a function to solve the following problem:

You have a SQL query and two arrays; find the like elements between the two arrays.

(I am assuming that a few of you have your own opinions on how to solve this, and that is excellent - please enlighten me)

The first thought that popped into my head was that this was not a coding interview; the second was that I really needed a laptop to work through this; and the third was - doesn't PowerShell have a cmdlet that will do this for me?

Back at home, with access to a laptop, I thought I would investigate the PowerShell angle.
I am assuming a small dataset.

First, let's build a couple of arrays to validate with:

$array1 = "Elmer", "hunter", "Bugs", "rabbit", "Tweety", "bird", "Sylvester", "feline"
$array2 = "Eddie", "investigator", "Roger", "rabbit", "Jessica", "rabbit", "Judge", "doom", "tweety", "bird"

Now, I did add some variation for fun (a repeated value and a difference in case).  Why might this be an important skill?
It might be important because I might need to mine a bunch of data or logs to investigate a pattern.

At this point I arbitrarily imposed an additional requirement on myself: I want to see all of the matched data items, not just the matched values.  Equivalents vs. equals, if you will.
Why? Again, in an investigation I will probably want to use a fuzzy match instead of a literal match.  And then I will want to work through the resulting set again (most likely both with eyes and with code).

With regard to PowerShell doing this for me, I discovered this:

$compared = Compare-Object -ReferenceObject $array1 -DifferenceObject $array2 -IncludeEqual

But the output is not very intuitive to work with, and it hides both my fuzzy match and the detail I want to see:

PS C:\Users\Brian> $compared

InputObject  SideIndicator
-----------  -------------
rabbit       ==
Tweety       ==
bird         ==
Eddie        =>
investigator =>
Roger        =>
Jessica      =>
rabbit       =>
Judge        =>
doom         =>
Elmer        <=
hunter       <=
Bugs         <=
Sylvester    <=
feline       <=
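
For anyone who wants to stay with Compare-Object, here is a minimal sketch of narrowing that output down to the equal rows by filtering on SideIndicator. The variable names $equalOnly and $equalValues are mine, made up for illustration. Note that the comparison is case-insensitive unless -CaseSensitive is added, which is why Tweety shows up as equal.

```powershell
# Sample data copied from earlier in the post
$array1 = "Elmer", "hunter", "Bugs", "rabbit", "Tweety", "bird", "Sylvester", "feline"
$array2 = "Eddie", "investigator", "Roger", "rabbit", "Jessica", "rabbit", "Judge", "doom", "tweety", "bird"

$compared = Compare-Object -ReferenceObject $array1 -DifferenceObject $array2 -IncludeEqual

# Keep only the rows that Compare-Object marked as equal
# ($equalOnly is a hypothetical name used for this sketch)
$equalOnly = $compared | Where-Object { $_.SideIndicator -eq '==' }

# Just the values, without the SideIndicator column
$equalValues = $equalOnly | ForEach-Object { $_.InputObject }
$equalValues
```

Even filtered this way, each equal row carries a single value, and the second "rabbit" from $array2 is still reported as a difference, so this does not give me both sides of every match.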

I actually ended up going back to my original idea:

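$likes = @()

# Compare every element of $array1 against every element of $array2
foreach ($element1 in $array1){
    foreach ($element2 in $array2){
        # -like is case-insensitive; with no wildcards it behaves like -eq
        if ($element1 -like $element2){
            # Record both sides of the match
            $likes += $element1
            $likes += $element2
        }
    }
}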

While not highly efficient, and I am sure not wonderful with large data sets, it does give me the results that I wanted for further analysis:

PS C:\Users\Brian> $likes
rabbit
rabbit
rabbit
rabbit
Tweety
tweety
bird
bird

It is easy to see from this output where I might want to go next: counts, deeper analysis, trends, etc.  It all depends on the details in the records used.
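
As a rough sketch of the "counts" idea, Group-Object can tally the values collected in $likes. The values are repeated here as a literal so the snippet stands alone, and $counts is just an illustrative name. Group-Object treats strings case-insensitively unless -CaseSensitive is specified, so Tweety and tweety fall into the same group.

```powershell
# The $likes values as shown in the output above
$likes = "rabbit", "rabbit", "rabbit", "rabbit", "Tweety", "tweety", "bird", "bird"

# Tally how many times each value appears ($counts is a hypothetical name)
$counts = $likes | Group-Object | Select-Object Name, Count
$counts
```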

If you have different ideas that meet my requirements, please share.


4 comments:

Guenther Schmitz said...

my approach would be:
foreach($elem in $array2) { if($array1 -contains $elem) { echo $elem } }

BrianEh said...

Thanks Guenther;
However, that gives me a different result:

PS C:\Users\Brian> foreach($elem in $array2) { if($array1 -contains $elem) { echo $elem } }
rabbit
rabbit
tweety
bird

It does provide matches, but does it provide all matches, and will it consider 'Tweety' as a match to 'tweety'?

(I am asking here, not telling)

Guenther Schmitz said...

that's true, it only prints out the elements of $array2, not all matches.

it does match "Tweety" to "tweety" and prints out "tweety" as it is in $array2 (which is the $elem variable I check with $array1).

Guenther Schmitz said...

I think I did not get the question right. my approach only prints out elements of one of the arrays.