What we’re after
jQuery’s
extend function is really useful
and if you’ve ever written a plug-in for the library chances are you’ve
made use of it. I’ve run across a use for this functionality in python
and it also makes an interesting interview question (regardless of
language.)
It’s easy to put what it does in to a brief sentence, “it recursively
merges objects.” The details of its behavior when the deep option is
used are a bit more subtle, and powerful. The best route to an
explanation is through an example.
a = {'a_only': 42, 'a_and_b': 43,
'a_only_dict': {'a': 44}, 'a_and_b_dict': {'a_o': 45, 'a_a_b': 46}}
b = {'b_only': 45, 'a_and_b': 46,
'b_only_dict': {'a': 47}, 'a_and_b_dict': {'b_o': 48, 'a_a_b': 49}}
c = dict_merge(a, b)
# c is now
c = {'a_and_b': 46,
'a_and_b_dict': {'a_a_b': 49, 'a_o': 45, 'b_o': 48},
'a_only': 42,
'a_only_dict': {'a': 44},
'b_only': 45,
'b_only_dict': {'a': 47}}
A solution
If you take a look at jQuery’s
implementation (search for
“jQuery.extend =”) there’s quite a bit to it, but most of it turns out
to be argument handling. For comparison here’s an implementation in
python.
def dict_merge(a, b):
'''recursively merges dict's. not just simple a['key'] = b['key'], if
both a and bhave a key who's value is a dict then dict_merge is called
on both values and the result stored in the returned dictionary.'''
if not isinstance(b, dict):
return b
result = deepcopy(a)
for k, v in b.iteritems():
if k in result and isinstance(result[k], dict):
result[k] = dict_merge(result[k], v)
else:
result[k] = deepcopy(v)
return result
Breaking it down line-by-line, the initial if handles the case where
where b is not a dictionary, meaning that it wins out. This normally
happens as an end-case, but here it additionally allows our function to
work when it’s not passed dictionaries.
The next line makes a deep copy of a. While not strictly necessary it
avoid modifying a and protects the returned result from future
modifications to any part of a. The downside to this is that it makes
our function more expensive and we may even throw away many of the
things copied in the next step.
Then we come to the for each over the items of b. If the key, k, is
also in a then we’re in the recursive case where we assign the results
of a call to ourself in to the output dictionary under k. If a
doesn’t have a value for k then we’ll just deepcopy b’s value in to
result.
As an Interview Question
I’ve been asked variations of this question a couple times and I’ve used
it quite a bit when interviewing candidates. It’s too complex to explain
for phone/Skype screens so don’t even try. It works OK for on-site
loops, but you have to have a really solid example and setup or else it
won’t be clear to the interviewee exactly what problem needs to be
solved. You should spend a decent amount of time coming up with a
succinct set of a’s and b’s that get across the set of behaviors you
expect them to handle and practice setting up the question on coworkers
several times first to get it right.
This question really comes in to it’s own when asked as part of a
“homework” coding exercise. In that situation it’s best to set up a
framework to code in that preferably includes unit tests that their code
is expected to pass. They shouldn’t include all of the edge cases, but
enough of them so that the code you get back has a decent chance of
being correct. You can always run it through more vigorous tests once
submitted.
The great part about this question as a coding exercise is that you can
spend a bit more time setting it up and you have lots of examples about
what’s expected built in to the unit tests. The problem itself isn’t too
difficult, but it will require a bit of mental effort to solve it and
there’s enough meat to it that people who just hack at stuff might solve
it, but will quite likely turn in some god-awfully complicated solution
that they won’t even be able to explain.
I’ll leave the details of how I go about interviewing people for another
time. I plan to write a lot more about it in the coming weeks so check
back if that interests you.
A Real World Use
So beyond interviewing people where is this functionality actually
useful? Configuration seems to be the most common use case.
jQuery makes extensive use of it for
argument/options handling. I originally coded up the python version for
dealing with application configuration across multiple lines. In that
case the deepcopy was omitted in favor of simple copies/assignments, but
otherwise the code is the same. A contrived example this sort of
configuration might look something like the following.
gbl = {'company': 'xomedia', 'phones': ['123-123-1234'],
'db': {'username': 'xo', 'name': 'xo'},
'cache': {'expires': 300}, 'logging': {'level': 'info'}}
dev = {'phones': ['555-555-5555', '333-333-3333'],
'db': {'host': 'db.xormedia.dev', 'password': 'well-known'},
'cache': {'hosts': ['c1.xormedia.dev', 'c2.xormedia.dev']},
'logging': {'level': 'debug', 'to': 'stdout'},
'mock-credit-cards': True}
stage = {'db': {'host': 'db.xormedia.stage', 'password': 'a-secret'},
'logging': {'to': 'a-file.log'}, 'timing': True,
'mock-credit-cards': True}
prod = {'db': {'host': 'db.xormedia.com', 'password': 'still-secret'},
'logging': {'to': 'a-file.log'}}
When it comes time to bring up an app you need to start with the base,
glb, and then add the appropriate environment to it, intelligently.
# config = glb + dev
config = {'cache': {'expires': 300, 'hosts': ['c1.xormedia.dev', 'c2.xormedia.dev']},
'company': 'xomedia',
'db': {'host': 'db.xormedia.dev',
'name': 'xo',
'password': 'well-known',
'username': 'xo'},
'logging': {'level': 'debug', 'to': 'stdout'},
'mock-credit-cards': True,
'phones': ['555-555-5555', '333-333-3333']}
So that’s that. Think you can improve upon the dict_merge
implementation above or do you have an interesting alternate approach.
If so please share.