There are times when you need to middle truncate a string. In many cases
it’s for UX/human purposes, though in some situations it’s the best way
to generate a unique string for a length-limited field. This is the case I
ran into recently while trying to automate the submission of in-app
purchases (IAP) to both Google Play and the App Store, which require
short unique names for each SKU.
The Setup
Consider the following titles, each of which is 39 characters long.
Midsomer Murders - Series 1 - Episode 1
Midsomer Murders - Series 1 - Episode 2
Midsomer Murders - Series 1 - Episode 3
Midsomer Murders - Series 1 - Episode 4
Midsomer Murders - Series 1 - Episode 5
Midsomer Murders - Series 2 - Episode 1
Midsomer Murders - Series 2 - Episode 2
Midsomer Murders - Series 2 - Episode 3
Midsomer Murders - Series 2 - Episode 4
Midsomer Murders - Series 2 - Episode 5
Midsomer Murders - Series 3 - Episode 1
Midsomer Murders - Series 3 - Episode 2
Midsomer Murders - Series 3 - Episode 3
Midsomer Murders - Series 3 - Episode 4
Midsomer Murders - Series 3 - Episode 5
...
Assume we had to fit these strings into a field we have no control over
that requires them to be unique and 32 characters or fewer, or perhaps
we’re displaying them in a UI where there’s not enough room for the full
title. The naive approach would be to truncate them to 32 characters in
length, ending in an ellipsis to make it clear that the title has been
truncated.
Midsomer Murders - Series 1 -...
Midsomer Murders - Series 1 -...
...
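As a rough sketch of that naive approach (truncate_end is a name made up for this illustration, and the titles list simply rebuilds the sample titles above), end truncation collapses every episode of a series into the same string:

```python
def truncate_end(s, n):
    """Cut s to at most n characters, ending in '...' if anything was removed."""
    if len(s) <= n:
        # string is already short enough
        return s
    # keep n - 3 characters to leave room for the 3 .'s
    return s[:n - 3] + '...'


# rebuild the 15 sample titles: 3 series of 5 episodes each
titles = ['Midsomer Murders - Series {0} - Episode {1}'.format(series, episode)
          for series in range(1, 4) for episode in range(1, 6)]

shortened = [truncate_end(t, 32) for t in titles]
print(shortened[0])          # Midsomer Murders - Series 1 -...
print(len(set(shortened)))   # 3, so only one distinct name per series
```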
That doesn’t work particularly well, as every episode within a series
ends up with the same name. Taking a closer look at the format of the
titles, which in this case is consistent, we notice that there are two
points that uniquely identify an episode: the series number and the
episode number. Both sit near the end of the title. So what if we
truncate things in the middle rather than at the end?
Midsomer Murders...1 - Episode 1
Midsomer Murders...1 - Episode 2
...
The code to do this is fairly simple and looks something like the
following.
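    def truncate_middle(s, n):
        if len(s) <= n:
            # string is already short enough
            return s
        if n < 8:
            # too short to keep text on both sides of the 3 .'s
            raise ValueError('n must be at least 8')
        # half of the size, minus the 3 .'s, taken from the end
        # (// keeps this an integer under Python 3)
        n_2 = n // 2 - 3
        # whatever's left, taken from the start
        n_1 = n - n_2 - 3
        return '{0}...{1}'.format(s[:n_1], s[-n_2:])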
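A quick check against the sample titles might look like the sketch below. The function is restated in compact form so the snippet runs on its own under Python 3 (hence the `//`), and the titles list is rebuilt from the pattern of the examples above. Both are conveniences of this illustration.

```python
def truncate_middle(s, n):
    # restated compactly; // keeps the slice indexes integers
    if len(s) <= n:
        return s
    n_2 = n // 2 - 3
    n_1 = n - n_2 - 3
    return '{0}...{1}'.format(s[:n_1], s[-n_2:])


# rebuild the 15 sample titles: 3 series of 5 episodes each
titles = ['Midsomer Murders - Series {0} - Episode {1}'.format(series, episode)
          for series in range(1, 4) for episode in range(1, 6)]

shortened = [truncate_middle(t, 32) for t in titles]
print(shortened[0])                           # Midsomer Murders...1 - Episode 1
print(all(len(s) == 32 for s in shortened))   # True
print(len(set(shortened)) == len(titles))     # True, no duplicates this time
```

With a limit of 32, the function keeps the first 16 characters and the last 13, which is just enough to preserve the single-digit series number.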
This process isn’t perfect though, as with a different set of titles it
may truncate out the series number and result in duplicates. For UI
purposes this may be acceptable (a best effort), but for something that
requires uniqueness it won’t quite be enough. In my particular situation
the items have hex UUIDs as unique identifiers, so the simplest thing to
do was to append a few characters of the UUID to the end of the title
before truncating. This, for all practical purposes, ensures uniqueness.
What other solutions can you think of?
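To make both the failure mode and the UUID workaround concrete, here is a minimal sketch. The Series 10 and Series 20 titles are made up to trigger a collision, unique_short_name and its suffix_len parameter are hypothetical names, and the hex IDs are arbitrary example values standing in for whatever identifiers the items already carry. Taking the first six hex digits is an assumption too; the original only says "a few characters".

```python
def truncate_middle(s, n):
    # same logic as the function above, restated so this runs on its own
    if len(s) <= n:
        return s
    n_2 = n // 2 - 3
    n_1 = n - n_2 - 3
    return '{0}...{1}'.format(s[:n_1], s[-n_2:])


def unique_short_name(title, hex_id, n=32, suffix_len=6):
    # hypothetical helper: tack the first few hex digits of the ID onto
    # the title, then middle truncate so the suffix survives at the end
    return truncate_middle('{0} {1}'.format(title, hex_id[:suffix_len]), n)


# made-up items as (title, hex UUID) pairs
items = [
    ('Midsomer Murders - Series 10 - Episode 1', '3f2a9c4e5b6d4f708192a3b4c5d6e7f8'),
    ('Midsomer Murders - Series 20 - Episode 1', 'a81c07d2e3f4451697a8b9cadbecfd0e'),
]

plain = [truncate_middle(title, 32) for title, _ in items]
print(plain[0] == plain[1])   # True, the series number was cut out of both

named = [unique_short_name(title, hex_id) for title, hex_id in items]
print(named[0])               # Midsomer Murders...sode 1 3f2a9c
print(named[0] != named[1])   # True
```

The trade-off is that the suffix pushes part of the readable title out of the kept tail, so this suits machine-facing names better than UI labels.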