Categories
algorithm language-agnostic matching sorting

How can I pair socks from a pile efficiently?

4108

Yesterday I was pairing the socks from the clean laundry and figured out the way I was doing it is not very efficient. I was doing a naive search — picking one sock and “iterating” the pile in order to find its pair. This requires iterating over n/2 * n/4 = n2/8 socks on average.

As a computer scientist I was thinking what I could do? Sorting (according to size/color/…) of course came to mind to achieve an O(NlogN) solution.

Hashing or other not-in-place solutions are not an option, because I am not able to duplicate my socks (though it could be nice if I could).

So, the question is basically:

Given a pile of n pairs of socks, containing 2n elements (assume each sock has exactly one matching pair), what is the best way to pair them up efficiently with up to logarithmic extra space? (I believe I can remember that amount of info if needed.)

I will appreciate an answer that addresses the following aspects:

  • A general theoretical solution for a huge number of socks.
  • The actual number of socks is not that large, I don’t believe my spouse and I have more than 30 pairs. (And it is fairly easy to distinguish between my socks and hers; can this be used as well?)
  • Is it equivalent to the element distinctness problem?

42

  • 474

    I use pigeon hole principle to pair exactly one from the laundry pile. I have 3 different colors of socks (Red,Blue and Green) and 2 pairs of each color. I pick up 4 number of socks each time and I always make up a pair and get to work.

    – Srinivas

    Jan 19, 2013 at 15:37

  • 65

    Yet another pigeon hole principle: if you take a subset of n/2 +1 socks, there must be at least one pair in this subset.

    Jan 19, 2013 at 15:57

  • 42

    Great question! You might be interested in my article on a related problem, which is a discussion of the probability of pulling two matched socks out of the pile: blogs.msdn.com/b/ericlippert/archive/2010/03/22/…

    Jan 20, 2013 at 19:08

  • 376

    Why not spawn a child and waitpid so that, as the parent, you’re not even sorting any socks yourself?

    – Mxyk

    Sep 6, 2013 at 16:48

  • 172

    I solved this problem by only owning white knee-high socks. They all match. I could simply grab any two socks at random from the pile and they would match. I further simplify the problem by NOT pairing the socks. I have a sock drawer that I simply throw all my socks into, unpaired. I grab two at random from the drawer every morning. I’ve simplified it down to O(0). Can’t get any simpler than that. 🙂

    – Lee

    Sep 6, 2013 at 20:18

2549

Sorting solutions have been proposed, but sorting is a little too much: We don’t need order; we just need equality groups.

So hashing would be enough (and faster).

  1. For each color of socks, form a pile. Iterate over all socks in your input basket and distribute them onto the color piles.
  2. Iterate over each pile and distribute it by some other metric (e.g. pattern) into the second set of piles
  3. Recursively apply this scheme until you have distributed all socks onto very small piles that you can visually process immediately

This kind of recursive hash partitioning is actually being done by SQL Server when it needs to hash join or hash aggregate over huge data sets. It distributes its build input stream into many partitions which are independent. This scheme scales to arbitrary amounts of data and multiple CPUs linearly.

You don’t need recursive partitioning if you can find a distribution key (hash key) that provides enough buckets that each bucket is small enough to be processed very quickly. Unfortunately, I don’t think socks have such a property.

If each sock had an integer called “PairID” one could easily distribute them into 10 buckets according to PairID % 10 (the last digit).

The best real-world partitioning I can think of is creating a rectangle of piles: one dimension is color, the other is the pattern. Why a rectangle? Because we need O(1) random-access to piles. (A 3D cuboid would also work, but that is not very practical.)


Update:

What about parallelism? Can multiple humans match the socks faster?

  1. The simplest parallelization strategy is to have multiple workers take from the input basket and put the socks onto the piles. This only scales up so much – imagine 100 people fighting over 10 piles. The synchronization costs (manifesting themselves as hand-collisions and human communication) destroy efficiency and speed-up (see the Universal Scalability Law!). Is this prone to deadlocks? No, because each worker only needs to access one pile at a time. With just one “lock” there cannot be a deadlock. Livelocks might be possible depending on how the humans coordinate access to piles. They might just use random backoff like network cards do that on a physical level to determine what card can exclusively access the network wire. If it works for NICs, it should work for humans as well.
  2. It scales nearly indefinitely if each worker has its own set of piles. Workers can then take big chunks of socks from the input basket (very little contention as they are doing it rarely) and they do not need to synchronise when distributing the socks at all (because they have thread-local piles). At the end, all workers need to union their pile-sets. I believe that can be done in O(log (worker count * piles per worker)) if the workers form an aggregation tree.

What about the element distinctness problem? As the article states, the element distinctness problem can be solved in O(N). This is the same for the socks problem (also O(N), if you need only one distribution step (I proposed multiple steps only because humans are bad at calculations – one step is enough if you distribute on md5(color, length, pattern, ...), i.e. a perfect hash of all attributes)).

Clearly, one cannot go faster than O(N), so we have reached the optimal lower bound.

Although the outputs are not exactly the same (in one case, just a boolean. In the other case, the pairs of socks), the asymptotic complexities are the same.

36

  • 78

    This is exactly what I do! I make piles dependent on the style of the opening of the sock (I only have white), that gives me enough “buckets” to quickly match each of those up.

    Jan 20, 2013 at 0:00

  • 35

    I’ve tried this with my socks (I’ve got easily 30+ pairs) and man it is FAST. One problem I’ve found is when I can’t have a good enough hash algorithm (I’ve got lots of white socks without any pattern) so it becomes hard. In that case, what would be the optimal way to do it?

    Jan 20, 2013 at 14:14


  • 62

    @NothingsImpossible that’s how hash collision attacks feel like for a poor web-server! Are the white socks distinguishable by some attribute? There must be something you can distribute them on. Otherwise, you could just form pairs arbitrarily.

    – usr

    Jan 20, 2013 at 14:22

  • 40

    This is a Radix Sort, which I agree is the right answer. @MarkPeters I don’t think you need a lookup table. A single linear pass over the socks can convert the socks to number vectors, making the mapping of “sock segment” to bucket trivial. The socks can be tied to the vectors with string so that you don’t need another linear pass at the end.

    – Pointy

    Jan 20, 2013 at 21:31

  • 53

    A guy I went to college with actually had PairIDs. It was sewn on each pair of socks with thread: 1, 2, 3, 4…

    Jan 24, 2013 at 22:51

615

As the architecture of the human brain is completely different than a modern CPU, this question makes no practical sense.

Humans can win over CPU algorithms using the fact that “finding a matching pair” can be one operation for a set that isn’t too big.

My algorithm:

spread_all_socks_on_flat_surface();
while (socks_left_on_a_surface()) {
     // Thanks to human visual SIMD, this is one, quick operation.
     pair = notice_any_matching_pair();
     remove_socks_pair_from_surface(pair);
}

At least this is what I am using in real life, and I find it very efficient. The downside is it requires a flat surface, but it’s usually abundant.

26

  • 244

    as the number of socks increases, human’s SIMD become no better than a CPU.

    – Lie Ryan

    Jan 20, 2013 at 15:42


  • 29

    The best answer, IMO. While it’s fun and clever (and appropriate for SO) to reduce a day-to-day problem to a computer algorithm, it makes much more sense to use the resolution power of man’s eye/brain for a set as small as ~60 socks.

    Jan 20, 2013 at 18:27

  • 14

    @LieRyan If the socks are uniformily distributed, you will end up noticing a pair in any sufficiently small set of socks due to the birthday paradox (unless you can distinguish colors to arbitrary precision, which I doubt) so the bottleneck here wouldn’t be the human color matching algorithm but the spreading step.

    – Thomas

    Jan 21, 2013 at 13:21


  • 14

    @dpc.ucore.info No, because they have different woven cuff patterns, cuff lengths, overall lengths and shades of black (my wife would probably physically hurt me for that last one).

    – Christian

    Jan 22, 2013 at 15:07

  • 217

    You had better hope you have an even number of socks, otherwise you are going to be folding socks for a long time…

    Jan 23, 2013 at 15:14

277

Case 1: All socks are identical (this is what I do in real life by the way).

Pick any two of them to make a pair. Constant time.

Case 2: There are a constant number of combinations (ownership, color, size, texture, etc.).

Use radix sort. This is only linear time since comparison is not required.

Case 3: The number of combinations is not known in advance (general case).

We have to do comparison to check whether two socks come in pair. Pick one of the O(n log n) comparison-based sorting algorithms.

However in real life when the number of socks is relatively small (constant), these theoretically optimal algorithms wouldn’t work well. It might take even more time than sequential search, which theoretically requires quadratic time.

12

  • 9

    > It might take even more time than sequential search, which requires quadratic time in theory. Yeah that is why I hate doing this, maybe I should throw away all my socks and start with case 1.

    – Nils

    Jan 21, 2013 at 12:19


  • 61

    the down side of having all identical socks is that they tend to age at different rates. So you still end up trying to match them based on how worn they are. (which is harder than simply matching by pattern)

    – SDC

    Jan 22, 2013 at 13:45

  • 130

    The problem with having 60 pairs of identical socks “because it makes pairing easier” is that it gives people the impression you work with computers.

    Apr 24, 2013 at 12:17


  • 14

    Case 1 is not constant time when there’s an operation involved, such as folding pairs together. In this case, it’s linear time with the smallest constant factor (the proof of which is left as an exercise for the reader). One can’t possibly take the same time folding one pair and a bucket full of socks. However, it scales linearly. By Amdahl’s law, it has unlimited speedup, ignoring overhead. By Gustafson’s law, you can fold as many pairs as it takes to fold one pair given enough workers (the amount of which is left as an exercise for the reader), ignoring overhead.

    – acelent

    Sep 4, 2013 at 13:00

  • 10

    @PauloMadeira The sorting is constant time – you just take the pile and put it in your drawer. The only operation in this case is actually putting the socks on your feet which is also constant. Performance is gained by deferred execution of the sock wearing, possibly with some sacrifice in space (consumed space of non-folded socks is larger than folded). I argue that this is worth it; I usually lose this argument with my wife.

    – Travis

    Sep 12, 2013 at 19:59