
How to find a repeating sequence of characters in a given array?

My problem is to find the repeating sequence of characters in a given array. Put simply, I want to identify the pattern in which the characters appear.

   .---.---.---.---.---.---.---.---.---.---.---.---.---.---.
1: | J | A | M | E | S | O | N | J | A | M | E | S | O | N |
   '---'---'---'---'---'---'---'---'---'---'---'---'---'---'

   .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
2: | R | O | N | R | O | N | R | O | N | R | O | N | R | O | N |
   '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'

   .---.---.---.---.---.---.---.---.---.---.---.---.
3: | S | H | A | M | I | L | S | H | A | M | I | L |
   '---'---'---'---'---'---'---'---'---'---'---'---'

   .---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.---.
4: | C | A | R | P | E | N | T | E | R | C | A | R | P | E | N | T | E | R |
   '---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'---'


Example

Given the previous data, the result should be:

  1. "JAMESON"
  2. "RON"
  3. "SHAMIL"
  4. "CARPENTER"


Question

  • How can I deal with this problem efficiently?

For your examples, my first approach would be to

  1. get the first character of the array (for your last example, that would be C)
  2. get the index of the next appearance of that character in the array (in that example, index 9, counting from 0)
  3. if it is found, search for the next appearance of the substring between the two appearances of the character (in this case CARPENTER)
  4. if it is found, you’re done (and the result is this substring).
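
The four steps above could be sketched in Python roughly like this. It is only a minimal illustration: the function name `find_repeated_word` is made up for this example, and it assumes the array has already been joined into a string (e.g. with `"".join(chars)`).

```python
def find_repeated_word(text):
    """Return the repeated leading word, or None if the heuristic fails."""
    if not text:
        return None
    # Steps 1 and 2: take the first character and find where it appears next.
    second = text.find(text[0], 1)
    if second == -1:
        return None
    # Step 3: the candidate is everything before that second appearance ...
    candidate = text[:second]
    # ... and it has to appear again at or after that position.
    if text.find(candidate, second) != -1:
        # Step 4: found, so the candidate is the result.
        return candidate
    return None


for sample in ("JAMESONJAMESON", "RONRONRONRONRON",
               "SHAMILSHAMIL", "CARPENTERCARPENTER"):
    print(sample, "->", find_repeated_word(sample))
```

For the four arrays in the question this yields JAMESON, RON, SHAMIL and CARPENTER.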

Of course, this works only for a very limited subset of possible arrays, where the same word is repeated over and over again, starting from the beginning, without stray characters in between, and its first character is not repeated within the word. But all your examples fall into this category – and I prefer the simplest solution which could possibly work 🙂

If the repeated word contains the first character multiple times (e.g. CACTUS), the algorithm can be extended to look for subsequent occurrences of that character too, not only the first one (so that it finds the whole repeated word, not only a substring of it).

Note that this extended algorithm would give a different result for your second example, namely RONRON instead of RON.
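
One possible reading of the extended variant, again only as a sketch: try every later occurrence of the first character as a candidate end point and keep the longest candidate that still appears again afterwards. The name `find_repeated_word_extended` and the "keep the longest match" rule are my assumptions, chosen so that the behaviour matches the description above.

```python
def find_repeated_word_extended(text):
    """Return the longest leading candidate that appears again later, or None."""
    if not text:
        return None
    best = None
    # Walk through every later occurrence of the first character.
    pos = text.find(text[0], 1)
    while pos != -1:
        candidate = text[:pos]
        # Keep the candidate only if it shows up again at or after `pos`.
        if text.find(candidate, pos) != -1:
            best = candidate  # candidates grow, so the last hit is the longest
        pos = text.find(text[0], pos + 1)
    return best


print(find_repeated_word_extended("CACTUSCACTUS"))     # CACTUS
print(find_repeated_word_extended("RONRONRONRONRON"))  # RONRON
```

If the shortest repeating unit is wanted instead (RON rather than RONRON), returning the first matching candidate whose length divides the array length would be one way to adjust it, at the cost of more checks.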