Longest Substring with At Most K Distinct Characters

Given a string, find the longest substring that contains only two unique characters. For example, given "abcbbbbcccbdddadacb", the longest substring that contains 2 unique characters is "bcbbbbcccb".

1. Longest Substring Which Contains 2 Unique Characters

In this solution, a HashMap tracks how many times each character occurs in the current window. When a third distinct character is added to the map, the left pointer moves right until only two distinct characters remain.

You can use "abac" to walk through this solution.

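    import java.util.HashMap;

    public int lengthOfLongestSubstringTwoDistinct(String s) {
        int max=0;
        // character -> number of occurrences inside the window [start, i]
        HashMap<Character,Integer> map = new HashMap<Character,Integer>();
        int start=0;

        for(int i=0; i<s.length(); i++){
            char c = s.charAt(i);
            if(map.containsKey(c)){
                map.put(c, map.get(c)+1);
            }else{
                map.put(c, 1);
            }

            if(map.size()>2){
                // the window [start, i-1] was still valid, so record its length
                max = Math.max(max, i-start);

                // shrink from the left until only two distinct characters remain
                while(map.size()>2){
                    char t = s.charAt(start);
                    int count = map.get(t);
                    if(count>1){
                        map.put(t, count-1);
                    }else{
                        map.remove(t);
                    }
                    start++;
                }
            }
        }

        // the last window runs to the end of the string
        max = Math.max(max, s.length()-start);

        return max;
    }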
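To make the "abac" walk-through concrete, here is a minimal sketch that records the window after each character is processed. It is not the code above, only the same sliding-window idea; the class name TwoDistinctTrace and the method trace are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TwoDistinctTrace {
    // Returns the window s[start..i] after each character i has been processed.
    static List<String> trace(String s) {
        List<String> windows = new ArrayList<>();
        Map<Character, Integer> counts = new HashMap<>();
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            counts.merge(s.charAt(i), 1, Integer::sum);
            // A third distinct character entered: move the left pointer right.
            while (counts.size() > 2) {
                char t = s.charAt(start++);
                int count = counts.get(t);
                if (count > 1) {
                    counts.put(t, count - 1);
                } else {
                    counts.remove(t);
                }
            }
            windows.add(s.substring(start, i + 1));
        }
        return windows;
    }

    public static void main(String[] args) {
        System.out.println(trace("abac")); // prints [a, ab, aba, ac]
    }
}
```

The window grows to "aba"; when 'c' arrives, the left pointer skips past both the first 'a' and the 'b', leaving "ac". The longest window seen is "aba", so the answer for "abac" is 3.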

Now if this question is extended to be "the longest substring that contains k unique characters", what should we do?

2. Solution for K Unique Characters

UPDATE ON 7/21/2016.

The following solution is corrected. Given "abcadcacacaca" and 3, it returns 11, the length of "cadcacacaca".

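    import java.util.HashMap;

    public int lengthOfLongestSubstringKDistinct(String s, int k) {
        if(k==0 || s==null || s.length()==0)
            return 0;

        // fewer than k characters in total: the whole string qualifies
        if(s.length()<k)
            return s.length();

        // character -> number of occurrences inside the window [left, i]
        HashMap<Character,Integer> map = new HashMap<Character,Integer>();

        int maxLen=k;
        int left=0;
        for(int i=0; i<s.length(); i++){
            char c = s.charAt(i);
            if(map.containsKey(c)){
                map.put(c, map.get(c)+1);
            }else{
                map.put(c, 1);
            }

            if(map.size()>k){
                // the window [left, i-1] was still valid, so record its length
                maxLen=Math.max(maxLen, i-left);

                // shrink from the left until only k distinct characters remain
                while(map.size()>k){

                    char fc = s.charAt(left);
                    if(map.get(fc)==1){
                        map.remove(fc);
                    }else{
                        map.put(fc, map.get(fc)-1);
                    }

                    left++;
                }
            }

        }

        // the last window runs to the end of the string
        maxLen = Math.max(maxLen, s.length()-left);

        return maxLen;
    }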

Time is O(n): each character enters the window once and leaves it at most once.
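Because the method above returns only a length, it can be handy to have a variant that returns the substring itself when checking examples by hand. The following is a minimal sketch under that assumption, not the author's code: the class name LongestKDistinct and the method longestSubstringKDistinct are hypothetical, and the sketch remembers where the best window starts instead of only how long it is.

```java
import java.util.HashMap;
import java.util.Map;

class LongestKDistinct {
    // Returns the first longest substring of s with at most k distinct characters.
    static String longestSubstringKDistinct(String s, int k) {
        if (s == null || k <= 0) {
            return "";
        }
        Map<Character, Integer> counts = new HashMap<>();
        int left = 0, bestStart = 0, bestLen = 0;
        for (int right = 0; right < s.length(); right++) {
            counts.merge(s.charAt(right), 1, Integer::sum);
            // Shrink from the left until at most k distinct characters remain.
            while (counts.size() > k) {
                char fc = s.charAt(left++);
                if (counts.merge(fc, -1, Integer::sum) == 0) {
                    counts.remove(fc);
                }
            }
            // The window [left, right] is valid here; keep it if it is strictly longer.
            if (right - left + 1 > bestLen) {
                bestLen = right - left + 1;
                bestStart = left;
            }
        }
        return s.substring(bestStart, bestStart + bestLen);
    }
}
```

Calling longestSubstringKDistinct("abcadcacacaca", 3) yields "cadcacacaca", and longestSubstringKDistinct("abcbbbbcccbdddadacb", 2) yields "bcbbbbcccb", matching the two examples in this post.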

• Trung Vo

Can someone verify my code below please?

public static int lengthOfLongestSubstringTwoDistinct(String s){
int max = 0;
Set map = new HashSet();
for (int i=0, index=0; i 2){
index = i-1;
map.clear();
}
max = Math.max(max, i-index+1);
}
return max;
}

• William Kuo

O(kn)

public static int lengthOfLongestSubstringKDistinct(String s, int k) {
HashMap map = new HashMap();
int start = 0;
int max = 0;
int count = 0;
for (int i = 0 ; i < s.length() ; i++) {
map.put(s.charAt(i), i);
if (map.keySet().size() k) {
int minIdx = Integer.MAX_VALUE;
char cand = 0;
for (char c : map.keySet()) {
if (map.get(c) max)
max = count;
}

return max;
}

• rupalph

Can you explain the logic? I ran the code on "abcadcacacaca" and it returns 11.

• Mike

public static int longestSubstring(String s) {

int i = 0, start = 0, count = 0;

Set set = new HashSet();

while(i2){

if(i-start>count) count = i-start;

set.clear();

i = ++start;

}else{

i++;

}

}

return count;

}

• Mike

I think mine is the simplest one.

public int lengthOfLongestSubstringKDistinct(String s) {
int i = 0, start = 0, count = 0;
Set characters = new HashSet();
while(i2){
characters.clear();
if(i-start > count){
count = i-1;
}
i = ++start;
}else{
i++;
}
}
return count;
}

• Cherry Zhao

This is an interesting question. I found this post has a detailed analysis

http://blog.gainlo.co/index.php/2016/04/12/find-the-longest-substring-with-k-unique-characters/

• Juan Melo

As jason zhang pointed out, solution 3 is wrong. The result of this:
is “abca”, and it should be “cadcacacac”.

Here’s an O(n^2) working piece I did:

public static String findLongest(String myString, int max) {
if (myString == null) {
return null;
}
int start = 0, end = 0;
for (int i = 0; i < myString.length(); i++) {
Map map = new HashMap();
for (int j = i; j max || j == myString.length() - 1) {
if (end - 1 - start max) {
map.clear();
}
break;
}
}
}
return myString.substring(start, end);
}

• Larry Okeke

Just create a data structure that acts as a string that doesn't allow more than k distinct elements to be added to it.

import java.util.*;

public class longest_substring_two {

public static void main(String[] args){

System.out.println("Length " + solution(args[0]));
}

public static String solution(String s){

data_structure ds = new data_structure();
int sequence = 0;
int longest = 0;
String solution = "";
for(int i = 0; i < s.length(); i++){

for(int j = i; j longest){

longest = sequence;
solution = ds.get();
}
ds.clear();
break;
}

}
}

return solution;

}

private static class data_structure{

/*
this data structure adds string, allowing only a specified number of duplicates

*/
ArrayList word = new ArrayList();

private ArrayList allowed = new ArrayList();
StringBuilder b = new StringBuilder();
public boolean insert(char c){

//this character is allowed just put it in
if(allowed.contains(c)){
b.append(c);
return true;
}
//this character is not allowed, if it can be allowed, add it
else if(allowed.size() < 2){
b.append(c);
return true;
}
//this character is not allowed and cannot be added.
else{

return false;

}
}

public int length(){
return word.size();
}

public void clear(){
allowed.clear();
word.clear();
b.setLength(0);
}

public String get(){

return b.toString();
}

}
}

• Shobhit Jaiswal

here is my solution

static String uniquecharSubs(String str, int k) {

Set set = new HashSet();

int max = 0;

StringBuilder sb = new StringBuilder();

char ch[] = str.toCharArray();

String newstr = "";

for (int i = 0; i < ch.length; i++) {

set = new HashSet();

sb = new StringBuilder();

for (int j = i; set.size() = ch.length) {

return newstr;

} else {

if (set.size() > k) {

if (sb.length() > max) {

max = sb.length();

newstr = sb.toString();

}

}

sb.append(ch[j]);

}

}

}

return newstr;

}

• Pritika Mehta

import java.util.HashSet;
import java.util.TreeMap;

public class LCSWithTwoChar{
public static void main(String args[]){
String str = "pritika";
TreeMap map = new TreeMap();
int start =0;
int len =0;
int i=0;
int j =0,k;
char c1 =' ';
char c2 =' ';
for(k =0;k<str.length();k++){
char c= str.charAt(k);
//System.out.println("char is "+c+" c1= "+c1+" c2="+c2);
if(c1 == ' '){
c1 = c;
i = k;
len++;
start =k;
}else if(c2 == ' '){
c2 = c;
j = k;
len++;
}else if(c1 == c || c2 ==c){
if(c1 == c){
i = k;
len++;
while(k+1<str.length() && c1 == str.charAt(k+1)){
k++;
len++;
}

}else if(c2 == c){
j = k;
len++;
while(k+1<str.length() && c2 == str.charAt(k+1)){
k++;
len++;
}

}
}else if(c1 != c && c2 !=c){
map.put(k-start, new Index(start,k-1));
if(i<j){
c1 = c2;
c2 = c;
i =j;
j =k;
start = i;
len = k-i;
}else{
c2 = c;
j = k;
start = i;
len = k-i;
}
}
if(k == str.length()-1)
map.put(k-start+(start==0?0:1), new Index(start,k-1));
//System.out.println("char is "+c+" c1= "+c1+" c2="+c2);
//System.out.println(map.toString()+" start= "+start+" len is "+len+" i= "+i+" j= "+j+"n--------------------");
}

System.out.println(map.lastKey());
}

}

class Index{
int i;
int j;
Index(int i,int j){
this.i = i;
this.j = j;
}
public String toString(){
return i+" "+j;
}
}

• up23

Here it is in PHP: O(n) time assuming in_array() is constant time, or O(n*k) if not.

function findSubstring($text, $k = 2) {

if (strlen($text) < $k) {
return $text;
}

$bestAns = "";
$lastTwoUniqueChars = array();

$ans = "";
for ($i = 0; $i < strlen($text); $i ++ ) {
$letter = $text{$i};

if (count($lastTwoUniqueChars) < $k) {
if (!in_array($letter, $lastTwoUniqueChars)) {
$lastTwoUniqueChars[] = $letter;
}
continue;
}

if (in_array($letter, $lastTwoUniqueChars)) {
$ans .= $letter;
}
else {
// new letter from last k uniques
$lastTwoUniqueChars = array();
$ans = "";
for ($j = 0; $j strlen($bestAns)) {
$bestAns = $ans;
}
}

if ($bestAns == "") {
return $text;

}
else {
return $bestAns;
}

}

echo findSubstring("a") . "\n\n";
echo findSubstring("ab") . "\n\n";
echo findSubstring("abcde") . "\n\n";

• ryanlr

You are right. I have changed the 3rd solution. Thanks!

• Dylan Wang

Does the 3rd code work?

Look through code, it seems that this line should be put into while loop, “char first = s.charAt(start);”

I think the code should be changed to

//move left cursor toward right, so that substring contains only k chars
while(map.size()>k)
{
char first = s.charAt(start);
int count = map.get(first);
if(count>1){
map.put(first,count-1);
}else{
map.remove(first);
}

start++;
}

• Satish

public static String unique2CharSubstring(String str) {
String result = "";
int len = str.length();
HashMap map = new HashMap();
char[] c = str.toCharArray();
int right = 0, max = 0;
for (int left = 0; left < len; left++) {
while (right 2) {
left = Math.max(left, map.get(queue.peek()) + 1);
map.remove(queue.pop());
}
if (right - left > max) {
max = right - left;
result = str.substring(left, right + 1);
}
right++;
}
}
return result;
}

I also tried to execute the first method after I failed to understand the idea behind it, and it usually produces an answer with the same beginning as the input string.

• jason zhang

The solution 3 does not work. For example, this test case can’t pass:

The logic is this: we should not remove the first character (char first = s.charAt(start);), which is a, but the first character that no longer exists in the window. In the example, the character that should be removed is b.

• Anony

Simple Java solution for 2 unique chars

import java.util.Arrays;

public class LgstUnq {

public static char [] longestString(char [] tab) {

if(tab == null || tab.length <= 2) {

return tab;

}

int max = 2;

int newMax = 2;

char [] cMax = new char[tab.length];

char [] lMax = cMax;

char x = tab[0];

char y = tab[1];

cMax[0] = x;

cMax[1] = y;

for(int i = 2, j = 2; i max) {

max = newMax;

lMax = Arrays.copyOf(cMax, i+1);

newMax = 0;

}

x = c;

cMax = new char[tab.length - i + 1]; // including previous char

j = 0;

c = tab[--i];

}

cMax[j++] = c;

newMax++;

}

return lMax;

}

public static void main(String [] args) {

String v = args[0];

char [] arr = v.toCharArray();

System.out.println(longestString(arr));

}

}

• neer1304

For string “aabaaddaa” it gives output as “aabaa” whereas the correct output should be “aaddaa”.

• guest

O(n) idea: use a hash map (to check whether a character is already present, with a pointer to its linked-list node) together with a linked list that keeps the k characters. The head of the linked list is the character with the smallest last index among the k. When we update the index of an existing character, we move its node to the tail and update the hash map, which is also an O(1) operation (like the LRU cache concept). Every operation is O(1), so the total is O(n).
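The LRU-style idea above can be sketched with java.util.LinkedHashMap in access order, which bundles the hash map and the linked list into one structure. This is only one possible reading of the comment, not the commenter's code: the class name LongestKDistinctLru and the choice to store each character's last index as the map value are assumptions made for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class LongestKDistinctLru {
    static int lengthOfLongestSubstringKDistinct(String s, int k) {
        if (s == null || k <= 0) {
            return 0;
        }
        // accessOrder = true: entries are ordered from least to most recently put.
        LinkedHashMap<Character, Integer> lastIndex = new LinkedHashMap<>(16, 0.75f, true);
        int left = 0, best = 0;
        for (int i = 0; i < s.length(); i++) {
            // Insert or refresh the last index; either way the key moves to the tail.
            lastIndex.put(s.charAt(i), i);
            if (lastIndex.size() > k) {
                // The head is the character whose last occurrence is furthest left.
                Iterator<Map.Entry<Character, Integer>> it = lastIndex.entrySet().iterator();
                Map.Entry<Character, Integer> oldest = it.next();
                left = oldest.getValue() + 1; // the window now starts just after it
                it.remove();
            }
            best = Math.max(best, i - left + 1);
        }
        return best;
    }
}
```

Evicting the head jumps the left pointer directly past the last occurrence of the stalest character, so there is no inner loop over the window or over the k tracked characters.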

• ryanlr

Right, the bad solution is deleted now.

• Tiago Pinho

The next one I think works:

static public String nonRepeated(String input){

char[] charAInput = input.toCharArray();

String aux = "";

String resultSubStr = "";

char otherChar;

if(input.length() <= 2)

return input;

for(int i =0; i < charAInput.length; i++ ){

String removedOcurrences = aux.replace("" + charAInput[i], "");

String removedOtherChar = "";

if (!removedOcurrences.equals("")){

otherChar = removedOcurrences.charAt(0);

removedOtherChar = removedOcurrences.replace("" + otherChar, "");
}

if(removedOcurrences.equals("") || removedOtherChar.equals("")){

aux = aux.concat("" + charAInput[i]);

}
else
aux = "" + removedOtherChar + charAInput[i];

if(resultSubStr.length() < aux.length())
resultSubStr = aux;
}

return resultSubStr;
}

• no-nested-loops

O(N) solution:

public static String subString(String s) {
int cstart = 0, lstart = 0, lastswap = 0, pos = 0, beststart = 0, bestlength = -1;
Character c = null, l = null;
for (Character curchar : s.toCharArray()) {
if (curchar != c) {
cstart += lstart - (lstart = cstart);
Character swapchar = l;
l = c;
if (curchar != (c = swapchar)) {
cstart = pos;
c = curchar;
lstart = lastswap;
}
lastswap = pos;
}
if (++pos - Math.min(lstart, cstart) > bestlength) {
bestlength = pos - (beststart = Math.min(lstart, cstart));
}
// System.out.printf("p =%3d, s =%2s, c =%2s, l =%4s, cs =%3d, bs =%3d, ls =%3d, bs =%3d, bl =%3d%n", pos, curchar, c, l, cstart, lastswap, lstart, beststart, bestlength);
}
return s.substring(beststart, beststart + bestlength);
}

• Jeffery yuan

Not work:

• Yi Wang

I think your scalable solution requires O(N*k) time, which may not be very efficient for a large k. By using a hash map for character counting, there is a solution with O(N) time.
http://blog.csdn.net/whuwangyi/article/details/42451289

• skra

Here is my solution in C#:

public sealed class Substring

{

public Substring(String input, Int32 count)

{

if (null == input)

{

throw new ArgumentNullException("input");

}

if (count > input.Length)

{

throw new ArgumentOutOfRangeException("count");

}

if (count < 0)

{

throw new ArgumentOutOfRangeException("count");

}

_count = count;

_input = input;

_chars = _input.ToCharArray();

}

private Nullable _startIndex;

private Nullable _length;

private class CharsContainer

{

private readonly Dictionary _items = new Dictionary();

{

if (_items.ContainsKey(ch))

{

_items[ch]++;

}

else

{

_items[ch] = 1;

}

}

internal void RemoveItem(Char ch)

{

if (_items.ContainsKey(ch))

{

_items[ch]--;

if (_items[ch] == 0)

{

_items.Remove(ch);

}

}

}

internal Int32 Length

{

get

{

return _items.Count;

}

}

}

private void TryStoreCurrentResult(Int32 ix, Int32 length)

{

if (_length.HasValue)

{

if (_length.Value < length)

{

_length = length;

_startIndex = ix;

}

}

else

{

_length = length;

_startIndex = ix;

}

}

public String Find()

{

Int32 endPos = -1;

Int32 startPos = 0;

CharsContainer charsContainer = new CharsContainer();

while (endPos < _chars.Length)

{

if (charsContainer.Length = _chars.Length)

{

break;

}

}

else

{

charsContainer.RemoveItem(_chars[startPos]);

startPos++;

}

Int32 currentLength = endPos - startPos + 1;

if (charsContainer.Length 0)

{

TryStoreCurrentResult(startPos, currentLength);

}

}

if (_startIndex.HasValue && _length.HasValue)

{

return _input.Substring(_startIndex.Value, _length.Value);

}

return String.Empty;

}

}

• Krzysztof Rajda

@jason it works for me. Make sure that you have the following code at the end:

if (charArray.length - currentStart > maxEnd - maxStart) {
maxStart = currentStart;
maxEnd = charArray.length;
}

return text.substring(maxStart, maxEnd);

• Krzysztof Rajda

I am attaching a solution that works with any k. Basically the trick is to not clear the whole map, but just remove one element.

private static String findLongestSubstring(String text) {
int maxStart = 0;
int maxEnd = 1;
int currentStart = 0;
Map lastOccurrenceMap = new HashMap();

char[] charArray = text.toCharArray();
for (int i = 0; i < charArray.length; i++) {
char c = charArray[i];
Integer lastOccurrenceOfC = lastOccurrenceMap.get(c);
if (lastOccurrenceOfC == null) {
// you can just change 2 to any number to solve k problem
if (lastOccurrenceMap.size() maxEnd – maxStart) {
maxStart = currentStart;
maxEnd = i;
}
int lastOccurrenceOfCharToRemove = charArray.length;
char charToRemove = ' ';

for (Map.Entry characterIntegerEntry : lastOccurrenceMap.entrySet()) {
if (characterIntegerEntry.getValue() maxEnd – maxStart) {
maxStart = currentStart;
maxEnd = charArray.length;
}

return text.substring(maxStart, maxEnd);
}

If you have the substring, you can get the position of the repeated character directly with lastIndexOf, i.e., there is no need for an auxiliary function to help the main one. Then we can get the length of the string containing only a single character.

• xiayu5945

private static String resolution(String str){

if(StringUtils.isEmpty(str))return str;

int first=0;int second=0;char firstChar= str.charAt(0) ; char secondChar = str.charAt(0);

int max=0;int endIndex=0;

for(int i=0;i<str.length();i++){

char temp = str.charAt(i);

if(secondChar == temp){

second++;

}else if(firstChar == temp){

first++;

}else{

firstChar = secondChar;

secondChar = temp;

if(first !=0 && max<first+second){

max=first+second;

endIndex = i;

}

first=second;

second=1;

}

}

if(first !=0 && max<first+second){

max=first+second;

endIndex = str.length();

}

return str.substring(endIndex-max,endIndex);

}

• Rang-ji Hu

Here is a rough version written in C#:

static int FindMaxConsecutiveSubstring(string target, int tolerant)

{

int max = 0;

int threshold = 0;

if (target.Length > 0)

{

List map = new List();

char c = target[0];

threshold = 1;

int i = 1;

max = 1;

int tempCount = 1;

while (i < target.Length)

{

char temp = target[i];

if (!map.Contains(temp))

{

if (threshold max)

{

max = tempCount;

}

tempCount = 0;

i--;

char keep = target[i];

map = map.Where(a => a == keep).ToList();

while (i >= 1)

{

if (target[i - 1] == keep)

{

i--;

}

else

{

break;

}

}

}

}

else

{

tempCount++;

i++;

}

}

if (tempCount > max)

{

max = tempCount;

}

}

return max;

}

The second parameter is the number of unique characters.

• Pulkit

Why not just find the two most frequently occurring characters? They would be in the answer.
This solution would scale to K unique characters,
i.e. sort the characters according to their frequency and pick the top K characters.

Now iterate over the array and check whether we need to keep each character or not.

• jason

The logic is not correct.
For example, given the input “ab”, the substring returned is “a”.