The substring() Method in JDK 6 and JDK 7

The substring(int beginIndex, int endIndex) method behaves differently in JDK 6 and JDK 7. Knowing the difference helps you use it well. For simplicity, substring() below refers to the substring(int beginIndex, int endIndex) method.

1. What does substring() do?

The substring(int beginIndex, int endIndex) method returns a string that starts at index beginIndex and ends at index endIndex-1, so its length is endIndex - beginIndex.

String x = "abcdef";
x = x.substring(1,3);
System.out.println(x);

Output:

bc
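The index rules can be checked with a few more calls. This is a minimal sketch using only the standard String API; the class name SubstringBounds is made up for illustration.

```java
final class SubstringBounds {
    public static void main(String[] args) {
        String x = "abcdef";

        // The result length is always endIndex - beginIndex.
        String sub = x.substring(1, 3);
        System.out.println(sub.length());                 // 2

        // Equal indices are allowed and give an empty string.
        System.out.println(x.substring(2, 2).isEmpty());  // true

        // beginIndex greater than endIndex is rejected.
        try {
            x.substring(4, 2);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("invalid range");
        }
    }
}
```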

2. What happens when substring() is called?

You may know that because strings are immutable, when x is assigned the result of x.substring(1,3), it refers to a totally new string, like the following:

[Figure: string-immutability]

However, this diagram is not exactly right, and it does not represent what really happens in the heap. What really happens when substring() is called differs between JDK 6 and JDK 7.

3. substring() in JDK 6

A String is backed by a char array. In JDK 6, the String class contains 3 fields: char value[], int offset, int count. They store the actual character array, the index of the first character in that array, and the number of characters in the String.

When the substring() method is called, it creates a new string, but the string's value still points to the same array in the heap. The difference between the two Strings is their count and offset values.

[Figure: string-substring-jdk6]

The following code is simplified and contains only the key points needed to explain this problem.

//JDK 6
String(int offset, int count, char value[]) {
	this.value = value;
	this.offset = offset;
	this.count = count;
}
 
public String substring(int beginIndex, int endIndex) {
	//check boundary
	return  new String(offset + beginIndex, endIndex - beginIndex, value);
}
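To see the effect of sharing, the layout can be imitated with a small stand-alone class. This is only a sketch: SharedString is a hypothetical model of the three fields described above, not the real java.lang.String, and the one-million-character array is an arbitrary size chosen for the demonstration.

```java
/** Hypothetical model of the JDK 6 layout; not the real java.lang.String. */
final class SharedString {
    final char[] value;  // backing array, possibly shared with other instances
    final int offset;    // index of the first character inside value
    final int count;     // number of characters in this string

    SharedString(char[] value) {
        this(0, value.length, value);
    }

    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // No copy: the new object reuses the same array with a new window.
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString big = new SharedString(new char[1000000]);
        SharedString small = big.substring(10, 12);
        System.out.println(small.count);               // 2
        System.out.println(small.value.length);        // 1000000
        System.out.println(small.value == big.value);  // true
    }
}
```

In this model the two-character substring keeps all one million chars reachable, which is the situation the next section describes.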

4. A problem caused by substring() in JDK 6

If you have a VERY long string but need only a small part of it each time you call substring(), this causes a performance problem: although you need only a small part, the whole character array stays in memory for as long as the substring is reachable. For JDK 6, the solution is the following, which makes x point to a real substring with its own array:

x = x.substring(beginIndex, endIndex) + "";
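Another option in JDK 6 is the String(String) copy constructor, which copies only the characters in use when the backing array is larger than the string. The sketch below wraps that call in a helper; the class name TrimmedSubstring and the method trimmed are hypothetical names used only for illustration.

```java
final class TrimmedSubstring {
    /**
     * Returns the requested substring as a string that does not share
     * the source's backing array on JDK 6.
     */
    static String trimmed(String source, int beginIndex, int endIndex) {
        return new String(source.substring(beginIndex, endIndex));
    }

    public static void main(String[] args) {
        String x = "abcdef";
        x = trimmed(x, 1, 3);
        System.out.println(x);  // bc
    }
}
```

On JDK 7 and later the wrapper is unnecessary, because substring() already creates its own array.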

5. substring() in JDK 7

This is improved in JDK 7. In JDK 7, the substring() method actually creates a new array in the heap.

[Figure: string-substring-jdk7]

//JDK 7
public String(char value[], int offset, int count) {
	//check boundary
	this.value = Arrays.copyOfRange(value, offset, offset + count);
}
 
public String substring(int beginIndex, int endIndex) {
	//check boundary
	int subLen = endIndex - beginIndex;
	return new String(value, beginIndex, subLen);
}
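The copying behavior can be imitated the same way. Again, this is only a sketch: CopyingString is a hypothetical stand-in for the JDK 7 approach, not the real java.lang.String, and the array size is arbitrary.

```java
import java.util.Arrays;

/** Hypothetical model of the JDK 7 approach; not the real java.lang.String. */
final class CopyingString {
    final char[] value;  // array owned by this instance, exactly as long as the string

    CopyingString(char[] value) {
        this.value = value;
    }

    CopyingString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > value.length || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // Copy only the requested range into a fresh array.
        return new CopyingString(Arrays.copyOfRange(value, beginIndex, endIndex));
    }

    @Override
    public String toString() {
        return new String(value);
    }

    public static void main(String[] args) {
        CopyingString big = new CopyingString(new char[1000000]);
        CopyingString small = big.substring(10, 12);
        System.out.println(small.value.length);        // 2
        System.out.println(small.value == big.value);  // false
    }
}
```

Here the substring holds only two chars, so the large array can be collected once nothing else refers to it. The cost is that every call copies the requested range.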



  • http://mindprod.com roedygreen

    I would say “different” not “improved”. If you are plucking many small substrings from a big string that say represents file contents (my most common case) you will use more RAM with the JDK 7 method than with the JDK 6. If you need to keep the big string around JDK 6 is better. If you want to avoid pinning big string in RAM the JDK 7 method is better. I gather it would not have been practical to make it a compile time option since this decision is so pervasive.

  • JavaGuy

    Very nice and easy to understand article :)
    Found another nice article that explains the memory leak possibility well in Java 6

    http://javaterritory.blogspot.com/2014/09/how-strings-substring-function-can.html

  • 智 陶

    In JDK 6, we also have the following code:

    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself. Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

    This can also solve the problem in JDK 6, and you can do it with the intern() method too.
    Correct me if I am wrong.

  • Chintan Mohan Rohila

    WRT to Oracle Java 7, the diagram describing String objects in Java 7 needs to be updated as ‘count’ and ‘offset’ are no longer String class attributes. However, IBM Java 7 still has old implementation of not making new copy of ‘value’ in substring object.

  • c094728

    I didn’t believe you at first so looked at the source and you are quite right. There is code in constructor that makes a copy if the begin & end index don’t point at the full buffer.

  • c094728

    It sounds to me like the JDK 7 substring will be very bad for performance when you have a very long string (say > 20k) and you want to include most of it in the new string. Now it has to copy all of that data, whereas in JDK 6 it took no more time than creating a small substring of 2 characters, since it keeps the reference to the original string and only adds pointers to the substring. Very inefficient to copy this data. Does trim() in JDK 7 copy the entire string? For example, if you are parsing a large XML file, substring is very efficient in JDK 6 because the data is not moved, just split up into fields that reference the original file.

  • liam

    In JDK 6 the old char array is still referenced by the new String. So only the old String object would be garbage collected, not the old char array.

    Just as the author said, “If you have a VERY long string, but you only need a small part each time by using substring().”

    Terrible situation.

  • naresh

    Imagine a situation where memory is very critical. Once x is changed to hold a smaller array, I do not want to hold on to the memory, and it can be garbage collected. It will reduce memory leaks.

  • Kim

    I agree with you. Once you knew about the jdk 6 implementation it was quite alright.

  • Green

    very helpful for me ! thanks!

  • xiangxm

    i just don’t understand that why “The memory leak problem caused by substring() in JDK 6 is solved in JDK 7 ” ?? 3Q

  • xiangxm

    The memory leak problem caused by substring() in JDK 6 is solved in JDK 7, because new string will not point to the existing old character array.??why??

  • http://darkhogg.es/ Darkhogg

    x = x.substring(beginIndex, endIndex) + "";

    Do *NOT* use that code. That line of code is equivalent to:

    StringBuilder sb = new StringBuilder();
    sb.append(x.substring(beginIndex, endIndex));
    sb.append("");
    x = sb.toString();

    Instead, do:

    x = new String(x.substring(beginIndex, endIndex));

    You save one object reference and a bit of time.

    I personally don’t like the new behaviour as it can be slow and waste memory in certain situations where the old behaviour did just fine.

  • simple_plan

    wonderful

  • Forest

    very good

  • skydiver

    Interesting, it seems that this was also incorporated in JDK 1.6 b14:

    public String(char value[], int offset, int count) {
    …..
    this.value = Arrays.copyOfRange(value, offset, offset+count);

    }

  • ryanlr

    You are welcome:)

  • Yuan.Ming

    Thanks………..

  • ryanlr

    Hi Peter,

    Thanks a lot for your comment. The memory leak problem caused by substring() in JDK 6 is solved in JDK 7, because new string will not point to the existing old character array.

    The old string (with the old char array) will be garbage collected if there are no other references pointing to it.

    Thanks.

  • Peter Jerald

    Hi,

    Recently I heard that substring() method causes memory leak issues.

    http://bugs.sun.com/view_bug.do?bug_id=4513622

    Has this been resolved in the new version of the JDK?

    In addition, another question:

    As you said in the above article, in JDK 7 the substring method creates a new object, so what will happen to the old string? Will it be retained in memory?

    Thanks and Regards,
    Peter Jerald