# LeetCode – Repeated DNA Sequences (Java)

## Problem

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].

## Java Solution

The key to solving this problem is that each of the 4 nucleotides can be stored in 2 bits, so a 10-letter sequence can be converted to a 20-bit integer. The following is a Java solution. You may want to trace the program by hand on an example to see how it works.

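```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public List<String> findRepeatedDnaSequences(String s) {
    List<String> result = new ArrayList<>();
    if (s == null || s.length() < 10) {
        return result;
    }

    // Map each nucleotide to a 2-bit code.
    HashMap<Character, Integer> dict = new HashMap<>();
    dict.put('A', 0);
    dict.put('C', 1);
    dict.put('G', 2);
    dict.put('T', 3);

    int hash = 0;
    int mask = (1 << 20) - 1; // keeps only the lowest 20 bits (10 letters)

    HashSet<Integer> added = new HashSet<>(); // sequences already in result
    HashSet<Integer> temp = new HashSet<>();  // sequences seen so far

    for (int i = 0; i < s.length(); i++) {
        // Shift in the 2-bit code of the current letter.
        hash = (hash << 2) + dict.get(s.charAt(i));

        if (i >= 9) {
            hash &= mask; // drop the letter that left the 10-letter window
            if (temp.contains(hash) && !added.contains(hash)) {
                result.add(s.substring(i - 9, i + 1));
                added.add(hash);
            }

            temp.add(hash);
        }
    }

    return result;
}
```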
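To make the 2-bit packing concrete, here is a minimal sketch that encodes a single 10-letter sequence into its 20-bit integer and decodes it back. The class `DnaEncodingSketch` and the helpers `encode` and `decode` are hypothetical names used only for illustration. They are not part of the solution above, and they assume the same mapping (A=0, C=1, G=2, T=3).

```java
class DnaEncodingSketch {
    // Hypothetical helper: pack a sequence into an int, 2 bits per letter,
    // using the same mapping as the solution above (A=0, C=1, G=2, T=3).
    static int encode(String seq) {
        int code = 0;
        for (int i = 0; i < seq.length(); i++) {
            int bits = "ACGT".indexOf(seq.charAt(i));
            if (bits < 0) {
                throw new IllegalArgumentException("not a nucleotide: " + seq.charAt(i));
            }
            code = (code << 2) | bits;
        }
        return code;
    }

    // Hypothetical helper: unpack `length` letters from the low bits of `code`.
    static String decode(int code, int length) {
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = "ACGT".charAt(code & 3); // lowest 2 bits hold the last letter
            code >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        int code = encode("AAAAACCCCC");
        System.out.println(code);             // prints 341 (binary 0101010101)
        System.out.println(decode(code, 10)); // prints AAAAACCCCC
    }
}
```

Because there are 4^10 = 2^20 possible 10-letter sequences and the packed value uses exactly 20 bits, each sequence gets its own integer, so two different sequences never share a code.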
• Yang Delia
• Darewreck

Why do you have both a temp and an added HashSet? If you’re calculating the hash code, shouldn’t you just have one set that contains all the seen hash codes to find duplicates?
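One way to see what the second set is for is to run the same sliding-window idea with and without it. The sketch below is a hypothetical variant written for this comparison, not the article's code: the class `TwoSetsSketch`, the method `find`, and the `dedupe` flag are made-up names, and it assumes the input contains only A, C, G and T.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoSetsSketch {
    // Hypothetical variant of the solution. With dedupe == false only the
    // "seen" set is consulted, which mimics a one-set implementation.
    // Assumes s contains only the letters A, C, G and T.
    static List<String> find(String s, boolean dedupe) {
        List<String> result = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();      // every window seen so far
        Set<Integer> reported = new HashSet<>();  // windows already in result
        int mask = (1 << 20) - 1;
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = ((hash << 2) | "ACGT".indexOf(s.charAt(i))) & mask;
            if (i < 9) {
                continue; // the first full 10-letter window ends at index 9
            }
            boolean repeated = !seen.add(hash);
            if (repeated && (!dedupe || reported.add(hash))) {
                result.add(s.substring(i - 9, i + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "AAAAAAAAAAAAA"; // 13 A's: the same window occurs 4 times
        System.out.println(find(s, false)); // one set: AAAAAAAAAA listed 3 times
        System.out.println(find(s, true));  // two sets: AAAAAAAAAA listed once
    }
}
```

With a single set, a sequence that occurs three or more times is added to the result on every repeat. The second set records what has already been reported, so each repeated sequence appears in the output only once.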

• Darewreck

The hashCode will sometimes give you the same value for different sequences. Example:

ACCCCTGAGG
CTGTTCGTTG

Both return hashCode: 1406448045

In Java, at least. So you can’t rely on Java’s built-in hashCode implementation unless you implement your own version. In the code, they implement their own 20-bit hash code.
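As a small, independent illustration that `String.hashCode()` is not unique, the sketch below uses the well-known colliding pair "Aa" and "BB" rather than the DNA strings quoted above. The class `HashCollisionSketch` and the helper `collide` are hypothetical names used only here.

```java
class HashCollisionSketch {
    // Hypothetical helper: true when two different strings share a hashCode.
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A'*31 + 'a' = 65*31 + 97 = 2112 and 'B'*31 + 'B' = 66*31 + 66 = 2112
        System.out.println("Aa".hashCode());     // prints 2112
        System.out.println("BB".hashCode());     // prints 2112
        System.out.println(collide("Aa", "BB")); // prints true
    }
}
```

A 32-bit hash over arbitrary strings cannot avoid such collisions, whereas the 2-bit packing in the solution is collision-free for 10-letter sequences because every sequence maps to a distinct 20-bit value.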

• Jerome Liu

You may need more memory for 10-letter strings.

• Salil Surendran

Why do you need to generate your own hash code? The String class has its own hashCode method that returns a unique hash for each unique string. So if you just take each 10-letter string and check whether it exists in the Set, and if so add it to the list, wouldn’t that work?
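For comparison, here is a minimal sketch of the approach this comment describes, storing the 10-letter substrings themselves in a set. The class `StringSetSketch` and the method `findRepeated` are hypothetical names. Note that `String.hashCode()` is not guaranteed to be unique, but a `HashSet<String>` also compares elements with `equals`, so the result is still correct. The trade-off is that the set holds a string object per window instead of one integer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StringSetSketch {
    // Hypothetical alternative: keep the substrings themselves in the sets.
    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null) {
            return result;
        }
        Set<String> seen = new HashSet<>();      // every window seen so far
        Set<String> reported = new HashSet<>();  // windows already in result
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // add() returns false when the element was already present
            if (!seen.add(window) && reported.add(window)) {
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```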