LeetCode – Repeated DNA Sequences (Java)
Problem
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example, given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT", return: ["AAAAACCCCC", "CCCCCAAAAA"].
Java Solution
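The key to solving this problem is that each of the 4 nucleotides can be stored in 2 bits, so a 10-letter sequence can be converted to a 20-bit integer. Two 10-letter sequences are equal exactly when their integers are equal, so repeats can be detected with a set of integers instead of a set of strings. You may trace the program by hand on a small example to see how it works. The following is a Java solution.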
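import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public List<String> findRepeatedDnaSequences(String s) {
    List<String> result = new ArrayList<>();
    if (s == null || s.length() < 10) {
        return result;
    }

    // Map each nucleotide to a 2-bit code.
    HashMap<Character, Integer> dict = new HashMap<>();
    dict.put('A', 0);
    dict.put('C', 1);
    dict.put('G', 2);
    dict.put('T', 3);

    int hash = 0;
    int mask = (1 << 20) - 1; // keeps the lowest 20 bits (10 letters)

    HashSet<Integer> added = new HashSet<>(); // hashes already reported
    HashSet<Integer> temp = new HashSet<>();  // hashes seen so far

    for (int i = 0; i < s.length(); i++) {
        // Shift in the 2 bits of the next letter.
        hash = (hash << 2) + dict.get(s.charAt(i));

        if (i >= 9) {
            // Drop the letter that fell out of the 10-letter window.
            hash &= mask;

            if (temp.contains(hash) && !added.contains(hash)) {
                result.add(s.substring(i - 9, i + 1));
                added.add(hash);
            }
            temp.add(hash);
        }
    }

    return result;
}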
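To make the 2-bit encoding concrete, here is a minimal sketch that packs a sequence into an integer, slides the window by one letter, and unpacks the integer again. The class DnaEncodingDemo and its helpers code, encode, roll and decode are hypothetical names introduced only for this illustration; they are not part of the solution above, and the sketch assumes sequences of at most 15 letters so that the value fits in a non-negative int.

```java
public class DnaEncodingDemo {
    // Keeps the lowest 20 bits, i.e. a window of 10 letters.
    static final int MASK = (1 << 20) - 1;

    // Hypothetical helper: 2-bit code of a nucleotide (A=0, C=1, G=2, T=3).
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("Not a nucleotide: " + c);
        }
    }

    // Packs a sequence into an int, 2 bits per letter, first letter in the highest bits.
    // Assumes seq.length() <= 15 so the result stays non-negative.
    static int encode(String seq) {
        int hash = 0;
        for (int i = 0; i < seq.length(); i++) {
            hash = (hash << 2) | code(seq.charAt(i));
        }
        return hash;
    }

    // Slides a 10-letter window by one letter: shift in the new letter, mask off the oldest.
    static int roll(int hash, char next) {
        return ((hash << 2) | code(next)) & MASK;
    }

    // Unpacks an int back into a sequence of the given length.
    static String decode(int hash, int length) {
        char[] letters = {'A', 'C', 'G', 'T'};
        char[] out = new char[length];
        for (int i = length - 1; i >= 0; i--) {
            out[i] = letters[hash & 3]; // lowest 2 bits are the last letter
            hash >>>= 2;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        int h = encode("AAAAACCCCC");
        System.out.println(h);                           // prints 341
        System.out.println(decode(h, 10));               // prints AAAAACCCCC
        System.out.println(decode(roll(h, 'A'), 10));    // prints AAAACCCCCA
    }
}
```

Because each window is updated from the previous one with a shift, an OR and a mask, the hash of every 10-letter substring is obtained in constant time per character, which is what the loop in the solution relies on.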