Codility Genomic Range Query

Posted on October 19, 2013

Last updated on October 18, 2013

In this article I will provide a solution to the Genomic Range Query sample Codility problem. The problem description is long and copyrighted, so have a look at the link for the detailed problem description, and try to attempt it first yourself.

The optimal idea is to keep a prefix sum of the number of occurences of each letter from the set \([G,C,T,A]\) for every position in the target string. Then, to evalute the minimal nucleotide between indices \((a,b)\), we can easily compute the total number of occurences of each of the nucleotides in \(O(1)\) time, and pick the smallest one. This leads to a total running time of \(O(N+M)\). The following Java code scores 100/100 on Codility:

public static int[] genome(String S, int[] P, int[] Q) {
   int len = S.length();
   int[][] arr = new int[len][4];
   int[] result = new int[P.length];
   
   for(int i = 0; i < len; i++){
     char c = S.charAt(i);
     if(c == 'A') arr[i][0] = 1;
     if(c == 'C') arr[i][1] = 1;
     if(c == 'G') arr[i][2] = 1;
     if(c == 'T') arr[i][3] = 1;
   }
   // compute prefixes
   for(int i = 1; i < len; i++){
     for(int j = 0; j < 4; j++){
       arr[i][j] += arr[i-1][j];
     }
   }	
   
   for(int i = 0; i < P.length; i++){
     int x = P[i];
     int y = Q[i];
     
     for(int a = 0; a < 4; a++){
       int sub = 0;
       if(x-1 >= 0) sub = arr[x-1][a];
       if(arr[y][a] - sub > 0){
         result[i] = a+1;
         break;
       }
     }
     
   }
   return result;
 }
Markdown SHA1: 2981a205ffcaffd4485adda81f39ccb56cbe77b6