Sunday, March 11, 2012

Calculate log2() Using sqrt()

Problem: Calculate log2()  by only using sqrt().

Solution: The key here is you need to know log2(1+x) = x/ln2 = x/0.69 when x is very close to 1. Therefore the algorithm first use sqrt() to reduce x to be very close to 1 then we can use the formular.
double log2(double x)
{
   assert(x>0);

   double res = 1.0;
   while(x-1 > threshold)
   {
      x = sqrt(x);
      res = res *2;

   }

   return res * x/0.69;

}

Find the Minimum Set of Numbers That Sum to at least K.

Problem: Given an array of integers, find a smallest subset (with minimum numbers of integers) that sum to at least K. For example, a[] = {2,-5,7,8,12, 4} and K = 13, then one of the minimum set can be {8, 12}.

Solution: One solution is first sort the array, then begin from the max integer, calculate the cumulative sum until the sum is no less than K. This solution (depending on what kind of sort algorithms you use) usually cost O(nlogn). Here we introduce a selection based algorithm which on average costs O(n) but costs O(n^2) in the worst case. The approach is similar to kth order-statistics.

  • Randomly choose a pivot from the current array and partition the array into two parts p1 and p2p1 contains integers which are smaller than the pivot; p2 contains integers which are larger than or equal to the pivot.
  • Check if pivot + sum(p2) == K, if so, we already find the answer; if not, then if pivot + sum(p2) > K but sum(p2) <= K, we also find the answer; if  sum(p2) > K, we need to do further partition, but we just need to further partition p2;  if pivot + sum(p2) < K,  we need to do further partition, but we need to partition p1 instead and the target value K is updated as K - pivot - sum(p2).
The code is as follow:
void find_min_set(int *a, int start, int end, int target)
{

    /* generate secret number: */
    int s, e, k;

    s = start;
    e = end;
    k = target;

    int sum;

    while(1)
    {
       srand ( time(NULL) );
       int idx = ((int)rand()) % (e-s + 1) + s;
       
       /* partition and sum */
       int tmp = a[idx];
       a[idx] = a[e];
       a[e] = tmp;

       sum = a[e];
       int pos = s;
       for(int i = s; i < e; i++)
       {
          if(a[i] < a[e])
          {
             if(pos != i)
             {
                tmp = a[pos];
                a[pos] = a[i];
                a[i] = tmp;
             }
            pos++;
          }
          else sum+= a[i];

       }

      tmp = a[pos];
      a[pos] = a[e];
      a[e] = tmp;

      if(sum == k)
      {
        cout << "found " << endl;
        return;
      }
      else if(sum > k)
      {
         if(sum - a[pos] < k)
         {
            cout << "found " << endl;
            return;
         }
         s = pos + 1;
         sum = 0;

      }
      else
      {
        k = k - sum;
        sum = 0;
        e = pos - 1;
      }
    }

}

                   

Wednesday, March 7, 2012

Calculate Number of Different Ways to Climb a Stairway with Constant Space and Linear Time

Problem: There are three ways to climb a stair, you can climb 1 step, 2 steps or 3 steps each time. Given a n step stairway, how many of ways can you climb up to the top?

Solution: As known, this is a simple DP problem.  If we can maintain an array dp[] with dp[i] representing the different ways to climb up i steps, we can get the final result in O(N). However, we need O(N) space as well. But actually, since dp[i] = dp[i-1] + dp[i-2] + dp[i-3], we don't need O(N) space, instead, we just need a cache with size 3. Then we can have a space complexity O(1) algorithm:

int climb_stair(int n)
{
  int cache[3];
  cache[0] = 1; /*ways to climb 0 step */
  cache[1] = 1; /*ways to climb 1 step */
  cach2[2] = 2; /*ways to climb 2 steps */

  if(n<=2) return cache[n];

  for(int i=3; i<=n; i++)
  {
     cache[i%3] = cache[0]+cache[1]+cache[2];
  }

  return cache[i%3];

}

Similarly, we can have a O(1) space algorithm for Fibonacci numbers.

User Bin Crash problem

Problem: This is a problem from facebook buzz. The problem is that a plane is gong to crash,  you need to throw out W lbs of freight to survive the crash. Each type of freight fi is associated with a weight wi and price pi. If we throw one unit of fi out, we lost pi but the plane becomes lighter by wi. Our mission is to throw out minmum freights measured by their aggregate price and avoid the crash, which means the total weight of those freights should be larger than W.

Solution: The problem is very similar to the 0-1 knapsack. We can also use similar strategy to solve it. The detail is as follow:

  • Basically, we need first sort different types of freight by their weight. 
  • Then we will need to fill an array dp[n][W+1] where n is the number of different types of freights. dp[i][j] represents the minmum cost to throw out at least j lbs freight and we can throw up to ith type of freights (ranked by their unit weight and from light to heavy). 
  • To fill dp[i][j], we need first select the most cost/efficient type of freight among the i types of freights. Cost/efficient is measured by the unit weight of the freight divided by its unit price. Say the most cost/efficient type of freight among the i types of freights is kth type of freights, we have if j <= kth type's unit weight, dp[i][j] = min(dp[i-1][j], kth type's unit price); Otherwise,  dp[i][j] = min(dp[i-1][j], dp[i][j-kth type's unit weight] + kth type's unit price). We don't need to use greedy method, just do regular dp. dp[i][j] = min(dp[i-1][j], dp[i][j-ith type's unit weight] + ith type's unit price).
  • The time complexity of this approach is O(nW).

About Disjoint-Set Forests

Disjoint-Set Forests is very useful to solve the problems such as connected component in a graph. For example,  we have several set {a, b, c}, {d, e}, {f}. The elements in set is vertices that connected to each other. If we have an edge e(b,f), then we can merge the set {a,b,c} and {f} since now they are connected.

The common operations in disjoint-set are Make-Set, Union and Find-Set. Here Union is the operation that merge two sets and Find-set is that given a element, find out which set this element is in. To model the disjoint-set problem. We can use a forest such that each tree is a set. The representative of a set is the root of that tree. And each element in the tree has a parent pointer. If we want to do Union, we can just put one tree as the sub tree of another. If we want to do Find-Set, we just start from that element and follow its parent pointer until we reach an element whose parent is itself.

However, to make the operations efficiently, two optimization can be made:

  1. Union by rank: In order to keep Find-Set efficient, we need to keep the depth of the tree small. In order to do that, for each tree, we can maintain a rank which stands for the largest depth of the tree. When we Union two trees, we make the tree with smaller rank as the subtree of the tree with bigger rank. Basically, we made the the parent pointer of the root of the smaller ranked tree to point to the root of the bigger ranked tree.
  2. Path compression: Still we want to further improve Find-Set, the idea is after we do a Find-Set, actually we had a path from root to the query element. Then we can modify all the element in this path to point their parent pointer to the root, which will shorten the path for future query.

Tuesday, March 6, 2012

Convert an Array into a Sorted Array with Minimum Cost

Problem: Given an array of positive integers, and you only have two operations : a) decrease an integer by k  with a cost of k; b) delete an integer with a cost of the value of that integer. Try to use these two operations to convert the array into a sorted array with minmum cost. For example, a[] = {5,4,7,3}, the optimal way is to decrease "5" by 1 and delete "3" with the total cost of 4 (1+3). The resulting sorted array is {4, 4, 7}.

Solution: Here I give a O(N^2) solution by using DP. The key data structure we need is dp[i] which records the minimum cost to covert a subarray a[0, i] into sorted array and the last element of the sorted array should be a[i].  For the example array a[] = {5,4,7,3}, dp[0] = 0, dp[1] = 1 (since we need to decrease "5" by 1), dp[2] = 1, dp[3] = 7. After we had filled up dp[], we need one more data structure: aggr[]. aggr[i] = a[0] + a[1] + ... + a[i]. Then the minmum cost to convert the whole array cost_min = min{dp[i] + aggr[N-1] - aggr[i]}. Here  "aggr[N-1] - aggr[i]" means the cost that we  delete all the elements with a subscript larger than i.

Then the key is to calculate dp[i]. There are several observations here:
  1. if a[i] >=  a[j] (j<i),  we can construct a sorted array ended at a[i] by deleting a[j+1], a[j+2], ... , a[i-1] and then append a[i]. The cost of the resulting sorted array is dp[j] + the cost of deleting a[j+1], a[j+2], ... , a[i-1]. 
  2. if a[i] < a[j], to have a[i] as the last element, we need to decrease a[j] by a[j] - a[i]. Moreover,  we may not stop at a[j] and we need to keep look backward until we found a a[m] such that a[m] <= a[i], then the cost of the resulting sorted array is dp[m+1] + ∑(a[k] - a[i]) where m<k<j
  3. To compute dp[i], we need to look at all the j such that  j<i, depending on a[j] and a[i], we need to calculate the cost based on step 1 or 2. Then the minmum cost we encounter is the value of dp[i].
The code is as follow:
int min_convert_cost(int a[], int n)
{
    int dp[n];
    int aggr[n];

    aggr[0] = a[0];
    for(int i=1; i<n; i++)
        aggr[i] = aggr[i-1] + a[i];

    dp[0] = 0;
    for(int i=1; i<n; i++)
    {
       dp[i] = INT_MAX;
       for(int j=i-1; j>=0; j--)
       {
          int cost_i = 0;

          if(a[i] >= a[j])
          {
             cost_i = dp[j] + aggr[i-1] - aggr[j];
          }
          else
          {
              cost_i += aggr[i-1] - aggr[j];
              while(a[j]>a[i] && j >=0)
              {
                cost_i += a[j]-a[i];
                j--;
              }
              
              cost_i += dp[j+1];

          }
          if(cost_i < dp[i])
               dp[i] = cost_i;

       }
    }

   int min = INT_MAX;
   for(int i=0; i<n; i++)
   {
      int cost_i = dp[i] + aggr[n-1] - aggr[i];
      if(cost_i<min) min = cost_i;
   }

   return min;
}



Monday, March 5, 2012

String Reduction

Problem:  A string that only contains three types of letters :"a", "b", "c". Our goal is to reduce the string. If two different letters are adjacent to each other, we can reduce the two letters into another letter. The rule is like this: if we encounter "ab" (or "ba"), we reduce to "c";   if we encounter "bc" (or "cb"), we reduce to "a";  if we encounter "ca" (or "ac"), we reduce to "b". If two adjacent letters are the same, we can't reduce them. We need to find an optimal way to reduce the string as much as possible.

For example, for string "acbb", if we try to reduce "ac"first, we get "bbb". But, if we reduce "cb" first, we get "aab" and we can further make "aab" => "ac" => "b". The latter approach gives us an optimal reduction.

Solution: Here what really matters is the numbers of letter "a", "b" and "c". Let us name them as num_a, num_b and num_c. If two of them are zero, we can't reduce that string. Otherwise, we can reduce the string into a string that has a length of 1 or 2. If num_anum_b and num_c are all even or odd,  we can reduce to a string with length 2; If not, we can reduce to a string with length 1.

Then how to do the reduction? The detail is as follow:

  1. if the string has a length of 3 and contains one "a", one "b" and one "c", whatever order of the three letter are in, we can only reduce the string into a string with length 2; if the string has a length of 2 and contains two different letters, we can only reduce the string into a string with length 1. Let us regard these two cases as base cases.
  2. for a general case, we have num_a "a" num_b  "b" and num_c  "c". After every reduction, the sum of num_anum_b and num_c will decrease by 1, since we substitue two letters with one letter.  
  3. Then at each round, which two adjacent letters we choose to reduce? We try to find an adjacent pair that contains a letter which has the highest count. For example, if now, we have 3 "a", 4 "b" and 6 "c" in the string, we choose an adjacent pair that contains "c"since num_c = max(num_anum_bnum_c) . Can we find such pair? definitely we can. If there are multiple such pairs, choose a random one. Then after we reduce this pair, num_c--, max(num_anum_bnum_c) may decrease by 1, remain unchanged, or increase by 1. However, since max(num_anum_bnum_c) is up-bounded by num_a + num_b num_c. num_a + num_b num_c is decreasing after every round, then max(num_anum_bnum_c) will also decrease if we look at a long wrong. Therefore, by using our strategy,  max(num_anum_bnum_c) will eventually decrease to 1, which means we are encounter the base cases in step 1.
  4. Then when the string will be reduced to a length of 1 and when to a length of 2? We observe that is num_anum_b and num_c are all even, then after one transformation, they will become all odd; similarly, if there are originally all odd, after one transformation, they will become all even. Then according to the analysis in step 3), we know at the end, the max(num_anum_bnum_c) will eventually decrease to 1. But, they should still be all odd at that time (since "1" is odd). Therefore, at the very end, we will have num_a = 1, num_b = 1 and num_c =1, which will  eventually lead to a string of length 2. It is easy to prove that if num_anum_b and num_c are  not all even or odd, the string will be reduced to length 1.
  5. if num_anum_b and num_c are  not all even or odd, there must be only two cases: a) two of the counters are odd and one is even b) two of the counters are even and one is odd.  For example, if num_b is odd, num_a and num_c are both even. The string actually will eventually be reduced to  "b".
  6. if num_anum_b and num_c are all even or odd, there might be multiple different final forms.