Tuesday, December 09, 2014

Catastrophic Regular Expression

Today I was reading about catastrophic regular expression in java. Succinctly put, most programming language that includes regular expression support with greedy matching fail miserably against the following example:

Matcher: (a+a+) b
Sample string: aaaaa

The primary reason for this is matching logic tries different permutation as part of its backtracking when attempting a match. So for the aforementioned example, something to this effect takes place:

aaaaa, X => Fail
aaaa, a  => Fail backtrack
aaaa, aa => Fail backtrack
aaaa, a, aa => Fail
aaaa, aaa => Fail
aaaa, aa, a => Fail
aaaa, a, aa => Fail 

And so on.

You can see the various intermediate steps as part of this regex debugger.

One way to overcome this is by using possessive quantifier that does not do any backtracking. To explain further, Java supports three different types of quantifiers, viz., Greedy, Reluctant and Possessive quantifiers.

Greedy quantifier [syntax *]

Greedy quantifier forces the matcher to read in the entire input string before attempting the first match. In case if the first match attempts fails the matcher backs off by one character as seen in the example above and repeats the whole process.

Matcher: .*aaa
Sample String: aaaaaa

By definition greedy quantifier matches the entire string.


Reluctant Quantifier [syntax *?]

Reluctant quantifier takes an orthogonal strategy. It tries to match every character while searching for a match.

Matcher: .*?aaa
Sample String: aaaaaa

By definition reluctant quantifier would produce two matches, viz., aaa in {1, 3} and {4, 6}

Possessive Quantifier [syntax *+]

Possessive quantifier also takes in the entire input string but it tries only once for a match. Unlike greedy quantifiers, possessive quantifiers doesn't have a backtracking strategy.

Matcher: .*+aaa
Sample String: aaaaaa

No match is produced.

Tuesday, January 21, 2014

Python Easter Egg

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

Monday, October 28, 2013

Visitor Pattern

Visitor Pattern

Visitor pattern is used for adopting towards open-closed principle. Java supports single dispatch, i.e., overloads methods correctly when concrete implementation of one parameter is involved. But visitor pattern primarily relies on multiple dispatch (e.g., double dispatch if two parameters are involved) that dispatches a function call to different concrete functions depending on the runtime types of objects involved in the call. Double dispatch is ability to overload methods depending on the concrete types of the parameter. For example, if you want to achieve something as follows [1]. Please note here that the cartesian product of figures and printers need to established:

class Client {

 /** Prints all figures on each of the printers. */
 void printAllEverywhere( Figure[] figures, Printer[] printers ) {
 for ( int i = 0; i < figures.length; i++ ) {
 Figure figure = figures[ i ];
 for ( int j = 0; j < printers.length; j++ ) {
  Printer printer = printers[ j ];

  figure.printOn( printer ); 
  // must work for any printer or figure !
 }
 }
 }
 }
 
 interface Figure {
 void printOn( Printer printer );
 }
 interface Printer {
 void printCircle( Circle circle );
 void printRectangle( Rectangle rectangle );
 }
 
 
 class InkjetPrinter implements Printer {
 public void printCircle( Circle circle ) {
 // ... rasterizing logic for inkjet printing of circles here ...
 System.out.println( "Inkjet printer prints a cirlce." );
 }
 public void printRectangle( Rectangle rectangle ) {
 // ... rasterizing logic for inkjet printing of rectangles here ...
 System.out.println( "Inkjet printer prints a rectangle." );
 }
 }
 class PostscriptPrinter implements Printer {
 public void printCircle( Circle circle ) {
 // ... postscript preprocessing logic for circles here ...
 System.out.println( "PostScript printer prints a cirlce." );
 }
 public void printRectangle( Rectangle rectangle ) {
 // ... postscript preprocessing logic for rectangles here ...
 System.out.println( "PostScript printer prints a rectangle." );
 }
 }
 
 
  
 class Circle implements Figure {
 public void printOn( Printer printer ) {
 printer.printCircle( this ); // <-- the "trick" !
 }
 }
 class Rectangle implements Figure {
 public void printOn( Printer printer ) {
 printer.printRectangle( this );
 }
 }

In languages that support such as Nice, multimethods[1], this is much easy as it's possible to overload based on the type of the arguments.


 abstract class Figure {}
  abstract class Printer {}

  class Circle extends Figure {}
  class Rectangle extends Figure {}

  class InkjetPrinter extends Printer {}
  class PostscriptPrinter extends Printer {}

  void print(Printer printer, Figure shape);

  print(InkjetPrinter printer, Circle shape) {
 println("Inkjet printer prints a circle.");
  }

  print(PostscriptPrinter printer, Circle shape) {
 println("Postscript printer prints a circle.");
  }

  print(InkjetPrinter printer, Rectangle shape) {
 println("Inkjet printer prints a rectangle.");
  }

  print(PostscriptPrinter printer, Rectangle shape) {
 println("Postscript printer prints a rectangle.");
  }

  // Client code...
  void printAllEverywhere
 (Collection<Printer> printers,
 Collection<Figure> shapes)
  {
 printers.foreach(Printer p => shapes.foreach(Figure s => print(p,s)));
  }

  void main(String[] args)
  {
 printAllEverywhere
 ([ new PostscriptPrinter(), new InkjetPrinter() ],
 [ new Circle(), new Rectangle() ]);
  }


Open-closed principle dictates that behavior can be modified without altering source code. For example, change in source code implementation might have far reaching unintended changes such as changing test code, unit tests, code review, etc. Once an interface is finalized, the existing one is closed for modification, but a new one can be made to extend at least minimalistic behavior of the one that is closed.

This is hard to explain in a non-pictorial manner. But the essential players are:

1. Element is a class that has accept(Visitor v) method.
2. Visitor is a class that has visit(Element e) method.

Use visitor pattern for the following:

a. when many unrelated operations need to be performed on an object structure.
b. when the classes defining the object structure doesn't change often. You want to add/remove operations that are performed on it.
c. Object contains many sub-objects with varying interface.


Reference:

[1] http://c2.com/cgi/wiki?DoubleDispatchExample

Labels: , ,

Wednesday, June 26, 2013

import copy

# Print all possible encodings of a positive integer numeric value.
# The encodings are defined as follows: 11 -> aa or ord(96 + 11) -> k.

def print_encodings(permuted_list):
    string_value = ""
    for element in permuted_list:
        string_value += chr(96 + int(element))
    print string_value

def encoding_permutation(input, idx, permuted_list):
    if (len(input) == idx):
        print_encodings(permuted_list)
        return
    if len(input) - idx >= 2:
        one_gram_list = copy.deepcopy(permuted_list)
        one_gram_list.append(input[idx])
        bi_gram_list =  copy.deepcopy(permuted_list)
        bi_gram_list.append(input[idx] + input[idx + 1])
        encoding_permutation(input, idx + 1, one_gram_list)
        encoding_permutation(input, idx + 2, bi_gram_list)
    else:
        permuted_list.append(input[idx])
        encoding_permutation(input, idx + 1, permuted_list)

encoding_permutation("1111", 0, [])

Labels: ,

Thursday, May 23, 2013

I stumbled upon the Dutch National Flag problem posed by Djikstra today. It's a very interesting problem. The elegant solution in the wikipedia implements over one iteration of quicksort operating with two pivots. It's an O(n) solution.

Monday, April 11, 2011

Sudoku Solver - Pass 1:

package personal.puzzles;

import java.util.HashSet;
import java.util.Set;

public class SudokuSolver {
Grid grid = new Grid();
public SudokuSolver(int[][] state) {
for (int i = 0; i < 9; ++i) {
for (int j = 0; j < 9; ++j) {
grid.setValue(new Coordinate(i, j, state[i][j]));
}
}
}

public boolean isInConsistentState() {
return grid.checkState();
}

public int[][] solveSudoku() {
getConsistentAssignment(grid);
return grid.getSudokuBoard();
}

private void getConsistentAssignment(Grid grid) {
while (grid.isNextUnassigned()) {
Coordinate next = grid.getNextUnassigned();
for (int i = 1; i <= 9; ++i) {
next.setValue(i);
grid.setValue(next);
if (grid.checkState()) {
getConsistentAssignment(grid);
return;
} else {
next.setValue(-1);
grid.setValue(next);
}
}
}
return;
}

public static class Grid {
int[][] x = new int[9][9];
public void setValue(Coordinate coordinate) {
x[coordinate.getX()][coordinate.getY()] = coordinate.getValue();
}
public int getValue(int i, int j) {
return x[i][j];
}
public boolean isNextUnassigned() {
for (int i = 0; i < 9; ++i)
for(int j = 0; j < 9; ++j)
if (x[i][j] == -1) {
return true;
}
return false;
}
public void print() {
for (int i = 0; i < 9; ++i) {
for (int j = 0; j < 9; ++j) {
System.out.print(x[i][j] + " ");
}
System.out.println();
}
}
public Coordinate getNextUnassigned() {
for (int i = 0; i < 9; ++i)
for(int j = 0; j < 9; ++j)
if (x[i][j] == -1) {
return new Coordinate(i, j, -1);
}
return null;
}
public boolean checkState() {
return checkColumns() && checkRows() && checkSubgrids();
}
private boolean checkColumns() {
for (int i = 0; i < 9; ++i) {
Set<Integer> columnValues = new HashSet<Integer>();
for (int j = 0; j < 9; ++j) {
if (columnValues.contains(x[i][j]))
return false;
if (x[i][j] != -1) columnValues.add(x[i][j]);
}
}
return true;
}

private boolean checkRows() {
for (int i = 0; i < 9; ++i) {
Set<Integer> rowValues = new HashSet<Integer>();
for (int j = 0; j < 9; ++j) {
if (rowValues.contains(x[j][i]))
return false;
if (x[j][i] != -1) rowValues.add(x[j][i]);
}
}
return true;
}

private boolean checkSubgrids() {
for (int i = 0; i < 9; ++i)
for (int j = 0; j < 9; ++j)
for (int k = 0; k < 9; ++k)
for (int l = 0; l < 9; ++l) {
if (Math.abs(i - k) == 2 && Math.abs(j - l) == 2 && i % 3 == 0 && j % 3 == 0) {

if (!checkForBlock(i, j, k, l))
return false;
}
}

return true;
}
private boolean checkForBlock(int x1, int y1, int x2, int y2) {
Set<Integer> blockValues = new HashSet<Integer>();;
for (int i = x1; i <= x2; ++i) {
for (int j = y1; j <= y2; ++j) {
if (blockValues.contains(x[i][j])) {
return false;
}
if (x[i][j] != -1) blockValues.add(x[i][j]);
}
}
return true;
}
public int[][] getSudokuBoard() {
return x;
}
}

public static class Coordinate {
int x, y, value;
Coordinate(int x, int y, int value) {
this.x = x;
this.y = y;
this.value = value;
}
public void setValue(int value) {
this.value = value;
}
public int getX() {
return x;
}
public int getY() {
return y;
}
public int getValue() {
return value;
}
}

public static void main(String[] args) {
int x[][] = {
{ -1, 3, 4, 6, 7, 8, 9, 1, 2},
{6, 7, 2, 1, 9, -1, 3, 4, 8},
{1, 9, 8, 3, 4, 2, 5, -1, 7},
{8, -1, 9, 7, 6, 1, 4, 2, 3},
{4, 2, 6, -1, 5, 3, -1, 9, 1},
{7, 1, 3, 9, 2, 4, 8, 5, 6},
{9, -1, 1, 5, 3, -1, 2, -1, 4},
{2, 8, 7, 4, 1, 9, 6, 3, 5},
{3, -1, 5, 2, -1, 6, 1, 7, 9},
};
SudokuSolver solver = new SudokuSolver(x);
int y[][] = solver.solveSudoku();
for (int i = 0; i < 9; ++i) {
for (int j = 0; j < 9; ++j) {
System.out.print(y[i][j] + " ");
}
System.out.println();
}
}
}

Saturday, April 09, 2011

Sort an integer array in such a manner that upon concatenating the elements it the result yields maximum value

package personal.puzzles;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
* Max value array sorter
*
* @author madhuc
*
*/
public class MaxValueArraySorter {

public static int getMaxValueFromArray(int[] input) {

List<String> inputAsString = new ArrayList<String>(input.length);
for (int i = 0; i < input.length; ++i) {
inputAsString.add(String.valueOf(input[i]));
}

Collections.sort(inputAsString,
Collections.reverseOrder(new MyComparator()));

StringBuilder outputBuilder = new StringBuilder();
for (int i = 0; i < inputAsString.size(); ++i) {
outputBuilder.append(inputAsString.get(i));
}
return Integer.valueOf(outputBuilder.toString());
}

public static class MyComparator implements Comparator<String> {
public int compare(String paramT1, String paramT2) {
if (paramT1.length() <= paramT2.length()) {
return findLargest(paramT1, paramT2);
} else {
return -1 * findLargest(paramT2, paramT1);
}
}

private int findLargest(String t1, String t2) {
for (int i = 0; i < t1.length(); ++i) {
if (t1.charAt(i) < t2.charAt(i)) {
return -1;
} else if (t1.charAt(i) > t2.charAt(i)) {
return 1;
}
}
return 1;
}
}

public static void main(String[] args) {
int[] test = {989, 91};
System.out.println(MaxValueArraySorter.getMaxValueFromArray(test));
}
}