Writing a Data Interpreter

A data interpreter transforms data as it is inserted into the database. It is typically used when incoming data arrives in a format different from the one in which it should be stored.

The interpreter restructures the incoming data to fit the business needs of the application. It performs an in-database, in-transaction transformation of the data the database receives.

A data interpreter is set on a collection, so every insert attempted on that collection first passes through the interpreter. The interpreter is expected to transform the data and return the result. The collection stores this transformed output rather than the original input.

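import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Imports for @DataInterpreter, DataInterpretable and JSONObject depend on the
// database SDK and JSON library in use and are omitted here.

@DataInterpreter(name = "LogInterpreter")
public class LogInterpreter implements DataInterpretable {

    // One access-log entry: ip, client, user, [time], "method request protocol", response code, size
    private static final String LOG_ENTRY_PATTERN = "^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+)";
    private static final Pattern PATTERN = Pattern.compile(LOG_ENTRY_PATTERN);

    // Receives the raw log line being inserted and returns the JSON object to store.
    public JSONObject interpret(String s) {
        Matcher matcher = PATTERN.matcher(s);

        if (matcher.find()) {
            JSONObject jsonObject = new JSONObject();
            jsonObject.put("ip", matcher.group(1));
            jsonObject.put("client", matcher.group(2));
            jsonObject.put("user", matcher.group(3));
            jsonObject.put("time", matcher.group(4));
            jsonObject.put("method", matcher.group(5));
            jsonObject.put("request", matcher.group(6));
            jsonObject.put("protocol", matcher.group(7));
            jsonObject.put("resp_code", matcher.group(8));
            // A size of "-" means no size was logged, so the field is left out.
            if (!matcher.group(9).equals("-")) {
                jsonObject.put("size", Integer.parseInt(matcher.group(9)));
            }
            return jsonObject;
        }

        // The line does not match the expected log format.
        return null;
    }
}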

The example above shows a simple log interpreter, which is invoked automatically whenever a log entry is inserted. It extracts the key elements of each entry, such as ip, client and user, and returns them as a more usable JSON object. If an entry does not match the pattern, the method returns null.

The collection therefore stores the JSON object, not the raw log string.
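
To make the transformation concrete, the sketch below applies the same pattern as LogInterpreter to a single entry. It is only an illustration: the class name LogLineSketch, the parse method and the sample log line are made up for this example, and a plain Map stands in for JSONObject so that the code runs with the standard library alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the database API.
class LogLineSketch {

    // Same pattern as in LogInterpreter.
    private static final Pattern PATTERN = Pattern.compile(
            "^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(\\S+) (\\S+) (\\S+)\" (\\d{3}) (\\S+)");

    private static final String[] FIELD_NAMES = {
            "ip", "client", "user", "time", "method", "request", "protocol", "resp_code"};

    // Mirrors LogInterpreter.interpret, with a Map standing in for JSONObject.
    static Map<String, Object> parse(String line) {
        Matcher matcher = PATTERN.matcher(line);
        if (!matcher.find()) {
            return null; // not a recognisable log entry
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < FIELD_NAMES.length; i++) {
            fields.put(FIELD_NAMES[i], matcher.group(i + 1)); // groups are 1-based
        }
        if (!matcher.group(9).equals("-")) {
            fields.put("size", Integer.parseInt(matcher.group(9)));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample entry in the format the pattern expects.
        String line = "127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "
                + "\"GET /apache_pb.gif HTTP/1.0\" 200 2326";
        System.out.println(parse(line));
    }
}
```

For the sample entry, the result holds ip 127.0.0.1, user frank and method GET. The resp_code stays a string because the group is stored as matched, while size is parsed into an integer.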

An interpreter can also be configured together with a watch service that picks up data from a folder. The configuration allows a file to be processed either as a whole or line by line. The watch service performs an insert on the collection for the data it picks up, and the interpreter is invoked automatically as part of that insert.
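
The difference between the two file-processing modes can be sketched as follows. This is not the watch service itself or its configuration syntax, neither of which is shown here. FileFeedSketch, its Mode enum and the feed method are hypothetical names, and skipping entries for which the interpreter returns null is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of the two modes; not the database's watch service.
class FileFeedSketch {

    // Hypothetical mode flag standing in for the real configuration option.
    enum Mode { WHOLE_FILE, LINE_BY_LINE }

    // Passes the file content through the interpreter and collects the
    // records that would be inserted into the collection.
    static <T> List<T> feed(Path file, Mode mode, Function<String, T> interpreter)
            throws IOException {
        List<T> records = new ArrayList<>();
        if (mode == Mode.WHOLE_FILE) {
            // The entire file content is interpreted as a single entry.
            T record = interpreter.apply(Files.readString(file));
            if (record != null) {
                records.add(record);
            }
        } else {
            // Each line is interpreted as its own entry.
            for (String line : Files.readAllLines(file)) {
                T record = interpreter.apply(line);
                if (record != null) { // assumption: unparseable lines are skipped
                    records.add(record);
                }
            }
        }
        return records;
    }
}
```

Line-by-line processing suits log files such as the one handled by LogInterpreter, where each line is an independent entry. Whole-file processing suits files that form a single document.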