Code Learnings

Thursday, November 28, 2013

Making Java Groovy

Manning Publications recently published "Making Java Groovy" by Ken Kouson so I went and bought it. I often buy books about topics I'm interested in and read them on the train to work to find out if the described technologies can help me with my current or future projects. I really like Manning books because they are usually a good mixture between tutorial/reference and just good introduction into technologies.

Working in an almost Java-only environment have been interested in Groovy for quite some time. Most Java developers know the problem: You have all kinds of libraries or at least code snippets that already capture your business logic. But if you want to run a quick script to make use of it, you either have to create a full blown Java app with all it's draw-backs, or go for some other scripting language that can do the trick with as little code as possible; for my quick-and-dirty scripts I usually resorted to one of bash, PHP or Perl. For more complicated tasks I bit the bullet I wrote Java app, bundled into one big jar with a myriad of command line parameters to be able to switch environment, figure out best settings and what not.

Loops in Message Routing/Graphs

This topic seems to be so beat down even to me that it was somewhat surprising to actually come across a real life example where a major corporation seemed to have not paid attention to it. Even more surprising if this happens to a company which's whole business is about routing: USPS.

I recently ordered something online that was supposed to be delivered by USPS, which was tasked to just complete "the last mile". Now I made a mistake and had the package shipped to my office but accidentally had typed in the zip code of my home address (which are only about 7 miles apart from each other). Looking at the online tracking app, I could not believe what is happening:

Don' forget about IOPS on RDS

A lot of things a slightly different when you run your application on cloud services like AWS. Take database servers: Increased load can always change the performance of DB look-ups and writes. Larger tables can lead to slow queries if the tables are not indexed right, a lot of writes may cause unexpected locking, etc.

However if you use Amazon's AWS there is another important factor you should not loose sight of: IOPS. This measures the number of I/O operations between the database server and its storage, which is attached as network storage.

It's somewhat unclear if/how Amazon throttles the throughput if you don't reserve IOPS. You might be at the mercy of other RDS instances that are running on the machine and probably some kind of rate limit AWS implicitly imposes.

Graph Traversal: Breadth First Search

Breadth First Search (BFS) is a fundamental graph-traversal technique and you could use it of find it being used in a wide range of applications. For a full explanation of the technique check this link, or this link.

The parent array in BFS is very useful for finding interesting paths through a graph; the parent relation defines a TREE OF DISCOVERY with the starting node as the root of the tree.

Because vertices are discovered in order of increasing distance from the root, the tree of discovery contains minimum-length paths from the root to each node.

The time complexity of a BFS implementation is: O(M + N) , that is, it runs in linear time as a function of the sum of the cardinalities of the vertex and edge sets.
Two example application problems where BFS plays a role are:

Connected Components
Two-Coloring (Bipartite checking)

Graph Traversals: Generic Implementation

Recently I had to go back and dive deep into the fundamental elements involved in graph traversal algorithms; specifically, I wanted to get refreshingly clear in my head the essence of these and related techniques. I found that the approach taken in the book I always recommend was right on target to allow me to reach my objective. The approach Prof. Skiena proposes is captured somewhat in the implementation we describe in this and the next two posts in this series.

Base Classes

All graph traversals share a certain "philosophy", in the sense that they are trying to achieve a goal (traverse the graph), and to do so they maintain some state (e.g. discovered, time, ...) and use some appropriate data structure explicitly (queue) or implicitly (stack). The idea is then to abstract those commonalities as much as possible and provide "entry" points to the implementation by means of an API (function/method calls), which enable the tailoring of a generic implementation to the solution specific to a given problem.

Graph Traversals

One of the most fundamental graph problem is: traverse all its nodes and edges systematically. For example, mazes are naturally represented as graphs, and therefore finding a path in a maze can easily be solved by executing some graph traversal algorithm.

The key idea behind graph traversals is: MARK every vertex when we FIRST visit it, and keep track of those vertices NOT YET EXPLORED. In such an approach, each vertex in the graph will be in one of three possible states:

UNDISCOVERED
DISCOVERED
PROCESSED

and in a given traversal, the state progression of a vertex would be UNDISCOVERED, then DISCOVERED, and finally PROCESSED.

Thus, at the core of a graph traversal implementation is a data structure to keep track of the work to be done, that is, the vertices that have been discovered but have not been processed yet. During a traversal, when we reach an UNDISCOVERED vertex, we mark it as DISCOVERED and add it to the data structure as "work to be done". If we reach a vertex that is already DISCOVERED/PROCESSED, we ignore it.

If the graph being traversed is DIRECTED, then edge will be traversed once; if the graph is UNDIRECTED, then each edge will be traversed twice.

Two of the most often used graph traversal algorithms are:

Breadth First Search
Depth First Search

In a following post we will describe each of these approaches. Stay tuned.

Saturday, August 3, 2013

Graphs

Graphs rest at the core of much of computer science. Very often, graph modelling is key to solving algorithmic problems. As a Software Engineer, one needs to have a clear understanding of graphs and, most importantly, how to model real-world problems in terms of graphs. It is of paramount importance to be familiar with a wide range of graph problems; more so than understanding the nuances of how a given specific graph algorithm works. For instance, let's say that you work in a company that is generating tremendous amounts of data and you are considering the options for manipulating it and extracting from it business intelligence related to your application; one option you have is to use a Graph Database. If you go that path, the underlying implementation if the graph database you select will contain a wide range of graph algorithms, which will be used to answer the queries you make against the DB. Being aware how some of the nuances related to graph representations and algorithms will help you decide what is the best way to formulate questions and ask them to the graph DB being used.

Again, for an excellent coverage of graph algorithms check Prof. Skiena's book.

In this and some of our future posts, we will elaborate on graphs in general and some graph algorithmic approaches. The material has been collected and/or summarised from a variety of sources, such as the Internet, books, etc.

Code Learnings

Pages