Functional Design Patterns


The term Functional Design showed up only with the rise of Object Oriented Design. Until then, it was simply the obvious way of designing an application. It had evolved from the ground up - from a few hundred lines of code to millions. Along the way, people formulated and shared the design patterns they found useful. These patterns were evaluated on speed of execution much more than on any other parameter.

It is important to understand that functional design is in no way inferior to object oriented design. It is just another way of looking at a problem. Some problems are simpler with functional design and some with object oriented design. It is very important to identify which is which and choose accordingly.

It is also true that these designs are not limited to particular languages. Yes, some languages provide better features for object oriented code and others for functional code. But that does not impose a particular design paradigm on us.

Principally, functional programming separates functionality from data. Hence, functional design patterns are classified into Data Structures and Algorithms.

Algorithms


An algorithm is a detailed, formal set of steps to obtain a result - one that can be implemented in a programming language. It has well defined inputs and outputs, and is expected to complete in finite time.

Any algorithm implies some processing cost - in terms of space and time. This complexity is measured as a function of the amount of data the algorithm processes, denoted n. The space and time costs are called S(n) and T(n) respectively. The overall complexity is called C(n).

This helps us identify the cost of an algorithm for data of size n. The complexity of some algorithms grows linearly with n. Some others grow with n² - denoted as C(n²), and so on. There could be some algorithms that grow with eⁿ - denoted by C(eⁿ).

When we compare two algorithms, we need to look at these aspects. We check our constraints on space, time and overall complexity, and then evaluate the algorithms based on their S(n), T(n) and C(n). The resource requirements are often not simple - we have best case, worst case and average case requirements.

Each algorithm is unique. But algorithms are often classified based on their approach to the problem. There is no right or wrong, better or worse approach here - these are just terms that make it easier to express and communicate the steps of an algorithm. Typically, an algorithm follows different approaches at different steps.

  • Greedy Algorithm: At each step, this picks the option that looks best right now, hoping that the locally optimal choices add up to a globally good solution.
  • Lazy Algorithm: This approach postpones work until it is really needed. It tends to reduce resource utilization by avoiding effort on things that are never required at all (see the sketch after this list).
  • Divide and Conquer: As the name suggests, this algorithm works by dividing the problem over and over, till we have very small atomic pieces that can be solved easily.
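
To make the lazy approach concrete, here is a minimal sketch in Java. The Lazy class and its names are ours, just for illustration: the expensive computation runs only on the first request - and never runs if nobody asks.

class Lazy<T> {
    private java.util.function.Supplier<T> supplier; // the deferred computation
    private T value;                                  // cached result, filled on first use

    Lazy(java.util.function.Supplier<T> supplier) { this.supplier = supplier; }

    T get() {
        if (supplier != null) {     // not computed yet
            value = supplier.get(); // do the work now
            supplier = null;        // drop the supplier; reuse the cached value
        }
        return value;
    }

    public static void main(String[] args) {
        Lazy<Double> costly = new Lazy<>(() -> {
            System.out.println("computing...");
            return Math.sqrt(2);
        });
        System.out.println("holder created, nothing computed yet");
        System.out.println(costly.get()); // prints "computing..." and then the value
        System.out.println(costly.get()); // cached - no recomputation
    }
}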

Data Structures


Most algorithms work on data. Naturally, the efficiency of the algorithm depends upon how we lay out that data. If the structure of our data is not good, the algorithm can turn out to be a lot costlier. Hence, the structure of our data is an important component of functional design.

Each problem requires its own structure for the data. But that is often composed of some very common data structures. Let us have a look at these common building blocks. Once we understand them, we can build up a good data structure for any given problem.

The four primary structures are:

  • Array
  • Linked List
  • Stack
  • Queue

An array is a contiguous block of memory allocated for a series of variables of the same type. Since the memory block is contiguous, accessing a given index requires very little processing, and hence very little time. At the same time, it may be difficult to obtain a large block of contiguous memory. That makes arrays costly in memory usage - they lead to fragmentation very fast. Thus, if arrays are not used well, we end up spending a lot on defragmentation.

Linked lists, on the other hand, are very efficient with memory management. But the lookup performance is not so good - we have to walk through the chain to locate any given element. Variants like circular linked lists and doubly linked lists can speed up some specific kinds of iteration. If we are sure the traversal is limited to those patterns, a linked list can be an ideal combination of memory and processing cost.

We also have a combination of arrays and linked lists - the array list. Here, we keep a list of small arrays. That gives us a golden mean between the two: intermediate memory and processing costs.

Arrays and lists are meant for random access - when we want to reach any given element directly. But there are times when we really do not need that. In some algorithms, we know we will access the elements in LIFO (Last In First Out) order. In such a case, we can restrict ourselves to a structure that offers just that. Such a data structure is called a Stack.

Similarly, there are some algorithms that require elements in the FIFO (First In First Out) order. In such a case, we use a Queue.
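
As a quick illustration, here is a minimal sketch using Java's ArrayDeque, which can behave as either a stack or a queue:

import java.util.ArrayDeque;

public class StackAndQueue {
    public static void main(String[] args) {
        // LIFO: push and pop work on the same end
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        stack.push(1);
        stack.push(2);
        stack.push(3);
        System.out.println(stack.pop()); // 3 - the last one in is the first one out

        // FIFO: add at the tail, remove from the head
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        queue.add(1);
        queue.add(2);
        queue.add(3);
        System.out.println(queue.remove()); // 1 - the first one in is the first one out
    }
}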

For us to work meaningfully with these data structures, we need one or more of the following operations implemented for them.

  • Traversing - Sequentially accessing all the elements
  • Searching - Looking for an element with the given value or constraint.
  • Insertion - Adding a new element into the structure, at the given position.
  • Deletion - Removing a particular element from the data structure.
  • Sorting - Rearranging the elements in the data structure according to a given order.
  • Merging - Combining the elements of two different data sets.

There are several different ways of implementing each of these. Each implementation offers a trade off between the memory and processing cost.

Search and sort are the most important operations on any data structure. In any application of meaningful complexity, most of the computational resources are spent on searching and sorting. Hence, optimizing these two operations has a major impact on the overall performance of any software application.

Search Algorithms


There are different search algorithms that let us trade off speed, memory and complexity.

Linear Search


This is the simplest and most intuitive way of searching through data. We pick up the elements sequentially and check if each is the element we want. If yes, we are done - else we continue. It is the simplest to implement, but not so good on cost: in the best case, we get the result on the first attempt; in the worst case, we look through the whole list before we conclude. Thus, its cost is C(n).
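
A minimal sketch of linear search in Java (the class and method names are just for illustration):

public class LinearSearch {
    // Returns the index of target in data, or -1 if it is not present.
    // Best case: the first element. Worst case: all n elements are compared.
    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88};
        System.out.println(linearSearch(data, 19)); // 2
        System.out.println(linearSearch(data, 5));  // -1
    }
}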

Binary Search


This works only if we have a sorted data set that is easy to index. We start with the middle element and compare it with our target. If it matches, we are done. If the target is smaller, we jump to the element midway between this one and the start (the 1/4 position); if it is larger, we jump midway between this one and the end (the 3/4 position). Thus, at each step, we decide whether to stop, move down or move up - and each step halves the remaining search interval.

With this, the best case is of course the first hit, and even the worst case needs far fewer than n comparisons - it is O(log n).

Thus, the cost is not linear - it is logarithmic. But there are two major requirements: the data must be sorted, and it must be easy to index. If we have a singly linked list, we have to traverse from the first element to reach every index; in such a case, it would be a terrible mistake to use binary search. Similarly, if the data is not sorted, binary search simply will not work.
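
Here is a minimal sketch of binary search over a sorted int array (names are just for illustration):

public class BinarySearch {
    // Requires data to be sorted in ascending order.
    static int binarySearch(int[] data, int target) {
        int low = 0, high = data.length - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2; // avoids overflow of (low + high)
            if (data[mid] == target) {
                return mid;
            } else if (data[mid] < target) {
                low = mid + 1;  // target can only be in the upper half
            } else {
                high = mid - 1; // target can only be in the lower half
            }
        }
        return -1; // not found
    }

    public static void main(String[] args) {
        int[] sorted = {3, 7, 19, 42, 88};
        System.out.println(binarySearch(sorted, 42)); // 3
        System.out.println(binarySearch(sorted, 5));  // -1
    }
}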

Should we sort the data just for a binary search? That is not as absurd as it sounds. It depends upon the data size and the number of times we plan to search. There are times when it is better to sort the data once before we start searching.

Interpolation Search


This further extends the concept of binary search. Instead of jumping right into the middle of the list, it estimates the position of the target, assuming the data is uniformly distributed. That is, in a list of numbers that starts with 1 and ends at 100, it assumes the number 40 would be around the 40th position - so it starts there. Then, at each branch, it re-estimates the position of the target from the two extremes.

If we have a good, uniform distribution of the data, this works wonderfully well and we reach the target really fast. Its cost is O(log log n) - much better than binary search.

But the assumption is that the data is uniformly distributed. Else, the extra calculation for the interpolation is just wasted effort.
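
A minimal sketch, assuming a sorted int array with a roughly uniform spread of values:

public class InterpolationSearch {
    // Assumes data is sorted and roughly uniformly distributed.
    static int interpolationSearch(int[] data, int target) {
        int low = 0, high = data.length - 1;
        while (low <= high && target >= data[low] && target <= data[high]) {
            if (data[high] == data[low]) { // avoid division by zero
                return data[low] == target ? low : -1;
            }
            // Estimate the target's position from the two extremes
            int pos = low + (int) ((long) (target - data[low]) * (high - low)
                    / (data[high] - data[low]));
            if (data[pos] == target) {
                return pos;
            } else if (data[pos] < target) {
                low = pos + 1;
            } else {
                high = pos - 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {10, 20, 30, 40, 50, 60, 70, 80, 90, 100};
        System.out.println(interpolationSearch(data, 70)); // 6
        System.out.println(interpolationSearch(data, 35)); // -1
    }
}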

Hash Key


As we saw above, searching spends most of its effort comparing data. We count the complexity and cost of an algorithm by the number of times we have to compare data elements. But what if we simplify the comparison itself?

Hashing is an innovative concept that can have a fabulous impact on the search speed. Over the years, hashing has earned its place in many other domains (encryption, security, finance, communication...) as well. But this is where it started.

A hash value is a numeric value obtained for a chunk of data. The hashing algorithm itself is out of our scope; essentially, it treats the given data chunk as a sequence of bytes and works on them numerically to get an integer. Of course, this integer is not unique - just because two hash values match, it does not mean the data chunks are equal. But a good hash function distributes values over a fairly wide range, so there are very few overlaps in a real world application.

Here, we compute hashes (keys) for all the elements in the available data (values) and record the relation between each key and its data. Since keys are numbers, we can store them in an array. Now, when we want to search for an element, we start by searching for its hash key. Since that is a numeric search on an array, it can be performed pretty fast. Once we have identified the key, we check whether any of the elements with that key really match our search target.
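
We rarely implement this by hand; Java's HashMap packages the whole idea for us. A minimal sketch of using it (the keys and values are placeholders, just for illustration):

import java.util.HashMap;
import java.util.Map;

public class HashLookup {
    public static void main(String[] args) {
        // The map computes hashCode() for each key and uses it to pick a
        // bucket, so a lookup does not scan the whole data set.
        Map<String, String> domains = new HashMap<>();
        domains.put("alice@example.com", "example.com");
        domains.put("bob@test.org", "test.org");

        // One hash computation plus (at most) a few equality checks
        System.out.println(domains.get("bob@test.org")); // test.org
        System.out.println(domains.get("eve@nowhere"));  // null - not present
    }
}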

Sorting Algorithms


Sorting is the other fundamental operation that loads any computational process. Let us look at some of the considerations in sorting algorithms.

In Place Sorting


In essence, any sorting algorithm has to compare elements and swap their positions. Swapping positions could mean physically moving the elements in memory, or the same effect can be achieved by playing with their indices or links.

If we physically swap their position in memory, we need a third place in the memory where we store one of the two elements and then overwrite it with the second. A memory swap of elements A and B would mean, creating a copy of A, overwriting A with B and then overwriting B with the copy of A. Thus, we have three memory copies. This may seem small when we sort a small array of numbers. But, if we have to sort a massive array of huge data structures, such copies would be very costly.

The other approach plays with the indices and links. For example, to swap two elements A and B in a singly linked list, we only change four next pointers - in the elements before A and B, and in A and B themselves. Even if A and B are huge data structures, that is all we need to do; there is no memory copy involved. This is called in place sorting. Some algorithms, in contrast, leave the current data untouched and create a new, sorted copy of it. That is not in place sorting. It needs fewer copies (one per element), but it requires more memory. There are times when we need to continue working with the existing data in the existing order - the latter is useful in such a case.
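
A minimal sketch contrasting the two approaches (the record values are placeholders, just for illustration):

import java.util.Arrays;

public class SwapDemo {
    public static void main(String[] args) {
        int[] data = {5, 2};

        // Physical swap: three copies through a temporary slot
        int temp = data[0];   // copy of A
        data[0] = data[1];    // overwrite A with B
        data[1] = temp;       // overwrite B with the copy of A
        System.out.println(Arrays.toString(data)); // [2, 5]

        // Logical swap: leave the (possibly huge) records where they are
        // and reorder an index array instead - no record is ever copied.
        String[] records = {"bigRecordB", "bigRecordA"};
        int[] index = {1, 0}; // index[i] points at the i-th record in sorted order
        for (int i : index) {
            System.out.println(records[i]); // bigRecordA, then bigRecordB
        }
    }
}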

Stable Sorting


When we sort elements in a list, we change their positions. If one element is less than the other - as per the given criteria - then we are sure that one of them will be before the other. But, how about elements that are equal according to the given criteria? Which one will go first? Is that defined?

Some sorting algorithms ensure that the relative positions of such elements are not altered. Others leave it unpredictable. The first kind is called stable sorting. The implications of stable or unstable sorting are not always obvious. But there are scenarios where we need stable sorting, and others where it does not matter.

Adaptive Sorting


Some algorithms assume by default that the elements are not sorted at all, and go through the whole process irrespective of the current state. Others assume that they are "almost sorted" and work towards improving this state. The latter is called adaptive sorting.

Again, there is no good or bad here. Sometimes we know the data is in complete chaos, and it makes sense to discard the current order and start afresh. At other times, the data is not that chaotic, and we would like to capitalize on whatever order already exists.

We can choose an algorithm based on our requirements.

Insertion Sort


This works great with linked lists and dynamically growing data. We start with a list of a single element - which is, of course, sorted. Then we add another element, in the appropriate position to keep the list sorted. We continue adding elements this way, inserting each in the position that keeps the list sorted.

Now, adding or removing elements from this list is a very simple job. Removal, of course, does not affect the sort order. And if we add a new element, we just follow the same process of pushing it into the appropriate position.

As described here - building a separate sorted list - this is not in place. (The classic array variant, which shifts elements within the same array, is usually classified as stable, adaptive and in place.) It works great with linked lists. But on an array, every insertion means shifting the elements after it - and that gets really costly.
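
A minimal sketch of the linked list form described above (names are just for illustration):

import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class InsertionSortDemo {
    // Insert value into an already sorted list, keeping it sorted.
    static void insertSorted(List<Integer> sorted, int value) {
        ListIterator<Integer> it = sorted.listIterator();
        while (it.hasNext()) {
            if (it.next() > value) { // found the first bigger element
                it.previous();       // step back so we insert before it
                break;
            }
        }
        it.add(value);
    }

    public static void main(String[] args) {
        int[] input = {4, 1, 3, 2};
        List<Integer> sorted = new LinkedList<>();
        for (int value : input) {
            insertSorted(sorted, value); // the list stays sorted after every step
        }
        System.out.println(sorted); // [1, 2, 3, 4]
    }
}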

Selection Sort


This is an in place sorting algorithm. We start by identifying the smallest element in the entire set: we compare the first two elements, remember the smaller, compare it with the third, then compare the smaller of those with the fourth, and so on.

Once we have the smallest element, we swap it with the first element of the list. Then we find the smallest element in the remaining data and swap it with the second, and so on. Thus, a sorted list grows on one side, shrinking the remaining unsorted part.
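
A minimal sketch of selection sort on an int array:

import java.util.Arrays;

public class SelectionSortDemo {
    static void selectionSort(int[] data) {
        for (int i = 0; i < data.length - 1; i++) {
            int smallest = i;
            // Find the smallest element in the unsorted remainder
            for (int j = i + 1; j < data.length; j++) {
                if (data[j] < data[smallest]) {
                    smallest = j;
                }
            }
            // Swap it into position i - the sorted prefix grows by one
            int temp = data[i];
            data[i] = data[smallest];
            data[smallest] = temp;
        }
    }

    public static void main(String[] args) {
        int[] data = {29, 10, 14, 37, 13};
        selectionSort(data);
        System.out.println(Arrays.toString(data)); // [10, 13, 14, 29, 37]
    }
}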

Bubble Sort


Have you seen bubbles of air pushing their way up through a glass of water? How does that happen? A small bubble of air is lighter than water, so buoyancy pushes it up. But it does not have to prove itself against the whole glass at once: it compares itself only with the water just above it, and keeps moving up until there is air above it - that is when it stops moving.

The bubble sort algorithm is inspired by this process. We do not compare all the elements in the data structure at once. We start at one end, comparing each element with its neighbor; if required, we swap them, otherwise we just move on. We repeat such passes until a pass completes with no swap - the data is then sorted.

This algorithm is in place, adaptive and also stable. It works well when we have all the data with us and do not expect more changes. If we append an element at the end, we have to go through the whole process again, with several swaps, till the element slides into its correct position.
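
A minimal sketch, with the early exit that makes it adaptive:

import java.util.Arrays;

public class BubbleSortDemo {
    static void bubbleSort(int[] data) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i < data.length - 1; i++) {
                if (data[i] > data[i + 1]) {
                    // Neighbors are out of order - swap them
                    int temp = data[i];
                    data[i] = data[i + 1];
                    data[i + 1] = temp;
                    swapped = true;
                }
            }
            // A full pass without a swap means the data is sorted
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        bubbleSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}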

Shell Sort


This is an extension of the bubble sort algorithm, and it works out to be quite efficient. It is in place as well as adaptive, but not stable.

We start by defining an "interval". We split the list (in place) into sub lists of elements that lie the given interval apart. We compare and swap (if required) the neighboring elements of each sub list - bubbling through them. We then reduce the interval and repeat the process - continuing until the interval comes down to 1, where we bubble through the entire list. But by that time, the list is almost sorted, so we have very little processing left.
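
A minimal sketch that bubbles within each gap, as described above (production implementations usually insert within the gaps instead, but the idea is the same):

import java.util.Arrays;

public class ShellSortDemo {
    static void shellSort(int[] data) {
        // Start with a large interval and shrink it towards 1
        for (int gap = data.length / 2; gap > 0; gap /= 2) {
            // Compare and swap elements that are 'gap' apart, sweeping
            // until the gap-separated sub lists are in order
            boolean swapped = true;
            while (swapped) {
                swapped = false;
                for (int i = 0; i + gap < data.length; i++) {
                    if (data[i] > data[i + gap]) {
                        int temp = data[i];
                        data[i] = data[i + gap];
                        data[i + gap] = temp;
                        swapped = true;
                    }
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {23, 12, 1, 8, 34, 54, 2, 3};
        shellSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 8, 12, 23, 34, 54]
    }
}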

Merge Sort


Often, the entire data set is too huge to sort at once. Merge sort simplifies this problem. It is based on the concept of divide and conquer. We start by splitting the data into two parts. We sort each part using one of the sorting algorithms. Then, we merge the two sorted lists.

To do this, we compare the tips of the two sorted lists, and the smaller one is pushed into the merged list. Then we compare the tips again. Doing this iteratively, we end up with one sorted list. But this is not in place. It could be adaptive or stable, depending on the algorithm used for sorting the individual subsets.
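
A minimal sketch of the merge step over two sorted arrays:

import java.util.Arrays;

public class MergeDemo {
    // Merge two already sorted arrays into one sorted array.
    static int[] merge(int[] a, int[] b) {
        int[] result = new int[a.length + b.length];
        int i = 0, j = 0, k = 0;
        while (i < a.length && j < b.length) {
            // Compare the tips; the smaller one moves into the result.
            // Taking from 'a' on ties keeps the merge stable.
            if (a[i] <= b[j]) {
                result[k++] = a[i++];
            } else {
                result[k++] = b[j++];
            }
        }
        // One of the lists is exhausted - copy the rest of the other
        while (i < a.length) result[k++] = a[i++];
        while (j < b.length) result[k++] = b[j++];
        return result;
    }

    public static void main(String[] args) {
        int[] left = {3, 19, 88};
        int[] right = {7, 42};
        System.out.println(Arrays.toString(merge(left, right))); // [3, 7, 19, 42, 88]
    }
}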

Quick Sort


This improves upon the merge sort approach, and it is good for huge data sets. In the sub list form described here, it is not in place (the common array implementation, in contrast, partitions within the array). It works by choosing a pivot value: the entire data is compared with this pivot, and two sub lists are created - one with the elements lower than the pivot, the other with the elements higher.

Now, if we sort these two sub lists, we can be sure of getting a sorted whole simply by appending them around the pivot. The same is then done with each of the two sub lists - generating four sub lists, and so on. This is repeated recursively till the entire data is sorted - or till the sub lists are small enough to sort with another algorithm.

Then we can simply append them in the right order to get a sorted list.
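
A minimal sketch of the sub list form described above:

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuickSortDemo {
    // Partition around a pivot, sort each side recursively,
    // then append the pieces in order.
    static List<Integer> quickSort(List<Integer> data) {
        if (data.size() <= 1) {
            return data; // small enough - already sorted
        }
        int pivot = data.get(0);
        List<Integer> lower = new ArrayList<>();
        List<Integer> higher = new ArrayList<>();
        for (int value : data.subList(1, data.size())) {
            if (value < pivot) {
                lower.add(value);
            } else {
                higher.add(value);
            }
        }
        List<Integer> result = new ArrayList<>(quickSort(lower));
        result.add(pivot);
        result.addAll(quickSort(higher));
        return result;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(5, 3, 8, 1, 9, 2);
        System.out.println(quickSort(data)); // [1, 2, 3, 5, 8, 9]
    }
}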

Regular Expressions


A lot of searching - and the sorting built on it - involves matching patterns in text. Regular expressions form a very important concept in functional programming.

Developers often encounter a requirement to check if the contents of a string follow a particular pattern. For example, we might want to know if a string contains any white space, or any non-alphanumeric characters, or whether it starts with a capital letter... There are several such scenarios. A few decades ago, people had to get down to substring operations (or some pointer arithmetic, in the age of C). As this requirement surfaced again and again, developers came up with libraries for the task, which formalized over time into regular expressions. In due course, regular expressions became an integral part of nearly every language.

The syntax of regular expressions is almost the same in all languages. There are minor value additions and flavors in each language, but it is essentially the same. Having understood regex for one language, we can easily carry that knowledge to any other. Here are some of the important concepts in regular expressions that can lead us into a detailed study.

Character Groups


Regular expressions allow us a lot more than exact character matches. We can add logical options like "one of these characters":

No.  Character Class   Description
1    [abc]             a, b, or c (simple class)
2    [^abc]            Any character except a, b, or c (negation)
3    [a-zA-Z]          a through z or A through Z, inclusive (range)
4    [a-d[m-p]]        a through d, or m through p: [a-dm-p] (union)
5    [a-z&&[def]]      d, e, or f (intersection)
6    [a-z&&[^bc]]      a through z, except for b and c: [ad-z] (subtraction)
7    [a-z&&[^m-p]]     a through z, and not m through p: [a-lq-z] (subtraction)

Example:


package com.solegaonkar.learnjava;

import java.util.regex.*;  
class RegexExample3{  
 public static void main(String args[]){  
  System.out.println(Pattern.matches("[xyz]", "abcd"));    //false (not x or y or z)  
  System.out.println(Pattern.matches("[xyz]", "y"));    //true (among x or y or z)  
  System.out.println(Pattern.matches("[xyz]", "xyzxyzxxx"));    //false (more than one occurence)  
 }
}

Predefined character classes


There are some character groups that we use quite often - for example, to check for a digit or a letter. For convenience, the regex syntax provides some predefined character classes:

Construct   Description
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character: [^\w]

Example:


package com.solegaonkar.learnjava;

import java.util.regex.*;  
class RegexExample5{  
 public static void main(String args[]){  
  System.out.println("metacharacters d....");\\d means digit  
    
  System.out.println(Pattern.matches("\\d", "abc"));//false (non-digit)  
  System.out.println(Pattern.matches("\\d", "1"));//true (digit and comes once)  
  System.out.println(Pattern.matches("\\d", "4443"));//false (digit but comes more than once)  
  System.out.println(Pattern.matches("\\d", "323abc"));//false (digit and char)  
    
  System.out.println("metacharacters D....");\\D means non-digit  
    
  System.out.println(Pattern.matches("\\D", "abc"));//false (non-digit but comes more than once)  
  System.out.println(Pattern.matches("\\D", "1"));//false (digit)  
  System.out.println(Pattern.matches("\\D", "4443"));//false (digit)  
  System.out.println(Pattern.matches("\\D", "323abc"));//false (digit and char)  
  System.out.println(Pattern.matches("\\D", "m"));//true (non-digit and comes once)  
    
  System.out.println("metacharacters D with quantifier....");  
  System.out.println(Pattern.matches("\\D*", "mak"));//true (non-digit and may come 0 or more times)  
 }
}

Quantifiers


Another convenience feature of the regex syntax is the quantifier. At times, we need to check for more than just one occurrence of a character:

Greedy    Reluctant   Possessive   Meaning
X?        X??         X?+          X, once or not at all
X*        X*?         X*+          X, zero or more times
X+        X+?         X++          X, one or more times
X{n}      X{n}?       X{n}+        X, exactly n times
X{n,}     X{n,}?      X{n,}+       X, at least n times
X{n,m}    X{n,m}?     X{n,m}+      X, at least n but not more than m times

There are three main types of quantifiers. Greedy quantifiers try to grab as much of the input as possible. Reluctant ones grab as little as possible. Possessive quantifiers grab as much as possible, like greedy ones - but unlike them, they never give any of it back (they do not backtrack).

Example


package com.solegaonkar.learnjava;

import java.util.regex.*;  
class RegexExample4{  
 public static void main(String args[]){  
  System.out.println("? quantifier ....");  
  System.out.println(Pattern.matches("[amn]?", "a"));//true (a or m or n comes one time)  
  System.out.println(Pattern.matches("[amn]?", "aaa"));//false (a comes more than one time)  
  System.out.println(Pattern.matches("[amn]?", "aammmnn"));//false (a m and n comes more than one time)  
  System.out.println(Pattern.matches("[amn]?", "aazzta"));//false (a comes more than one time)  
  System.out.println(Pattern.matches("[amn]?", "am"));//false (a or m or n must come one time)  
    
  System.out.println("+ quantifier ....");  
  System.out.println(Pattern.matches("[amn]+", "a"));//true (a or m or n once or more times)  
  System.out.println(Pattern.matches("[amn]+", "aaa"));//true (a comes more than one time)  
  System.out.println(Pattern.matches("[amn]+", "aammmnn"));//true (a or m or n comes more than once)  
  System.out.println(Pattern.matches("[amn]+", "aazzta"));//false (z and t are not matching pattern)  
    
  System.out.println("* quantifier ....");  
  System.out.println(Pattern.matches("[amn]*", "ammmna"));//true (a or m or n may come zero or more times)  
 }
}

Boundary Matchers


In all the cases we checked above, the expressions match a particular set of characters in the input. Boundary matchers take this a step further, letting us match boundaries - like the beginning of a word or of the string, or the end of a word or string. These help with requirements like identifying words that end with 'ed', or matching the article 'a' rather than just any character 'a'.

Boundary Construct   Description
^                    The beginning of a line
$                    The end of a line
\b                   A word boundary
\B                   A non-word boundary
\A                   The beginning of the input
\G                   The end of the previous match
\Z                   The end of the input but for the final terminator, if any
\z                   The end of the input

Example


package com.solegaonkar.learnjava;

import java.util.regex.*;
public class RegexExample6 {
 public static void main(String[] args) {
  String txt = "xyz xyzxyz";

  // Demonstrating ^
  String regex1 = "^xyz";
  Pattern pattern1 = Pattern.compile(regex1, Pattern.CASE_INSENSITIVE);
  Matcher matcher1 = pattern1.matcher(txt);
  while (matcher1.find()) {
   System.out.println("Start index: " + matcher1.start());
   System.out.println("End index: " + matcher1.end());
  }

  // Demonstrating $
  String regex2 = "xyz$";
  Pattern pattern2 = Pattern.compile(regex2, Pattern.CASE_INSENSITIVE);
  Matcher matcher2 = pattern2.matcher(txt);
  while (matcher2.find()) {
   System.out.println("\nStart index: " + matcher2.start());
   System.out.println("End index: " + matcher2.end());
  }
 }
}

Groups


Groups do not particularly contribute to matching a pattern. But they play a big role when retrieving data from the Matcher. If the regex has groups built into it, we can extract individual elements of the matched sequence. For example, if the regex is (\d)\d(\d) and the input string is 123, the pattern matches; when we invoke group(1), we get 1 - because that is the first group in the match. group(2) returns 3 - because that is the second group in the match.

Method             Description
start(int group)   Returns the start index of the subsequence captured by the given group during the previous match operation.
end(int group)     Returns the index of the last character, plus one, of the subsequence captured by the given group during the previous match operation.
group(int group)   Returns the input subsequence captured by the given group during the previous match operation.

This is very useful when we want to extract only a part of the matched string. For example, if we want the domain names from all the Email IDs in a document, we can create a regex that matches any Email ID, with a group inside it that captures the domain name.
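
A minimal sketch along those lines - the Email pattern here is deliberately simplified, just for illustration:

package com.solegaonkar.learnjava;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExample7 {
 public static void main(String[] args) {
  String text = "Contact alice@example.com or bob@test.org for details.";

  // Group 1 captures the domain part of each Email ID
  Pattern pattern = Pattern.compile("[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\\.[A-Za-z]{2,})");
  Matcher matcher = pattern.matcher(text);
  while (matcher.find()) {
   System.out.println(matcher.group(1)); // example.com, then test.org
  }
 }
}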