May 6, 2018

Introduction to data structures

As a developer, you will be likely working with data frequently. Knowing or learning data structures leads to better data processing decisions, optimization and positive impact on business performance so it’s definitely worth investing time in understanding the fundamentals.

A data structure is a way of effectively arranging data in such a way that it can easily be accessed and modified. Data structures can implement abstract data types (ADT). We say abstract because it’s a just a concept of how to structure data, an interface that the structure should adhere to without any implementation details. Many programming languages have different ways of implementing ADTs.

But what is data type? The same way human beings deal with different types of data such as temperature, time, money, computers deal with integers, characters, boolean etc. A data type is basically a set of values and a set of operations on those values. For example, if a computer needs to work with integers, it would probably need to perform some sort of a mathematical operation on it.

There are different data types:
1. primitive – also called primary, fundamental, built-in, etc. (int, char, bool, etc.)
2. composite – also called secondary, derived, these are implemented using primitive data types (string, array, etc.)
3. abstract – examples are list, stack, queue, etc.

Let’s say that you have 3 books, for simplicity I’ll call them red, green and blue. Having finished studying for today, you are now planning for tomorrow. You want to start with the red book continue with the green and finish off with the blue one. What do you do? As you are a very organized student you put blue at the bottom, green in the middle and red at the top. Excellent! You’ve just learned your first ADT – stack. A stack is also called LIFO because of how it operates, Last In First Out. In the example above, the last book we put on the stack is the first book that we take from the stack.

There are many different ADTs and we’ll cover some of them in the future. In this post, we are going to talk about a composite data type – array and an ADT – associative array.

Array

An array is a collection of a fixed number of elements of the same type (int, char, float, etc.) with each element having its own index. The elements can be accessed via indices which are usually integers. By specifying what number of elements in the array we want to store, we reserve a consecutive group of memory locations. It is one of the oldest, simplest and most fundamental data structures. Arrays can be used on their own or often to implement other data structures. For example, a string is usually identified as a data type and is often implemented as an array of characters.

Talking about strings, in JavaScript they behave and act like arrays of characters but they are not quite arrays. Strings are immutable, meaning that once a string is set it cannot be modified. We can use some of the built-in array methods on strings but not all (e.g. reverse()). If we really want to modify a string and have all of the array methods available on it, we need to convert that string to an array first. Magic!

const sampleString = 'sample';
const sampleStringToArray = sampleString.split('');
const sampleStringReversed = sampleStringToArray.reverse().join('');
console.log(sampleStringReversed);

There are several types of arrays, the simplest being a linear (one-dimensional) array. As a matter of the fact, all of the data structures in JavaScript are built on one central ADT – associative array, an advanced type of array. More on this latter.

Native arrays are fast. What is interesting however is that even though we call them arrays in JavaScript, internally they are special types of objects identified as arrays. There is no relationship to where the element is within the array and where the element is stored in the memory. Its integer indices are converted to strings to comply with the standard object format. The downside to this is that arrays in JavaScript are not as efficient as in some other programming languages that implement a native array data structure.

The special array object comes with many methods (e.g. indexOf, pop, push, etc.) available to us. These methods are inherited from Array.prototype object that is used in the construction of arrays. Unlike other languages, JavaScript arrays are collections of anything and array elements can be of mixed data types.

const coloursAndNumbers = ['blue', 'red', 2, function() { return 'inside function' }];
console.log(coloursAndNumbers.indexOf('blue')); // outputs 1
console.log(coloursAndNumbers[3]); // outputs 'inside function'
console.log(coloursAndNumbers.length); // outputs 4
console.log(coloursAndNumbers); // outputs ['blue', 'red', function]

The above array is normally referred to as a dense array meaning that there are no holes in it (all indices have values associated with them). Dense arrays should always be used for best performance. There is an opposite term in the JavaScript world – sparse array. This happens when one or more elements in the array are empty (undefined). Something to be aware of is that if there is an undefined element in the array the .length the property will still count it as a valid one.

const sampleArray = ['one', 'two', , 'four', 'five']; // let's create a sparse array
console.log(sampleArray); // outputs ["one", "two", empty, "four", "five"], hole in the array at index 2
console.log(sampleArray.length); // outputs 5, including the empty element

Multidimensional array

Arrays in JavaScript are uninitialized and one-dimensional. If we want to have a multidimensional array, we will have to build one. In languages like Java, we can create multi-dimensional arrays with the default value of each element being zero.

import java.util.Arrays;


public class HelloWorld {
  public static void main(String[] args) {
    int[][] someArray;
    someArray = new int[2][4];
    someArray[1][2] = 8;
    

    System.out.println(Arrays.deepToString(someArray));
  }
}

As you see we can assign values to a particular element in a particular array with no problem at all. However, in JavaScript this is done a bit differently. We need to manually (i.e. there isn’t a function for it) initialize and populate an array. When creating an array with [] notation, essentially we are creating just a completely empty array. But as we know by definition an array should be of a fixed length. However in JavaScript arrays are dynamic, meaning that they will resize when we need them to.

Array.initialise = function(arraySize, defaultElementValue) {
  for (let i = 0; i < arraySize; i++) {
    let newArray = [];
    

    newArray[i] = defaultElementValue;
  }
  

  return newArray;
}


const test = Array.initialise(10, 0);
console.log(test); // outputs [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

This works, however, there is a problem. If we pass in an empty array as an initial value to each element, we would get a reference to the same array. We must explicitly set each element.

Array.makeMatrix = function(row, column, defaultElementValue) {
  let arrayColumn = [];
  

  for (let i = 0; i < row; i++) {
    let arrayRow = [];
    

    for (let x = 0; x < column; x++) {
      arrayRow[x] = defaultElementValue;
    }
    

    arrayColumn[i] = arrayRow;
  }
  

  return arrayColumn;
}


const myMatrix = Array.makeMatrix(4, 4, 0);
console.log(myMatrix);

Try running this code and you should get a 4 x 4 grid (matrix). Try assigning a value to one of the inner arrays myMatrix[1][1] = 'a'. This should only update the second element in the second array.

Associative array

At the beginning of this post, I mentioned something called an associative array. It might be called differently depending on the language that you use, for example in Python it’s a map, in Ruby it’s a hash, in Java it’s a dictionary. Essentially, it is an ADT that is a set of keys associated values (key-value pairs). The whole JavaScript language is built on it. An associative array is just an object. Values can be accessed by referencing their keys. Note that when creating an associative array we use {} instead of []. To count the number of elements in the associative array we first get an array of keys Object.keys() and then count that array.

const sampleObject = { 'colourOne': 'red', 'colourTwo': 'blue' }; // this creates an object
console.log(sampleObject.colourOne); // we can access the element the way we access object properties
console.log(sampleObject['colourOne']); // we can also access the element the way we access array element
console.log(Object.keys(sampleObject).length);

That’s it for this brief introduction to data structures in JavaScript. Enjoy your day!