Software engineering notes

Data Structures

Big O

A mathematical notation used in computer science to describe how an algorithm's running time or space requirements grow as the input size grows

Complexity

O(1) < O(log n) < O(n) < O(n log n) < O(n^2) < O(2^n) < O(n!)

Common Data Structures

Binary search tree

Demonstration

          101
        /     \
      33       105
     /  \     /   \
    9    37 104   144

Basics

The time complexity of insertion in a BST

The time complexity of inserting a new node into a binary search tree (BST) depends on the height of the tree, and can range from O(log n) to O(n), where n is the number of nodes in the tree.

In the best case, when the tree is balanced, the height is O(log n) and insertion is also O(log n): each comparison eliminates half of the remaining tree, so at most log n comparisons are needed to reach a leaf. In the worst case, when values are inserted in sorted order, the tree degenerates into a linked list of height n and insertion takes O(n).
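A minimal sketch of BST insertion in Python (the Node class and insert function are my own names, not from these notes):

class Node:
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def insert(root, value):
    # Walk down the tree, going left or right at each node,
    # until an empty spot is found for the new value.
    # The cost is proportional to the height of the tree.
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

# Build the tree from the demonstration above
root = None
for v in [101, 33, 105, 9, 37, 104, 144]:
    root = insert(root, v)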

The implementation difference between Array and Linked List

Heap

Demonstration

max heap

     90
    /   \
  60     50
 /  \   /  \
40  30 20  10

min heap

     10
    /   \
  20     30
 /  \   /  \
40  60 50  70

Basics

A Binary Heap is either a Min Heap or a Max Heap

Use cases

Operations

The difference in implementation between an array and a linked list

How do I decide which implementation to use, an array or a linked list?

Why does heap sort require O(n log n) time?

The time complexity for heap sort is the sum of the two stages: O(n) time for building the heap and O(n log n) to remove each node in order. Therefore, the overall time complexity is O(n log n).
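A minimal heap sort sketch in Python covering both stages, assuming a max heap stored in an array (sift_down and heap_sort are my own names):

def sift_down(a, i, n):
    # Push a[i] down until both children are smaller (max heap property).
    while True:
        largest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest

def heap_sort(a):
    n = len(a)
    # Stage 1: build a max heap in place, O(n)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Stage 2: repeatedly move the max to the end and shrink the heap,
    # n removals of O(log n) each => O(n log n)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)

nums = [90, 60, 50, 40, 30, 20, 10]
heap_sort(nums)
print(nums)  # [10, 20, 30, 40, 50, 60, 90]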

Build a heap by inserting all elements: O(n log n)

Each insertion takes O(log n) time, and we perform n insertions, resulting in O(n log n) time complexity.
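For comparison, a rough sketch using Python's standard heapq module (a min heap), where each heappush costs O(log n):

import heapq

nums = [40, 60, 50, 10, 20, 30, 70]
heap = []
for x in nums:
    heapq.heappush(heap, x)  # O(log n) per insertion, O(n log n) total
print(heap[0])  # 10, the minimum is at the root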

Build a heap from an unsorted array (heapify): O(n)

Heapify works by processing the elements from the middle of the array to the beginning, sifting each one down to satisfy the heap property. Since the majority of elements are located in the lower part of the heap (where there is less work to do), the overall time complexity is reduced to O(n).
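Python's built-in heapq.heapify builds a heap in place in linear time (per its documentation), using this bottom-up approach. A small example:

import heapq

nums = [40, 60, 50, 10, 20, 30, 70]
heapq.heapify(nums)   # in place, O(n)
print(nums[0])        # 10, the minimum is at the root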

Why does building a heap take O(n)? (siftDown)

n = 31, number of levels = 5, height h = floor(log n) = 4

                         E                                  h=4
                        / \
                      /     \
                    /         \
                  /             \
                /                 \
              /                     \
            /                         \
           D                           D                    h=3
         /   \                       /   \
       /       \                   /       \
     C           C               C           C              h=2
   /   \       /   \           /   \       /   \
  B     B     B     B         B     B     B     B           h=1
 / \   / \   / \   / \       / \   / \   / \   / \
A   A A   A A   A A   A     A   A A   A A   A A   A         h=0

Explanation

There are at most n/2^(h+1) nodes (rounded up) at height h, and each of them performs at most h swaps.

For example, a node at level C (h = 2) swaps at most 2 times to reach the bottom.

The total number of swaps in the worst case will be

        A              B             C            D             E
    (n/2^1 * 0) + (n/2^2 * 1) + (n/2^3 * 2) + (n/2^4 * 3) + (n/2^5 * 4)
=    (n/2 * 0)  +  (n/4 * 1)  +  (n/8 * 2)  +  (n/16 * 3) +  (n/32 * 4)
=       16*0    +    8*1      +     4*2     +    2*3      +     1*4
=    26

(node counts rounded up for n = 31)

Summing over all heights, the total is at most n * (0/2 + 1/4 + 2/8 + 3/16 + ...) = n, so the time complexity is O(n)

The full proof is more involved; see https://stackoverflow.com/questions/9755721/how-can-building-a-heap-be-on-time-complexity if you are interested.
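A quick arithmetic check of the sum above for n = 31 (node counts rounded up):

import math

n, h = 31, 4
total = sum(math.ceil(n / 2 ** (level + 1)) * level for level in range(h + 1))
print(total)  # 26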

Demonstration

Start with the last node in the tree that has children (for a binary tree represented as an array, this is index n/2 - 1, rounded down, where n is the total number of nodes, i.e. the array's length)

What if n is odd, e.g. 15? Which node should I start with? Indices must be integers, so round down: 15/2 - 1 = 6.5 rounds down to 6, so start with the node at index 6. In a 0-indexed array, this is the 7th element.
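As a tiny sketch, using floor division:

n = 15
start = n // 2 - 1   # 15 // 2 - 1 = 6
print(start)         # 6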

Original

                  2                     <- h=3
                /   \
              /       \
            /           \
          3               1             <- h=2
        /   \            /  \
      7       4       11      9         <- h=1
    /  \    /  \     /  \    /  \
   8   14  10   5   12   6  13  17      <- h=0

Start with h=1

                  2
                /   \
              /       \
            /           \
          3               1
        /   \            /  \
     14      10       12      17
    /  \    /  \     /  \    /  \
   8    7  4    5   11   6  13   9

Next, h=2

                  2
                /   \
              /       \
            /           \
          14              17
        /   \            /  \
      8      10       12      13
    /  \    /  \     /  \    /  \
   3    7  4    5   11   6  1    9

Some nodes are swapped down more than once.

Then, h=3

                  17
                /   \
              /       \
            /           \
          14              13
        /   \            /  \
      8      10       12      9
    /  \    /  \     /  \    /  \
   3    7  4    5   11   6  1    2

The difference between siftUp and siftDown

siftUp

siftDown

Which is more efficient, siftUp or siftDown?

siftDown

(0 * n/2) + (1 * n/4) + (2 * n/8) + ... + (h * 1)

O(n)

siftUp

(h * n/2) + ((h-1) * n/4) + ((h-2)*n/8) + ... + (0 * 1)

(h * n/2) = (log n) * (n/2) = O(n log n), so the first term alone already costs n log n

If we need to apply an operation to every node, it is preferable to use siftDown rather than siftUp.
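For reference, a sketch of siftUp and an insertion-style build loop that uses it (a min heap here; the function names are mine, not from these notes). siftDown is shown in the heap sort sketch above.

def sift_up(a, i):
    # Move a[i] up toward the root while it is smaller than its parent.
    while i > 0:
        parent = (i - 1) // 2
        if a[i] >= a[parent]:
            return
        a[i], a[parent] = a[parent], a[i]
        i = parent

def build_min_heap_sift_up(a):
    # O(n log n): the deep nodes are the majority, and each may travel
    # almost the full height of the heap.
    for i in range(1, len(a)):
        sift_up(a, i)

nums = [70, 60, 50, 40, 30, 20, 10]
build_min_heap_sift_up(nums)
print(nums[0])  # 10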

Graph

Basics

Components

Types

Implementation using Edge list

graph using array

        2 ---------- 0
      /   \
     1-----3

represent in code

graph = [[0,2], [2,3], [2,1], [1,3]]

Each element is a pair of node IDs representing one edge (a connection between two nodes) in the graph
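With an edge list, checking whether two nodes are connected requires scanning every edge, O(E). A small sketch, assuming an undirected graph:

graph = [[0, 2], [2, 3], [2, 1], [1, 3]]

def has_edge(edges, u, v):
    # O(E): every edge may need to be inspected.
    return any({u, v} == set(edge) for edge in edges)

print(has_edge(graph, 2, 1))  # True
print(has_edge(graph, 0, 3))  # False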

pros

cons

use cases

Implementation using Adjacency list

graph

        2 ---------- 0
      /   \
     1-----3

represent in code using an array of linked lists (shown here as nested arrays), one list of neighbors per node

graph = [[2], [2,3], [0,1,3], [1,2]]
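A sketch of building this adjacency list from the edge list above (assuming an undirected graph with nodes 0 to 3; neighbor order may differ from the listing):

edges = [[0, 2], [2, 3], [2, 1], [1, 3]]
adj = [[] for _ in range(4)]
for u, v in edges:
    adj[u].append(v)  # undirected: record the edge in both directions
    adj[v].append(u)

print(adj[2])  # neighbors of node 2: [0, 3, 1]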

pros

cons

use cases

Implementation using Adjacency matrix

graph

        2 ---------- 0
      /   \
     1-----3

represent in code using a 2D array

graph = [
    [0,0,1,0],
    [0,0,1,1],
    [1,1,0,1],
    [0,1,1,0]
]
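A sketch of building the matrix from the same edge list; edge lookup becomes O(1) at the cost of O(V^2) space:

edges = [[0, 2], [2, 3], [2, 1], [1, 3]]
n = 4
matrix = [[0] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = 1
    matrix[v][u] = 1  # undirected: the matrix is symmetric

print(matrix[2][1])  # 1: nodes 2 and 1 are connected
print(matrix[0][3])  # 0: nodes 0 and 3 are not connected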

pros

cons

use cases

Trie

Basics

Represent a trie using linked lists

linked list attributes

trie: ["cat", "tr", "trid", "trie", "trief"]

      [   "c"      "t"    ]
        (false)   (false)
         /            \
      ["a"]          ["r"]
     (false)         (true)
      /                  \
   ["t"]                ["i"]
  (true)               (false)
                            \
                      [  "d"    "e"  ]
                       (true)  (true)
                                   \
                                  ["f"]
                                  (true)
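A minimal trie sketch in Python that mirrors the diagram: each node stores its children and an end-of-word flag (here children are kept in a dict rather than a linked list; the class and method names are mine):

class TrieNode:
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_word = False      # the (true)/(false) flag in the diagram

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word

trie = Trie()
for w in ["cat", "tr", "trid", "trie", "trief"]:
    trie.insert(w)
print(trie.search("trie"))  # True
print(trie.search("tri"))   # False, "tri" is only a prefix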

ref