How joins works on Relational Database Like MySql

2 October 2024

@Bibhabendu Mukherjee

Have you ever wonder about how joins works in the database , let's delve dive into this concept and try to uncover what's going on behind the joins.

As we all know that joins are pretty common in the transactional Database like MySQL , PostgreSQL and analytics database like BigQuery , even in the distributed computation system like Apaches Spark and in every where joins works in same manner , we write just our joins without paying a little attention to it.

First of all see what exactly joins is what joins do , Joins as the name suggest that it joins something , "YES" it's right joins basically joins multiple result set or tables based on a matching condition. Bellow let's take one common example relationship of Student and Department.

Here’s an example of how you would write a SQL JOIN query based on the STUDENT and DEPARTMENT tables

Student table (student_id , student_name , department_id)

Image

Department table (department_id, department_name)

Image

To get a list of students along with their respective department names.

SELECT 
    STUDENT.student_id, 
    STUDENT.student_name, 
    DEPARTMENT.department_name
FROM 
    STUDENT
JOIN 
    DEPARTMENT
ON 
    STUDENT.department_id = DEPARTMENT.department_id;

Now the question is how the database actually join this 2 tables.

The database do follow 3 main joins strategies -- Nested Loop Join, Merge Join, and Hash Join.Each of these strategies has different performance characteristics depending on the size and structure of the data being joined.

Nested Loop Join (NLJ)

Explanation:

  • Nested Loop Join works by iterating over each row in the first table and for each row, it scans the entire second table to find matching rows. This method can be slow if both tables are large, but it's useful for small datasets or when indexes can be leveraged.
  • Typically, it is used when no pre-sorting or hashing can be done, and the tables are not very large.
Image

How Nested Loop Join Works:

The database takes each row of the STUDENT table.

For each row in the STUDENT table, it scans the DEPARTMENT table to find the matching department_id.

If a match is found, the rows are joined and returned.

For example, for Alice (department_id = 101):

  • The database scans the DEPARTMENT table and finds the department with department_id = 101, which is Computer Science.

Merge Join

Explanation:

  • Merge Join works by first sorting both tables on the join column and then iterating through both tables simultaneously, comparing each row. If a match is found, those rows are joined.
  • Merge joins are highly efficient when both tables are already sorted or can be sorted efficiently on the join column.

How Merge Join Works:

Both tables are scanned in parallel since they are sorted by department_id.

The database compares each row from the STUDENT table with the corresponding row in the DEPARTMENT table based on department_id.

If the department_id values match, the rows are joined.

For example:

  • The database sees that Alice and Carol (both with department_id = 101) match with the DEPARTMENT table’s row where department_id = 101 (Computer Science).
  • Then, Bob with department_id = 102 matches with the DEPARTMENT row with department_id = 102 (Mathematics).

Hash Join

Explanation:

  • Hash Join uses a hash table to perform the join. The database builds a hash table on the smaller table (in memory), using the join column as the key. Then, it scans the larger table, probing the hash table to find matching rows.
  • This is an efficient strategy for large datasets when there's enough memory to build the hash table.

How Hash Join Works:

The database chooses the smaller table (DEPARTMENT in this case) to build a hash table based on the department_id.

It scans the STUDENT table and for each department_id in STUDENT, it probes the hash table to find matching rows in the DEPARTMENT table.

If a match is found, the rows are joined.

For example:

  • The hash table is built on the DEPARTMENT table, using department_id as the key.
  • As the STUDENT table is scanned, Alice and Carol (with department_id = 101) find a match in the hash table for Computer Science.
  • Bob (with department_id = 102) finds a match for Mathematics.