XOR is a simple bit-wise defined function:
To make it more convenient, we define \(a \oplus b := \text{XOR}(a, b)\).
If \(a\) and \(b\) have multiple bits, just align the bits to the same number and and apply it point-wise on every index.
This simple function is applicable in many contexts. In Python, you make a
bit-wise XOR(a, b) with a ^ b
.
Multi-Bit Example
Just look at each column:
a = 0011
b = 0110
XOR(a, b) = 0101
Simple Tasks
There are some simple coding competition tasks where you can apply this trick.
Leetcode 136
Given a non-empty array of integers, every element appears twice except for one. Find that single one.
See Leetcode.com
Here is a straight-forward solution to that problem using no additonal space and running in \(\mathcal{O}(n)\):
from typing import List
def find_outlier(nums: List[int]) -> int:
for num in nums[1:]:
nums[0] = num ^ nums[0]
return nums[0]
Pretty cool, isn't it?
Missing Number
Given is an array of (n-1) numbers. Each number is in 1, ..., n and unique. This means you have all numbers from 1 to n, but one is missing and the array is not necessarily sorted. Find the missing number.
This is also a leetcode problem. Leetcode-287 is not exactly the same, but super close. Solvable by the same trick. However, the problem descriptions is wrong 😕
A bit harder, but Leetcode-645 also uses the XOR-trick.
The straight-forward solution with the XOR-trick is:
from typing import List
def find_missing(nums: List[int]) -> int:
# Prepare
missing = 0
for i in range(1, len(nums) + 1):
missing ^= i
# Run
for num in nums:
missing ^= num
return missing
This already has the optimal space complexity of \(\mathcal{O}(1)\) and the optimal time complexity of \(\mathcal{O}(n)\). However, I don't like that the prepare-block is in \(\mathcal{O}(n)\).
Let's call the XOR of the numbers 1 to n, ALL_XOR(n):
The XOR of 1 to n for some numbers can be seen here:
n | $\text{ALL_XOR}(n)$ | n % 4 |
---|---|---|
0 | 0 | 0 |
1 | 1 | 1 |
2 | 3 | 2 |
3 | 0 | 3 |
4 | 4 | 0 |
5 | 1 | 1 |
6 | 7 | 2 |
7 | 0 | 3 |
8 | 8 | 0 |
9 | 1 | 1 |
10 | 11 | 2 |
Hypothesis 1: If n % 4 == 0, then \(\text{ALL_XOR}(n) = n\)
A proof by induction follows:
Base Case (BC): For \(n = 0\) we have \(n\) % 4 == 0 and \(\text{ALL_XOR}(n) = n = 0\).
Induction step: For \(n + 4\), we have:
A few steps to explain:
- R: If you look at \(n\) in binary, the last two digits are zero as n % 4 == 0. This means that \(n+1 = n \oplus 1\).
- BC: The base case
- A + K: Associativity and Kommutativity; I didn't prove it but it is pretty clear from the defintion
Hypothesis 2: If n % 4 == 1, then \(\text{ALL_XOR}(n) = 1\)
And another proof by induction!
Base Case (BC): For \(n = 1\) we have \(n\) % 4 == 1 and \(\text{ALL_XOR}(n) = 0 \oplus 1 = 1\).
Induction step: For \(n + 4\), we have:
Hypothesis 3: If n % 4 == 2, then \(\text{ALL_XOR}(n) = n+1\)
All good things come in threes!
Base Case (BC): For \(n = 2\) we have \(n\) % 4 == 2 and \(\text{ALL_XOR}(n) = 0 \oplus 1 \oplus 2 = 3\).
Induction step: For \(n + 4\), we have:
Hypothesis 4: If n % 4 == 3, then \(\text{ALL_XOR}(n) = 0\)
Base Case (BC): For \(n = 3\) we have \(n\) % 4 == 3 and \(\text{ALL_XOR}(n) = 0 \oplus 1 \oplus 2 \oplus 3 = 0\).
Induction step: For \(n + 4\), we have:
This means there is a faster solution!
from typing import List
def find_missing(nums: List[int]) -> int:
# Prepare
n = len(nums) - 1
if n % 4 == 0:
missing = n
elif n % 4 == 1:
missing = 1
elif n % 4 == 2:
missing = n + 1
else: # n % 4 == 3
missing = 0
# Run
for num in nums:
missing ^= num
return missing
Fault Tolerance
In order to be fault-tolerant, one can calculate the XOR. Please note that this is the same idea as in both toy questions from above. The parity drive in RAID 3 uses this.
XOR Problem
In the context of Machine Learning there is something called the "XOR Problem". The XOR-Problem is a classification problem, where you only have four data points with two features. The training set and the test set are exactly the same in this problem. So the interesting question is only if the model is able to find a decision boundary which classifies all four points correctly: