Siamese Networks

Siamese networks are feature extractors trained to learn an embedding in $\mathbb{R}^n$ in which the absolute position of an output does not matter, only its position relative to the outputs of other inputs.

The original paper¹ was about signature verification: you have one original signature and a second one which might be from the same person or might be a forgery. Instead of having a single output that directly says "same" or "different", the authors used one network which maps each input to $\mathbb{R}^{4 \times 19}$ and then compared the two outputs, for example with the Euclidean distance and some threshold².
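
The thresholding step can be sketched in a few lines. This is only an illustration, assuming Euclidean distance as the metric; the function name `same_writer`, the 3-dimensional embeddings and the threshold of 0.5 are made up for the example and are not taken from the paper.

```python
import math


def same_writer(embedding_a, embedding_b, threshold):
    """Return True if the two embeddings are closer than the threshold."""
    distance = math.dist(embedding_a, embedding_b)  # Euclidean distance
    return distance < threshold


# Hypothetical 3-dimensional embeddings of three signatures
reference = [0.10, 0.80, 0.30]
genuine = [0.12, 0.79, 0.33]
forgery = [0.90, 0.10, 0.50]

print(same_writer(reference, genuine, threshold=0.5))  # True
print(same_writer(reference, forgery, threshold=0.5))  # False
```

The network only has to place signatures of the same person close together; the decision itself is made by the threshold, which can be tuned after training.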

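Today, the nicest usage I know is DeepFace³. It is a neural network which maps faces to a feature vector by combining a lot of clever ideas. Note that the well-known face embedding in $\mathbb{R}^{128}$ is the one used by FaceNet, not DeepFace.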

Key Idea

You have one network $m_1$ which is embedded into a larger network that combines two outputs of $m_1$. For training, a triplet $(d_1, d_2, t)$ is used, where $d_1$ and $d_2$ are two inputs and $t$ is the target, for example a same/different label or a distance. $d_1$ and $d_2$ are both processed individually by the same $m_1$, so the weights are shared. The two outputs $d_{1}'$ and $d_{2}'$ are then combined by another network $m_2$, which can simply be a merging layer. The loss is then calculated from the output of $m_2$ and the target $t$.
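
A minimal NumPy sketch of this structure, without any training, might look as follows. The single `tanh` layer used for `m1`, the choice of the Euclidean distance as `m2` and the absolute-error loss are assumptions made for illustration; the point is only that both inputs pass through the same weights `W`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights of m_1; they are shared by both branches
W = rng.normal(size=(4, 2))


def m1(d):
    """Map an input of dimension 4 to an embedding of dimension 2."""
    return np.tanh(d @ W)


def m2(e1, e2):
    """Merge two embeddings into one value: their Euclidean distance."""
    return np.linalg.norm(e1 - e2)


def loss(d1, d2, t):
    """Absolute error between the merged output and the target t."""
    return abs(m2(m1(d1), m1(d2)) - t)


d1 = rng.normal(size=4)
d2 = rng.normal(size=4)
print(loss(d1, d2, t=1.0))
```

Because the weights are shared, swapping `d1` and `d2` gives the same loss, and two identical inputs always end up at distance 0.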

Applications

Signature Verification: Given two signatures, do they belong to the same person?
Face Verification: Given two faces, are they the same person?
One-shot Image Recognition: Koch, G., Zemel, R., & Salakhutdinov, R. (2015, July). Siamese neural networks for one-shot image recognition. In ICML deep learning workshop (Vol. 2).
Target Association in Tracking: Leal-Taixé, L., Canton-Ferrer, C., & Schindler, K. (2016). Learning by tracking: Siamese CNN for robust target association. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (pp. 33-40).

Application: Landmark Distances

Think of autonomous cars. They might know some landmarks and be able to identify them, so they can measure the distance to those landmarks. Maybe some landmarks are added later. I wondered how well a neural network does at finding coordinates for new landmarks, given only their distances to the existing landmarks.
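
To make the input of such a network concrete, here is a small sketch of the distance features for a single point. The three 2D landmark positions and the query position are hypothetical values chosen so that the distances are easy to check by hand.

```python
import numpy as np

# Hypothetical 2D landmark positions (one row per landmark)
landmarks = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])

# Position whose coordinates the network should recover
position = np.array([4.0, 3.0])

# Euclidean distance from the position to every landmark
features = np.linalg.norm(landmarks - position, axis=1)
print(features)  # [5. 3. 4.]
```

The network only ever sees `features`; the coordinates in `position` are never given to it directly.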

While this problem is better solved with another approach, I was just curious how well a simple neural network would do. The answer was ... underwhelming. I guess I made an error, but at the moment I can't find it:

#!/usr/bin/env python

"""
Train a neural network on distances and see how well it figures out coordinates.

Roughly siamese networks.

The "reference points" are landmarks
"""

# core modules
from itertools import combinations
from functools import partial

# 3rd party modules
import keras.backend as K
from keras.layers import Dense
from keras.layers import Input, Concatenate
from keras.models import Model
from sklearn.metrics.pairwise import euclidean_distances
import numpy as np
from scipy.spatial.distance import euclidean
from sklearn.model_selection import train_test_split

np.random.seed(0)


def main():
    import doctest

    doctest.testmod()

    # Configure Problem
    n_reference_points = 11
    n_points = n_reference_points + 30
    n_dim = 10

    # Define Network for mapping distances to landmarks to point coordinates
    nn = create_network(n_reference_points, n_dim)

    # Make the network trainable
    dists1_in = Input(shape=(n_reference_points,))
    dists2_in = Input(shape=(n_reference_points,))
    point1_out = nn(dists1_in)
    point2_out = nn(dists2_in)
    merged_vector = Concatenate(axis=-1)([point1_out, point2_out])
    model = Model(inputs=[dists1_in, dists2_in], outputs=merged_vector)
    model.compile(loss=partial(dual_loss, n_dim=n_dim), optimizer="adam")

    # Generate Data
    points = generate_data(n_points=n_points, n_dim=n_dim)
    print(points[:n_reference_points])
    distances = get_distances(points, n_reference_points)
    train_points, test_points, train_distances, test_distances = train_test_split(
        points, distances
    )
    distances_p1s, distances_p2s, pair_distances = get_train_data(
        train_points, train_distances
    )

    model.fit(
        [distances_p1s, distances_p2s], pair_distances, batch_size=128, epochs=100
    )
    predicted_points = nn.predict(test_distances)
    error = measure_error(test_points, predicted_points)
    print("Error: {:0.3f}".format(error))
    error = measure_error(test_points, generate_random(predicted_points.shape))
    print("Error (random): {:0.3f}".format(error))


def generate_data(n_points, n_dim):
    """
    Generate n_points of dimension n_dim.

    Examples
    --------
    >>> generate_data(1, 2).tolist()
    [[0.5488135039273248, 0.7151893663724195]]
    """
    points = np.random.random((n_points, n_dim))
    return points


def generate_random(shape):
    """Generate a random point coordinate prediction."""
    return np.random.random(shape)


def get_distances(points, n_reference_points):
    """
    Get the distance of points to the n reference points.

    This includes the pair-wise distance between the reference points.
    """
    ref_points = points[:n_reference_points]
    distances = []
    for point in points:
        distances.append([])
        for ref_point in ref_points:
            distances[-1].append(euclidean(point, ref_point))
    return np.array(distances)


def create_network(n_reference_points, n_dim):
    input_ = Input(shape=(n_reference_points,))
    x = input_
    x = Dense(200, activation="relu")(x)
    x = Dense(200, activation="relu")(x)
    x = Dense(n_dim, activation="sigmoid")(x)
    model = Model(inputs=input_, outputs=x)
    return model


def dual_loss(y_true, y_pred, n_dim=2):
    """
    Define the loss function based on two points.

    Parameters
    ----------
    y_true : ndarray
        The real distance
    y_pred : ndarray
        The first n_dim elements are the first point's coordinates,
        the second n_dim elements are the second point's coordinates
    """
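    point1 = y_pred[:, 0:n_dim]
    point2 = y_pred[:, n_dim:]

    # Keras may pass the targets with shape (batch, 1); flatten them so that
    # the subtraction below is element-wise and does not broadcast
    y_true = K.flatten(y_true)

    # Euclidean distance between the points, on the same scale as y_true.
    # The lower bound avoids an infinite gradient of sqrt at 0.
    squared_dist = K.sum(K.square(point1 - point2), axis=1)
    embedding_dist = K.sqrt(K.maximum(squared_dist, K.epsilon()))

    # compute loss
    loss = K.abs(embedding_dist - y_true)
    return loss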


def get_train_data(points, distances):
    """
    Create training data.

    Parameters
    ----------
    points : ndarray
        Coordinates of the points, one row per point
    distances : ndarray
        distances[i] contains the distances of points[i] to the reference
        points

    Returns
    -------
    (distances_p1, distances_p2, pair_distances) : tuple
        distances_p1 and distances_p2 have the same structure (point 1 and
        point 2). The contents of those two arrays are the distances to the
        reference points.

        pair_distances : The i-th entry contains the distance between the
        two points which belong to distances_p1[i] and distances_p2[i]
    """
    distances_p1 = []
    distances_p2 = []
    pair_distances = []
    for pi1, pi2 in combinations(range(len(points)), 2):
        p1_dist = distances[pi1]
        p2_dist = distances[pi2]
        distances_p1.append(p1_dist)
        distances_p2.append(p2_dist)
        pair_distances.append(euclidean(points[pi1], points[pi2]))
    return np.array(distances_p1), np.array(distances_p2), np.array(pair_distances)


def measure_error(real_points, predicted_points):
    """
    Measure an error between the real points and the predicted points.

    This does not punish the points being shifted / rotated.
    """
    # Lower triangle of the pair-wise distance matrices, flattened
    real_distances = np.tril(euclidean_distances(real_points)).reshape(-1)
    pred_point_distances = np.tril(euclidean_distances(predicted_points)).reshape(-1)
    # Sum of absolute differences, normalized by the number of points
    diff = np.abs(real_distances - pred_point_distances).sum()
    return diff / len(real_points)


if __name__ == "__main__":
    main()
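
The docstring of `measure_error` says that shifted or rotated points are not punished. The reason is that the error only looks at pair-wise distances, and those do not change under rotation or translation. The following sketch checks this for random 2D points; the helper `pairwise_distances`, the angle of 30 degrees and the offset are arbitrary choices for the illustration.

```python
import numpy as np


def pairwise_distances(points):
    """Return the matrix of Euclidean distances between all rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
points = rng.random((5, 2))

# Rotate by 30 degrees and shift by a constant offset
angle = np.deg2rad(30)
rotation = np.array(
    [[np.cos(angle), -np.sin(angle)], [np.sin(angle), np.cos(angle)]]
)
moved = points @ rotation.T + np.array([3.0, -1.0])

# The coordinates differ, but all pair-wise distances stay the same
print(np.allclose(pairwise_distances(points), pairwise_distances(moved)))
```

This also means that a network trained only on pair-wise distances cannot be expected to reproduce the original coordinates exactly, only the geometry of the points relative to each other.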

Footnotes

¹ Bromley, Jane, et al. "Signature verification using a 'siamese' time delay neural network." Advances in neural information processing systems. 1994.
² I'm actually not sure which metric they used to calculate the distance between two signature feature matrices. Please ping me if you know more.
³ Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
