## Part I: Performance of Matrix multiplication in Python, Java and C++

This is Part I of my matrix multiplication series. Part I was about simple matrix multiplication algorithms and Part II was about the Strassen algorithm. Part III is about parallel matrix multiplication.

This post is about simple implementations of matrix multiplications. The goal of this post is to find out how easy it is to implement a matrix multiplication in Python, Java and C++. Additionally, I want to get to know how good these solutions are.

The second post will be an implementation of the Strassen algorithm for matrix multiplication. Strassen algorithm does matrix multiplication in $\cal O(n^{log_2(7)+o(1)}) \approx \cal O(n^{2.807})$ instead of $\cal O(n^3)$. I am quite sure this will outperform almost every other change. See Part II: The Strassen algorithm in Python, Java and C++.

The third post will be about parallel programming. I have two cores and I want to see if it will be significantly faster if I use both of them.

## The implementations

I will post all scripts for this test and I’ve added a GIT repository, so feel free to test it on your machine. I am also happy if you post some of your solutions with running times ☺ I am quite sure that my Java and C++ code can be written much better. If you know how, please leave a comment. If you know other languages, you could create a script for these. I focus on Python, Java and C++ as they are very often used.

I have implemented these three types of algorithms for this post:

• ijk-algorithm: This is a simple, straight forward implementation of a matrix multiplication. I've used the definition of matrix multiplication. I didn't use multiple threads.
• ikj-algorithm: just like the ijk-algorithm, but I've switched two of the three the for-loops.
• Library-functions: I always prefer libraries over self-implemented solutions. I think they are faster than anything I could come up with in a reasonable amount of time.

If you post a solution, please consider these restrictions:

• Input: The input file should get passed with the parameter -i, e.g.: python -i 2000.in or java Shell -i 2000.in
• The standard value for the command line parameter -i should be "2000.in" (a $2000 \times 2000$ matrix)
• The user should not have to give the size of the matrix!
• The two square-matrices that should get multiplied are ...
• ... read from a text-file.
• ... represented like this:
• Every line of one matrix is one line in the text-file.
• Newlines are only "\n".
• Every number is separated by "\t".
• The both matrices are separated by one newline.
• Output: The result has to get printed to standard output.
• The result has to be formatted like the input (tabs for separation of number, \n for marks a new line)

## The Tests

I will check the speed of a multiplication of two big matrices following for Python, Java and C++ for all algorithms like this:

time python scriptABC.py -i ../2000.in > result.txt
diff result.txt bigMatrix.out

The bigMatrix.out was produced by the Python ijk-implementation. I make the diff to test if the result is correct.

## The Setting

I created two “random” matrices $A, B \in \mathbb{N}^{2000 \times 2000}$ with this script. The file that was created needs about 29.7 MB and is also in the GIT-Hub repository. But you can also create the matrices with this script:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import random
random.seed(1234)

def createRandomMatrix(n):
maxVal = 1000 # I don't want to get Java / C++ into trouble
matrix = []
for i in xrange(n):
matrix.append([random.randint(0,maxVal) for el in xrange(n)])
return matrix

def saveMatrix(matrixA, matrixB, filename):
f = open(filename, 'w')
for i, matrix in enumerate([matrixA, matrixB]):
if i != 0:
f.write("\n")
for line in matrix:
f.write("\t".join(map(str, line)) + "\n")
n = 3
matrixA = createRandomMatrix(n)
matrixB = createRandomMatrix(n)
saveMatrix(matrixA, matrixB, "2000.in")

All scripts are tested on my computer:

 Acer TravelMate 5735Z CPU 2x Pentium(R) Dual-Core CPU T4500 @2.30GHz RAM 4 GB Video Card Intel GMA 4500MHD System Ubuntu 10.10.04 LTS

## Python

I’ve used Python 2.6.5.

### ijk-algorithm

#!/usr/bin/python
# -*- coding: utf-8 -*-

from optparse import OptionParser
parser = OptionParser()
help="input file with two matrices", metavar="FILE")
(options, args) = parser.parse_args()

A = []
B = []
matrix = A
for line in lines:
if line != "":
matrix.append(map(int, line.split("\t")))
else:
matrix = B
return A, B

def printMatrix(matrix):
for line in matrix:
print "\t".join(map(str,line))

def standardMatrixProduct(A, B):
n = len(A)
C = [[0 for i in xrange(n)] for j in xrange(n)]
for i in xrange(n):
for j in xrange(n):
for k in xrange(n):
C[i][j] += A[i][k] * B[k][j]
return C

C = standardMatrixProduct(A, B)
printMatrix(C)
real	56m49.266s
user	56m30.524s
sys	0m2.980s

### ikj-algorithm

def ikjMatrixProduct(A, B):
n = len(A)
C = [[0 for i in xrange(n)] for j in xrange(n)]
for i in xrange(n):
for k in xrange(n):
for j in xrange(n):
C[i][j] += A[i][k] * B[k][j]
return C
real	44m36.507s
user	44m13.458s
sys	0m2.000s

### Psyco ikj-algorithm

Psyco is a just in time compiler, which makes my scripts MUCH faster. It is very simple to use. Add these two lines at the top of the ikj-script:

import psyco
psyco.full()
real	6m14.820s
user	6m12.959s
sys	0m0.620s

Amazing, isn’t it?

### Libraries

#### NumPy

NumPy-Version: 1.3.0 (Current version is 1.6.2, see Wiki)

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy

from optparse import OptionParser
parser = OptionParser()
help="input file with two matrices", metavar="FILE")
(options, args) = parser.parse_args()

A = []
B = []
matrix = A
for line in lines:
if line != "":
matrix.append(map(int, line.split("\t")))
else:
matrix = B
return A, B

def printMatrix(matrix):
matrix = numpy.array(matrix)
for line in matrix:
print "\t".join(map(str,line))

A = numpy.matrix(A)
B = numpy.matrix(B)
C = A * B # easy and intuitive, isn't it?
printMatrix(C)
real	1m38.425s
user	1m36.066s
sys	0m0.520s

#### SciPy

You might need to install python-scitools.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import numpy
import scipy

from optparse import OptionParser
parser = OptionParser()
help="input file with two matrices", metavar="FILE")
(options, args) = parser.parse_args()

A = []
B = []
matrix = A
for line in lines:
if line != "":
matrix.append(map(int, line.split("\t")))
else:
matrix = B
return A, B

def printMatrix(matrix):
matrix = numpy.array(matrix)
for line in matrix:
print "\t".join(map(str,line))

A = scipy.matrix(A)
B = scipy.matrix(B)
C = A * B # easy and intuitive, isn't it?
printMatrix(C)
real	1m35.795s
user	1m33.438s
sys	0m0.488s

### Conclusion for Python

Python execution times for matrix multiplication

Using NumPy is by far the easiest and fastest option. I’ve needed about five minutes for each of the non-library scripts and about 10 minutes for the NumPy/SciPy scripts.

By the way, it is useless to combine Psyco and NumPy. It gets a little bit faster (1 minute and 28 seconds), but this could also be a random effect. If you execute it many times, you will see that the execution time is never the same.

## Java

I am using this Java version:

### ijk-algorithm

#include <sstream>
#include <string>
#include <fstream>
#include <iostream>
#include <vector>

using namespace std;

struct Result {
vector< vector<int> > A;
vector< vector<int> > B;
};

vector< vector<int> > A, B;
Result ab;
string line;
ifstream infile;
infile.open (filename.c_str());

int i = 0;
while (getline(infile, line) &amp;&amp; !line.empty()) {
istringstream iss(line);
A.resize(A.size() + 1);
int a, j = 0;
while (iss >> a) {
A[i].push_back(a);
j++;
}
i++;
}

i = 0;
while (getline(infile, line)) {
istringstream iss(line);
B.resize(B.size() + 1);
int a;
int j = 0;
while (iss >> a) {
B[i].push_back(a);
j++;
}
i++;
}

infile.close();
ab.A = A;
ab.B = B;
return ab;
}

vector< vector<int> > ijkalgorithm(vector< vector<int> > A,
vector< vector<int> > B) {
int n = A.size();

// initialise C with 0s
vector<int> tmp(n, 0);
vector< vector<int> > C(n, tmp);

for (int i = 0; i < n; i++) {
for (int j = 0; j < n; j++) {
for (int k = 0; k < n; k++) {
C[i][j] += A[i][k] * B[k][j];
}
}
}
return C;
}

void printMatrix(vector< vector<int> > matrix) {
vector< vector<int> >::iterator it;
vector<int>::iterator inner;
for (it=matrix.begin(); it != matrix.end(); it++) {
for (inner = it->begin(); inner != it->end(); inner++) {
cout << *inner;
if(inner+1 != it->end()) {
cout << "\t";
}
}
cout << endl;
}
}

int main (int argc, char* argv[]) {
string filename;
if (argc < 3) {
filename = "2000.in";
} else {
filename = argv[2];
}
vector< vector<int> > C = ijkalgorithm(result.A, result.B);
printMatrix(C);
return 0;
}
real	1m40.439s
user	1m38.642s
sys	0m0.280s

### ikj-algorithm

Again, I’ve only switched line 61 and 62.

real	0m15.172s
user	0m14.877s
sys	0m0.248s

### Library: Boost

If you want to compile these scripts, you might have to install the boost libraries first. On Ubuntu you can enter:

sudo apt-get install libboost-math*
#include <sstream>
#include <string>
#include <fstream>
#include <iostream>
#include <vector>
#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

using namespace std;

struct Result {
boost::numeric::ublas::matrix<int> A;
boost::numeric::ublas::matrix<int> B;
};

int getMatrixSize(string filename) {
string line;
ifstream infile;
infile.open (filename.c_str());
getline(infile, line);
return count(line.begin(), line.end(), '\t') + 1;
}

void printMatrix(boost::numeric::ublas::matrix<int> matrix) {
for (unsigned int i=0; i < matrix.size1(); i++) {
for (unsigned int j=0; j < matrix.size2(); j++) {
cout << matrix(i, j);
if(j+1 != matrix.size2()) {
cout << "\t";
}
}
cout << endl;
}
}

Result ab;
string line;
ifstream infile;
infile.open (filename.c_str());

// get dimension
getline(infile, line);
int n = getMatrixSize(filename);

boost::numeric::ublas::matrix<int> A(n,n), B(n,n);

// process first line
istringstream iss(line);
int a, i = 0, j = 0;
while (iss >> a) {
A(i,j) = a;
j++;
}
i++;

while (getline(infile, line) &amp;&amp; !line.empty()) {
istringstream iss(line);
j = 0;
while (iss >> a) {
A(i,j) = a;
j++;
}
i++;
}

i = 0;
while (getline(infile, line)) {
istringstream iss(line);
j = 0;
while (iss >> a) {
B(i,j) = a;
j++;
}
i++;
}

infile.close();
ab.A = A;
ab.B = B;
return ab;
}

int main (int argc, char* argv[]) {
string filename;
if (argc < 3) {
filename = "2000.in";
} else {
filename = argv[2];
}

boost::numeric::ublas::matrix<int> C;
C = boost::numeric::ublas::prod(result.A, result.B);
printMatrix(C);

return 0;
}
real	4m15.388s
user	4m10.272s
sys	0m0.588s

### Library: Blitz

This is a great example of useless library. I’ve installed the library:

sudo apt-get install libblitz*

Then I wanted to use it. Well, I have no clue how I could exactly use it! See my StackOverflow Question: Is a documentation of Blitz++ matrices available?

### Conclusion for C++

C++ execution times for matrix multiplication

Again, it brings a performance boost if you know how your CPU works. I was very astonished, that the library Boost is slower (actually MUCH slower) than my simplest approach was.

## Conclusion

If I want to create a working piece of code in a minimum amount of time, I will always take Python. It has been very easy to solve this task with the given restrictions. But C++ is amazing when speed is important.

It was astonishingly difficult to find working code examples for this task for Java and C++. I was searching for libraries and found some, but the search results were not satisfying.