<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Martin Thoma</title>
	<atom:link href="http://martin-thoma.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://martin-thoma.com</link>
	<description>A blog about Code, the Web and Cyberculture.</description>
	<lastBuildDate>Sun, 19 May 2013 20:19:35 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Solving equations of unipotent lower triangular matrices</title>
		<link>http://martin-thoma.com/solving-equations-of-unipotent-lower-triangular-matrices/</link>
		<comments>http://martin-thoma.com/solving-equations-of-unipotent-lower-triangular-matrices/#comments</comments>
		<pubDate>Sun, 19 May 2013 20:15:59 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[Matrix]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[systems of equations]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=66841</guid>
		<description><![CDATA[<p>Suppose you have an equation like \(L \cdot x = b\) with \(L \in \mathbb{R}^{n \times n}\) and \(x,b \in \mathbb{R}^n\). \(b\) and \(L\) are given and you want to solve for \(x\). [...]</p><p>The post <a href="http://martin-thoma.com/solving-equations-of-unipotent-lower-triangular-matrices/">Solving equations of unipotent lower triangular matrices</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>Suppose you have an equation like \(L \cdot x = b\) with \(L \in \mathbb{R}^{n \times n}\) and \(x,b \in \mathbb{R}^n\). \(b\) and \(L\) are given and you want to solve for \(x\).</p>
<h2>Example</h2>
<p>With \(n=5\), the problem could look like this:</p>
\(\begin{pmatrix}<br />
1 &#038; 0 &#038; 0 &#038; 0 &#038; 0\\<br />
2 &#038; 1 &#038; 0 &#038; 0 &#038; 0\\<br />
7 &#038; 1 &#038; 1 &#038; 0 &#038; 0\\<br />
8 &#038; 2 &#038; 8 &#038; 1 &#038; 0\\<br />
1 &#038; 8 &#038; 2 &#038; 8 &#038; 1<br />
\end{pmatrix} \cdot<br />
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} =<br />
\begin{pmatrix}   3 \\ 1   \\ 4   \\ 1   \\ 5   \end{pmatrix}\)
<p>This is only a shorthand for:<br />
\(<br />
\begin{align}<br />
&#038;1 \cdot x_1 &#038;= 3 \\<br />
&#038;2 \cdot x_1 + 1 \cdot x_2 &#038;= 1\\<br />
&#038;7 \cdot x_1 + 1 \cdot x_2 + 1 \cdot x_3 &#038;= 4\\<br />
&#038;8 \cdot x_1 + 2 \cdot x_2 + 8 \cdot x_3 + 1 \cdot x_4 &#038;= 1\\<br />
&#038;1 \cdot x_1 + 8 \cdot x_2 + 2 \cdot x_3 + 8 \cdot x_4 + 1 \cdot x_5 &#038;= 5<br />
\end{align}<br />
\)</p>
<p>This is easy to solve, isn&#8217;t it?</p>
<h3>First step: Solve for \(x_1\)</h3>
<p>First you see that \(x_1 = 3\). Now you replace every occurrence of \(x_1\) in the system of equations above:</p>
\(<br />
\begin{align}<br />
&#038;1 \cdot 3 &#038;= 3 \\<br />
&#038;2 \cdot 3 + 1 \cdot x_2 &#038;= 1\\<br />
&#038;7 \cdot 3 + 1 \cdot x_2 + 1 \cdot x_3 &#038;= 4\\<br />
&#038;8 \cdot 3 + 2 \cdot x_2 + 8 \cdot x_3 + 1 \cdot x_4 &#038;= 1\\<br />
&#038;1 \cdot 3 + 8 \cdot x_2 + 2 \cdot x_3 + 8 \cdot x_4 + 1 \cdot x_5 &#038;= 5<br />
\end{align}\)
<p>Now you carry out the multiplications and remove the first line, which has become trivial.</p>
\(<br />
\begin{align}<br />
&#038;6 + 1 \cdot x_2 &#038;= 1\\<br />
&#038;21 + 1 \cdot x_2 + 1 \cdot x_3 &#038;= 4\\<br />
&#038;24 + 2 \cdot x_2 + 8 \cdot x_3 + 1 \cdot x_4 &#038;= 1\\<br />
&#038;3 + 8 \cdot x_2 + 2 \cdot x_3 + 8 \cdot x_4 + 1 \cdot x_5 &#038;= 5<br />
\end{align}<br />
\)
<h3>Second step: update</h3>
<p>Move the constant terms to the right side of the equations:<br />
\(<br />
\begin{align}<br />
&#038;1 \cdot x_2 &#038;= 1-6=-5\\<br />
&#038;1 \cdot x_2 + 1 \cdot x_3 &#038;= 4-21=-17\\<br />
&#038;2 \cdot x_2 + 8 \cdot x_3 + 1 \cdot x_4 &#038;= 1-24=-23\\<br />
&#038;8 \cdot x_2 + 2 \cdot x_3 + 8 \cdot x_4 + 1 \cdot x_5 &#038;= 5-3=2<br />
\end{align}<br />
\)</p>
<p>You can now easily see that you&#8217;re in the same situation as in the first step! Next you will solve for \(x_2\), then for \(x_3, x_4\) and finally for \(x_5\).</p>
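<p>As a sanity check, continuing the substitution by hand should give \(x = (3, -5, -12, 83, -598)^T\). This vector is not stated above, it is the result of repeating the two steps four more times. The following minimal sketch multiplies it back into \(L\) and compares the product with \(b\); the helper name <code>multiply</code> is only chosen for illustration.</p>

```python
# Sketch: verify a hand-computed solution by multiplying it back into L.
L = [[1, 0, 0, 0, 0],
     [2, 1, 0, 0, 0],
     [7, 1, 1, 0, 0],
     [8, 2, 8, 1, 0],
     [1, 8, 2, 8, 1]]
b = [3, 1, 4, 1, 5]

# Result of continuing the substitution by hand (assumption, see text).
x = [3, -5, -12, 83, -598]


def multiply(matrix, vector):
    """Return the matrix-vector product as a list."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, vector))
            for row in matrix]


# True if x really solves L * x = b
print(multiply(L, x) == b)
```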
<h2>Python straightforward algorithm</h2>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def solveUnipotentLowerTriangularMatrix(L, b):
    x = [0] * len(b)
    for step in range(0, len(b)):
        x[step] = b[step]
        for row in range(0, len(b)):
            b[row] = b[row] - L[row][step]*x[step]
    return x

if __name__ == &quot;__main__&quot;:
    L = [[1, 0, 0, 0, 0],
         [2, 1, 0, 0, 0],
         [7, 1, 1, 0, 0],
         [8, 2, 8, 1, 0],
         [1, 8, 2, 8, 1]]
    b =  [3, 1, 4, 1, 5]

    print(solveUnipotentLowerTriangularMatrix(L, b))
</pre>
<p>Pretty easy, isn&#8217;t it? But can we even do better?</p>
<h2>Even better algorithm</h2>
<p>Yes, we can!</p>
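<p>Take a look at what&#8217;s happening when row = 0 in line 9: the first equation is already solved after the first step, so updating it is not necessary. The same holds for every row up to and including the current step, because the entries of L above the diagonal are zero and the row of the current step has just been solved. Also, we can use the space of b to store x!</p>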
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def solveUnipotentLowerTriangularMatrix(L, b):
    for step in range(0, len(b)):
        for row in range(step+1, len(b)):
            b[row] -= L[row][step]*b[step]

if __name__ == &quot;__main__&quot;:
    L = [[1, 0, 0, 0, 0],
         [2, 1, 0, 0, 0],
         [7, 1, 1, 0, 0],
         [8, 2, 8, 1, 0],
         [1, 8, 2, 8, 1]]
    b =  [3, 1, 4, 1, 5]

    solveUnipotentLowerTriangularMatrix(L, b)
    print(b)
</pre>
<p>Now it looks super clean, doesn&#8217;t it? :-)</p>
<p>Keep in mind that you have to copy b beforehand if you need its original values after you&#8217;ve applied this algorithm (L is not modified).<br />
This is the reason why there is no return statement here. This algorithm &#8220;returns&#8221; its value by manipulating b.</p>
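<p>If the original right-hand side is still needed afterwards, one option is to run the in-place routine on a copy. The following is only a sketch: the names <code>solve_in_place</code> and <code>solve_keeping_b</code> are hypothetical, and the copy costs \(\mathcal{O}(n)\) extra space.</p>

```python
def solve_in_place(L, b):
    """Forward substitution for a unipotent lower triangular L.

    Overwrites b with the solution x.
    """
    for step in range(len(b)):
        for row in range(step + 1, len(b)):
            b[row] -= L[row][step] * b[step]


def solve_keeping_b(L, b):
    """Hypothetical wrapper: work on a copy so the caller's b survives."""
    x = list(b)
    solve_in_place(L, x)
    return x


L = [[1, 0, 0],
     [2, 1, 0],
     [7, 1, 1]]
b = [3, 1, 4]
x = solve_keeping_b(L, b)
print(x, b)
```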
<h2>Time complexity</h2>
<p>I&#8217;ll analyze the second algorithm.</p>
<p>Let&#8217;s assume that line 7 takes \(c\) operations and \(n\) is the size of \(L \in \mathbb{R}^{n \times n}\).</p>
<p>Then we would have a total of </p>
\(\begin{align}<br />
\text{Operations} &#038;= \sum_{i=1}^n \left ( \sum_{j=i+1}^n c \right )\\<br />
&#038;= c \cdot \sum_{i=1}^n \left ( \sum_{j=i+1}^n 1 \right )\\<br />
&#038;= c \cdot \sum_{i=1}^n (n - i)\\<br />
&#038;= c \cdot \left ( \sum_{i=1}^n n - \sum_{i=1}^n i \right )\\<br />
&#038;= c \cdot \left ( n^2 - \frac{n^2+n}{2} \right )\\<br />
&#038;= \frac{c}{2} (n^2 - n)<br />
\end{align}\)
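<p>The closed form can be checked empirically. The sketch below counts how often the innermost statement of the second algorithm would run for a given \(n\) (that is, it assumes \(c = 1\)) and prints the count next to \(\frac{n^2 - n}{2}\). The function name <code>count_inner_steps</code> is made up for this illustration.</p>

```python
def count_inner_steps(n):
    """Count how often the innermost update runs for an n x n matrix."""
    count = 0
    for step in range(n):
        for row in range(step + 1, n):
            count += 1
    return count


for n in (1, 2, 5, 10):
    # measured count and the closed form (n^2 - n) / 2
    print(n, count_inner_steps(n), (n * n - n) // 2)
```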
<h2>Space complexity</h2>
<p>Well, that&#8217;s simple: the second algorithm needs only \(\mathcal{O}(1)\) additional space, as it stores \(x\) in \(b\)!</p>
<p>The post <a href="http://martin-thoma.com/solving-equations-of-unipotent-lower-triangular-matrices/">Solving equations of unipotent lower triangular matrices</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/solving-equations-of-unipotent-lower-triangular-matrices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>k-nearest-neighbor clustering &#8211; an interactive example</title>
		<link>http://martin-thoma.com/k-nearest-neighbor-clustering-interactive-example/</link>
		<comments>http://martin-thoma.com/k-nearest-neighbor-clustering-interactive-example/#comments</comments>
		<pubDate>Sun, 19 May 2013 13:23:11 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[HTML5]]></category>
		<category><![CDATA[JavaScript]]></category>
		<category><![CDATA[k-means]]></category>
		<category><![CDATA[KogSys]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=66571</guid>
		<description><![CDATA[<p>Click to create new green dots. Ctrl+Click to create new blue dots. When the circle has exactly the same number of blue / green dots in it, it will be green. See also Voronoi diagram k-nearest neighbor k-means clustering</p><p>The post <a href="http://martin-thoma.com/k-nearest-neighbor-clustering-interactive-example/">k-nearest-neighbor clustering &#8211; an interactive example</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<p><iframe src="http://martin-thoma.com/html5/k-nearest-neighbor.htm" width="98%" height="700px"></iframe></p>
<p>Click to create new green dots.<br />
Ctrl+Click to create new blue dots.</p>
<p>When the circle has exactly the same number of blue / green dots in it, it will be green.</p>
<h2>See also</h2>
<div id="attachment_66811" class="wp-caption aligncenter" style="width: 411px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/k-nearest-neighbor-interesting-setting.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/k-nearest-neighbor-interesting-setting.png" alt="One interesting setting for k=2" width="401" height="401" class="size-full wp-image-66811" /></a><p class="wp-caption-text">One interesting setting for k=2</p></div>
<ul>
<li><a href="http://en.wikipedia.org/wiki/Voronoi_diagram">Voronoi diagram</a></li>
<li><a href="http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm">k-nearest neighbor</a></li>
<li><a href="http://en.wikipedia.org/wiki/K-means_clustering">k-means clustering</a></li>
</ul>
<p>The post <a href="http://martin-thoma.com/k-nearest-neighbor-clustering-interactive-example/">k-nearest-neighbor clustering &#8211; an interactive example</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/k-nearest-neighbor-clustering-interactive-example/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Collatz sequence</title>
		<link>http://martin-thoma.com/the-collatz-sequence/</link>
		<comments>http://martin-thoma.com/the-collatz-sequence/#comments</comments>
		<pubDate>Thu, 16 May 2013 21:58:45 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[mathematics]]></category>
		<category><![CDATA[Project Euler]]></category>
		<category><![CDATA[Visualization]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=66141</guid>
		<description><![CDATA[<p>The goal of this post is to show you some tools that allow you to visualize data. And I also want to analyze some basic characteristics of the Collatz sequence. The Collatz sequences of a number is defined like this: So the sequence is defined as: You can define a directed graph like this: I [...]</p><p>The post <a href="http://martin-thoma.com/the-collatz-sequence/">The Collatz sequence</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>The goal of this post is to show you some tools that allow you to visualize data. And I also want to analyze some basic characteristics of the Collatz sequence.</p>
<p>The Collatz sequence \((c^n_i)\) of a number \(n \in \mathbb{N}_{> 0}\) is built from the following function:</p>
\(f:\mathbb{N}_{>0} \rightarrow \mathbb{N}_{> 0}\;\;\;\;f(n) := \begin{cases}<br />
\frac{n}{2}   &#038; \text{if } n \text{ is even}\\<br />
3 \cdot n + 1 &#038; \text{if } n \text{ is odd}<br />
\end{cases}\)
<p>So the sequence \((c^n_{i})\) is defined as:</p>
\(c^n_{i} := \begin{cases}<br />
n   &#038; \text{if } i = 0\\<br />
f(c^n_{i-1}) &#038; \text{otherwise}<br />
\end{cases}\)
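<p>A minimal Python sketch of this definition: <code>f</code> is one Collatz step and <code>collatz_prefix</code> (a name chosen only for this example) lists \(c^n_0, c^n_1, \dots\) up to the first 1. It assumes that 1 is eventually reached, otherwise the loop would not terminate.</p>

```python
def f(n):
    """One Collatz step."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def collatz_prefix(n):
    """Return c_0, c_1, ... up to and including the first 1."""
    sequence = [n]
    while sequence[-1] != 1:
        sequence.append(f(sequence[-1]))
    return sequence


print(collatz_prefix(6))  # [6, 3, 10, 5, 16, 8, 4, 2, 1]
```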
<p>You can define a directed graph \(G=(V, E)\) like this:</p>
\(V = \mathbb{N}_{>0}, \;\;\; E = \{(n, f(n)) | n \in V\}\)
<p>I will call this the <em>Collatz graph</em>.</p>
<h2>Collatz conjecture</h2>
<div class="important">
\(\forall_{n \in \mathbb{N}_{>0}} \exists_{i \in \mathbb{N}_{>0}}: c^n_i = 1\)
</div>
<p>The Collatz conjecture has been neither proved nor disproved so far. This is astonishing, as it was proposed in 1937 and I think it is very easy to understand.</p>
<p>We also don&#8217;t know if the Collatz graph is connected. If it is not connected, then either some sequence \((c^n_i)\) goes to infinity or there is another cycle besides \(4,2,1,4\).</p>
<h2>Small \(n\)</h2>
<p>When you go through all possible Collatz sequences with \(n \in \{1, \dots, 15\}\), this is what you get:</p>
<div id="attachment_66201" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-graph.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-graph.png" alt="A graph for all Collatz sequences \((c^n_i)\) with \(n\leq15\)" width="512" height="410" class="size-full wp-image-66201" /></a><p class="wp-caption-text">A graph for all Collatz sequences \((c^n_i)\) with \(n\leq15\)</p></div>
<p>This image was created with the following Python script:</p>
<pre class="brush: python; collapse: true; light: false; title: ; toolbar: true; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# Based on: http://en.wikipedia.org/wiki/File:Collatz-graph-all-30-no27.svg

def f(n):
    if n % 2 == 0:
        return n // 2  # integer division, also on Python 3
    else:
        return 3*n + 1

def writeDotfile(filename, limit, explored):
    # the with-block closes the file even if an error occurs
    with open(filename, 'w') as dotfile:
        dotfile.write('digraph {\n')
        dotfile.write('node[style=filled,color=&quot;.7 .3 1.0&quot;];\n')
        dotfile.write('1\n')
        dotfile.write('node[style=filled,color=&quot;.95 .1 1&quot;];\n')
        #dotfile.write('size=&quot;15,8&quot;;\n')

        for n in range(2, limit):
            # follow the sequence until it hits an already drawn node
            while n not in explored:
                dotfile.write(str(n) + ' -&gt; ')
                explored.add(n)
                n = f(n)
            dotfile.write(str(n) + ';\n')
        dotfile.write('}\n')

def createPng(dotfile, base, program):
    import os
    command = program + &quot; -Tsvg &quot; + dotfile + &quot; -o &quot; + base + &quot;.svg&quot;
    print(&quot;Execute command: %s&quot; % command)
    os.system(command)

    command = &quot;inkscape &quot;+base+&quot;.svg&quot;+&quot; -w 512 --export-png=&quot;+base+&quot;.png&quot;
    print(&quot;Execute command: %s&quot; % command)
    os.system(command)

if __name__ == &quot;__main__&quot;:
    import argparse
 
    parser = argparse.ArgumentParser(
        description=&quot;Graph for small Collatz sequences&quot;
    )
    parser.add_argument(&quot;-f&quot;, &quot;--file&quot;, dest=&quot;filename&quot;,
                        default=&quot;collatz-graph.gv&quot;,
                        help=&quot;write dot-FILE&quot;, metavar=&quot;FILE&quot;)
    parser.add_argument(&quot;-p&quot;, &quot;--program&quot;, dest=&quot;program&quot;,
                  help=&quot;dot, neato, twopi, circo, fdp, sfdp, osage&quot;, 
                  metavar=&quot;PROGRAM&quot;, default=&quot;dot&quot;)
    parser.add_argument(&quot;-n&quot;, 
                      dest=&quot;limit&quot;, default=20, type=int, 
                      help=&quot;limit&quot;)
    args = parser.parse_args()
    
    writeDotfile(args.filename, args.limit, set([1]))
    import os
    createPng(args.filename, os.path.splitext(args.filename)[0], args.program)
</pre>
<p>called like this:</p>
<pre class="brush: bash; title: ; notranslate">python small-numbers.py -n 15 -p fdp</pre>
<h2>\(n=27\)</h2>
<p>\(n=27\) produces an enormously long sequence:</p>
<div id="attachment_66351" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-27.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-27.png" alt="Collatz sequence \(c^{27}_i\)" width="512" height="227" class="size-full wp-image-66351" /></a><p class="wp-caption-text">Collatz sequence \(c^{27}_i\)</p></div>
<p>It was created with pgfplots:</p>
<pre class="brush: plain; collapse: true; light: false; title: ; toolbar: true; notranslate">
\documentclass[varwidth=true, border=2pt]{standalone}
\usepackage[margin=2.5cm]{geometry} %layout

\usepackage{pgfplots}

\begin{document}
\begin{tikzpicture}
    \begin{axis}[
            axis x line=middle,
            axis y line=middle,
            enlarge y limits=true,
            scaled y ticks = false,
            width=15cm, height=8cm, % size of the image
            grid = major,
            grid style={dashed, gray!30},
            ylabel=$c^{27}_i$,
            xlabel=$i$,
            legend style={at={(0.1,-0.1)}, anchor=north}
         ]
          \addplot[sharp plot, mark=x, blue] table [x=steps, y=n, col sep=comma] {../collatz27.csv};
    \end{axis}
\end{tikzpicture}
\end{document}
</pre>
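<p>The plot reads a CSV file with the columns <code>steps</code> and <code>n</code>. How that file was produced is not shown here, so the following is only a sketch of one way to generate such data. It writes into an in-memory buffer; to create a real file you could pass <code>open("collatz27.csv", "w", newline="")</code> instead. The function names are made up for this example.</p>

```python
import csv
import io


def collatz_rows(n):
    """Yield (steps, value) pairs until the value 1 is reached."""
    steps = 0
    while True:
        yield steps, n
        if n == 1:
            return
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1


def write_csv(n, handle):
    """Write the header and one line per sequence element."""
    writer = csv.writer(handle)
    writer.writerow(["steps", "n"])
    writer.writerows(collatz_rows(n))


buffer = io.StringIO()
write_csv(27, buffer)
lines = buffer.getvalue().splitlines()
print(lines[:3])  # header and the first two data lines
```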
<h2>How long are Collatz sequences?</h2>
<p>I&#8217;ve been interested in the question of how long Collatz sequences are. They tend to get longer as \(n\) grows, but not monotonically. So how does the choice of \(n\) influence the number of steps it takes until you reach \(c^n_i = 1\)?</p>
<p>I&#8217;ve tested all Collatz sequences with \(n \leq 10,000,000\). This is the result:</p>
<div id="attachment_66231" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-sequence-steps.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-sequence-steps.png" alt="Collatz sequence steps" width="512" height="512" class="size-full wp-image-66231" /></a><p class="wp-caption-text">Collatz sequence steps</p></div>
<p>For every hexagon, you count how many data points \((n, \text{steps})\) fall into it. This gives the count. As you can see, step numbers from 50 to 120 are very common, while all others are rare. The number of steps increases very slowly with \(n\).</p>
<p>The data was created as a 116.9 MB csv file with this C++ code:</p>
<pre class="brush: cpp; collapse: true; light: false; title: ; toolbar: true; notranslate">
#include &lt;iostream&gt;
#include &lt;string&gt;
#include &lt;map&gt;
#include &lt;vector&gt;
#include &lt;climits&gt; // get maximum value of unsigned long long
#include &lt;cstdlib&gt; // exit

#define SURPRESS_OUTPUT true
#define SHOW_DICT_CREATION false
 
using namespace std;

struct element {
    /** What is the next collatz number? */
    unsigned long long next;
    
    /** How many steps does it take until you reach 1? */
    unsigned long long steps;
};

map&lt;unsigned long long, struct element&gt; collatz;

unsigned long long CRITICAL_VALUE = (ULLONG_MAX-1) / 3;

unsigned long long maxAddFromOneEntry = 0;
unsigned long long maxEntry = 0;
unsigned long long maxStepsToOne = 0;
unsigned long long saveULong = 0;

/** n &gt;= 1 */
unsigned long long nextCollatz(unsigned long long n) {
    if (n%2 == 0) {
        return n/2;
    } else {
        if (n &gt;= CRITICAL_VALUE) {
            cerr &lt;&lt; &quot;Critical value is: &quot; &lt;&lt; CRITICAL_VALUE &lt;&lt; endl;
            cerr &lt;&lt; &quot;n is: &quot; &lt;&lt; n &lt;&lt; endl;
            cerr &lt;&lt; &quot;saveULong is: &quot; &lt;&lt; saveULong &lt;&lt; endl;
            exit(1);
        }
        return 3*n+1;
    }
}

void insertCollatz(unsigned long long i){
    if (collatz.find(i) == collatz.end()) {
        if (SHOW_DICT_CREATION &amp;&amp; !SURPRESS_OUTPUT) {
            cout &lt;&lt; i &lt;&lt; &quot; is not in collatz:&quot; &lt;&lt; endl;
        }

        // i is not in collatz
        vector&lt;unsigned long long&gt; steps;
        unsigned long long current = i;
        unsigned long long next = nextCollatz(current);
        while(collatz.find(current) == collatz.end()) {
            steps.push_back(current);
            current = next;
            next = nextCollatz(current);
        }

        if (steps.size() &gt; maxAddFromOneEntry) {
            maxAddFromOneEntry = steps.size();
        }

        vector&lt;unsigned long long&gt;::reverse_iterator it;
        for(it=steps.rbegin(); it != steps.rend(); it++){
            struct element el;
            el.next = current;
            el.steps = collatz[current].steps + 1;
            collatz[*it] = el;

            if (el.steps &gt; maxStepsToOne) {
                maxStepsToOne = el.steps;
            }

            if (*it &gt; maxEntry) {
                maxEntry = *it;
            }

            current = *it;

            if (SHOW_DICT_CREATION &amp;&amp; !SURPRESS_OUTPUT) {
                cout &lt;&lt; &quot;\tinserted &quot; &lt;&lt; *it &lt;&lt; &quot;-&gt;&quot; &lt;&lt; el.next &lt;&lt; endl;
            }
        }

        return;
    } else if (SHOW_DICT_CREATION &amp;&amp; !SURPRESS_OUTPUT) {
        cout &lt;&lt; i &lt;&lt; &quot; was already in collatz.&quot; &lt;&lt; endl;
    }
}

void printCollatz() {
    for(map&lt;unsigned long long, struct element&gt;::iterator it=collatz.begin(); 
        it!=collatz.end(); ++it) {
        unsigned long long next = (*it).first;
        while(next != 1) {
            cout &lt;&lt; next &lt;&lt; &quot;-&gt;&quot;;
            next = collatz[next].next;
        }
        cout &lt;&lt; 1 &lt;&lt; endl;
    }
}

void printSteps(unsigned long long max) {
    cout &lt;&lt; &quot;n,steps&quot; &lt;&lt; endl;
    for(unsigned long long i=1;i&lt;=max;i++) {
        cout &lt;&lt; i &lt;&lt; &quot;,&quot; &lt;&lt; collatz[i].steps &lt;&lt; endl;
    }
}

int main(int argc, char* argv[]) {
    struct element e;
    e.next = 4;
    e.steps = 0;
    collatz[1] = e;

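    if (argc &lt; 2) {
        cerr &lt;&lt; &quot;Usage: &quot; &lt;&lt; argv[0] &lt;&lt; &quot; MAX_N&quot; &lt;&lt; endl;
        return 1;
    }
    // strtoul also accepts limits beyond the range of int
    unsigned long long maxCollatz = strtoul(argv[1], NULL, 10);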
 
    for (unsigned long long i = 2; i &lt;= maxCollatz; i++) {
        insertCollatz(i);
        saveULong = i;
        if (i % 1000000 == 0) {
            cerr &lt;&lt; i &lt;&lt; endl;
        }
    }

    cerr &lt;&lt; &quot;maxAddFromOneEntry: &quot; &lt;&lt; maxAddFromOneEntry &lt;&lt; endl;
    cerr &lt;&lt; &quot;maxStepsToOne: &quot; &lt;&lt; maxStepsToOne &lt;&lt; endl;
    cerr &lt;&lt; &quot;maxEntry: &quot; &lt;&lt; maxEntry &lt;&lt; endl;
    cerr &lt;&lt; &quot;entries: &quot; &lt;&lt; collatz.size() &lt;&lt; endl;
    
    //printCollatz();
    printSteps(maxCollatz);

    return 0;
}
</pre>
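<p>The core idea of the C++ program is to cache the number of steps for every value that has already been seen, so each new start value only has to be followed until it hits a known one. A compact Python sketch of that idea follows. It is not a port of the program above: the overflow guard and the statistics are left out, and <code>collatz_steps</code> is a name chosen for this example.</p>

```python
def collatz_steps(limit):
    """Return a dict n -> number of steps to 1 for all n from 1 to limit.

    Values larger than limit that occur on the way are cached as well.
    """
    steps = {1: 0}
    for start in range(2, limit + 1):
        path = []
        n = start
        # walk until a value with a known step count is reached
        while n not in steps:
            path.append(n)
            n = n // 2 if n % 2 == 0 else 3 * n + 1
        known = steps[n]
        # fill in the new values from back to front
        for value in reversed(path):
            known += 1
            steps[value] = known
    return steps


steps = collatz_steps(30)
print(steps[27], max(steps[n] for n in range(1, 31)))
```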
<p>Then I processed it with R:</p>
<pre class="brush: bash; title: ; notranslate">R -f analyze.R</pre>
<p>analyze.R:</p>
<pre class="brush: plain; collapse: true; light: false; title: ; toolbar: true; notranslate">
library(ggplot2)
memory.limit(4000)

mydata = read.csv(&quot;/home/moose/Downloads/algorithms/collatz/steps.csv&quot;)

# Prepare data
p&lt;-ggplot(mydata, aes ( x=n,y=steps ))

p&lt;-p + geom_hex(bins=30)
p&lt;-p + theme(panel.background = element_rect(fill='white', colour='white'))

# This will save the result in a pdf file called Rplots.pdf
p
</pre>
<p>And finally, I converted it to PNG:</p>
<pre class="brush: bash; title: ; notranslate">inkscape Rplots.pdf -w 512 --export-png=collatz-sequence-steps.png</pre>
<p>I&#8217;ve explained this in a bit more detail on <a href="http://tex.stackexchange.com/a/114577/5645">StackExchange</a>.</p>
<h2>Maximum in sequence</h2>
<p>In the following plot you can see \(n \in \{1, \dots, 10,000,000\}\) on the \(x\)-axis and the maximum \(y = \max(\{c^n_i | i \in \mathbb{N}_{> 0}\})\) on the \(y\)-axis:</p>
<div id="attachment_66391" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/maxInSequence.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/maxInSequence.png" alt="Hexagonal binning plot for maximum in sequence" width="512" height="512" class="size-full wp-image-66391" /></a><p class="wp-caption-text">Hexagonal binning plot for maximum in sequence</p></div>
<pre class="brush: plain; collapse: true; light: false; title: ; toolbar: true; notranslate">
library(ggplot2)

mydata = read.csv(&quot;../collatz-maxNumber.csv&quot;)

# Prepare data
p&lt;-ggplot(mydata, aes(x=n, y=maximum)) + scale_y_log10()

p&lt;-p + geom_hex(bins=50)
p&lt;-p + theme(panel.background = element_rect(fill='white', colour='white'))

# This will save the result in a pdf file called Rplots.pdf
p
</pre>
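<p>The R script expects a CSV file with the columns <code>n</code> and <code>maximum</code>. The program that produced it is not shown in this post, so here is only a small sketch of how such rows could be computed. It assumes that the start value itself counts and that the sequence is followed until it reaches 1; <code>collatz_maximum</code> is a hypothetical helper name.</p>

```python
def collatz_maximum(n):
    """Largest value on the way from n down to 1 (n itself included)."""
    maximum = n
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        maximum = max(maximum, n)
    return maximum


# print the first rows in the CSV layout that the R script reads
print("n,maximum")
for n in range(1, 11):
    print("%i,%i" % (n, collatz_maximum(n)))
```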
<h2>Execution times</h2>
<p>Generating the Collatz sequences for all \(n\) up to 10,000,000 took about 50 seconds. But R needed about 10 minutes to generate the images from that data.</p>
<p>Inkscape didn&#8217;t like the heavy plot:</p>
<pre class="brush: bash; title: ; notranslate">moose@pc07$ inkscape Rplots.pdf -w 512 --export-png=maxInSequence.png

(inkscape:26733): GLib-ERROR **: /build/buildd/glib2.0-2.34.1/./glib/gmem.c:165: failed to allocate 3440640 bytes
^CTrace/breakpoint trap (core dumped)
</pre>
<h2>Maximum in sequence and steps</h2>
<div id="attachment_66481" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-sequence-and-steps-for-n.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/collatz-sequence-and-steps-for-n.png" alt="Maximum value and number of steps for n up to 10,000" width="512" height="512" class="size-full wp-image-66481" /></a><p class="wp-caption-text">Maximum value and number of steps for n up to 10,000</p></div>
<h2>Read more</h2>
<ul>
<li><a href="https://github.com/MartinThoma/algorithms/tree/master/collatz">All sources</a> of this article are on GitHub</li>
<li><a href="http://www.graphviz.org/Documentation/dotguide.pdf">Dot guide</a>, <a href="http://www.graphviz.org/doc/info/shapes.html">Node shapes</a></li>
<li><a href="http://wiki.ubuntuusers.de/R">R on UbuntuUsers</a> (German)</li>
<li><a href="http://projecteuler.net/problem=14">Project Euler 14</a></li>
</ul>
<p>The post <a href="http://martin-thoma.com/the-collatz-sequence/">The Collatz sequence</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/the-collatz-sequence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Maps in C++</title>
		<link>http://martin-thoma.com/maps-in-cpp/</link>
		<comments>http://martin-thoma.com/maps-in-cpp/#comments</comments>
		<pubDate>Thu, 16 May 2013 06:52:54 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[datastructure]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[STL]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=66091</guid>
		<description><![CDATA[<p>Maps are one of the most useful data structures in C++ and there is no excuse for not knowing them. Here is a basic example that shows how you can use one: See also C++ Reference: general information and example A map is an ordered collection (source)</p><p>The post <a href="http://martin-thoma.com/maps-in-cpp/">Maps in C++</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>Maps are one of the most useful data structures in C++ and there is no excuse for not knowing them.</p>
<p>Here is a basic example that shows how you can use one:</p>
<pre class="brush: cpp; title: ; notranslate">
#include &lt;iostream&gt; // cout
#include &lt;string&gt;

#include &lt;map&gt;

using namespace std;

int main() {
    map&lt;string, string&gt; phonebook;

    // Put some stuff in it
    phonebook[&quot;Martin&quot;] = &quot;(0123) 45 678&quot;;
    phonebook[&quot;Alice&quot;] = &quot;+(13) 37 0000&quot;;
    phonebook[&quot;Bob&quot;]   = &quot;+(13) 37 0000&quot;;
    phonebook[&quot;Charlie&quot;] = &quot;Alice&quot;;

    // Look stuff up
    cout &lt;&lt; &quot;The phone number of Alice is &quot; 
         &lt;&lt; phonebook[&quot;Alice&quot;] &lt;&lt; endl;

    cout &lt;&lt; &quot;Number of phone book entries: &quot;
         &lt;&lt; phonebook.size() &lt;&lt; endl;

    // Print everything
    cout &lt;&lt; &quot;Iterate over all phonebook entries: &quot; &lt;&lt; endl;
    for(map&lt;string,string&gt;::iterator it=phonebook.begin(); 
        it!=phonebook.end(); ++it) {
        cout &lt;&lt; &quot;\t&quot; &lt;&lt; (*it).first &lt;&lt; &quot;: &quot; &lt;&lt; (*it).second &lt;&lt; endl;
    }

    // Check if entry exists:
    string person = &quot;Bob&quot;;
    cout &lt;&lt; &quot;Does &quot; &lt;&lt; person &lt;&lt; &quot; have a phone number ?&quot; &lt;&lt; endl;
    map&lt;string,string&gt;::iterator it = phonebook.find(person);
    if(it != phonebook.end()) {
        //element found:
        cout &lt;&lt; &quot;\t&quot; &lt;&lt; &quot;Yes! His number is: &quot; &lt;&lt; it-&gt;second &lt;&lt; endl;
    } else {
        cout &lt;&lt; &quot;\t&quot; &lt;&lt; &quot;No.&quot; &lt;&lt; endl;
    }
}
</pre>
<h2>See also</h2>
<ul>
<li>C++ Reference: <a href="http://www.cplusplus.com/reference/map/map/">general information</a> and <a href="http://www.cplusplus.com/reference/map/map/map/">example</a></li>
<li>A map is an ordered collection (<a href="http://stackoverflow.com/a/4562771/562769">source</a>)</li>
</ul>
<p>The post <a href="http://martin-thoma.com/maps-in-cpp/">Maps in C++</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/maps-in-cpp/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Code Jam – Round 1C 2013</title>
		<link>http://martin-thoma.com/google-code-jam-round-1c-2013/</link>
		<comments>http://martin-thoma.com/google-code-jam-round-1c-2013/#comments</comments>
		<pubDate>Sun, 12 May 2013 13:01:15 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Google Code Jam]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=65921</guid>
		<description><![CDATA[<p>Problem A (Consonants): Small Set: 4305/4834 users (89%) Large Set: 1551/3778 users (41%) Problem B (Pogo): Small Set: 2537/3129 users (81%) Large Set: 121/638 users (19%) Problem C (The Great Wall): Small Set: 934/1260 users (74%) Large Set: 74/330 users (22%) More information is on go-hero.net. Consonants A solution from nip: Pogo This is a [...]</p><p>The post <a href="http://martin-thoma.com/google-code-jam-round-1c-2013/">Google Code Jam – Round 1C 2013</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<ul>
<li>Problem A (<a href="https://code.google.com/codejam/contest/2437488/dashboard#s=p0">Consonants</a>):
<ul>
<li>Small Set: 4305/4834 users (89%)</li>
<li>Large Set: 1551/3778 users (41%)</li>
</ul>
</li>
<li>Problem B (<a href="https://code.google.com/codejam/contest/2437488/dashboard#s=p1">Pogo</a>):
<ul>
<li>Small Set: 2537/3129 users (81%)</li>
<li>Large Set: 121/638 users (19%)</li>
</ul>
</li>
<li>Problem C (<a href="https://code.google.com/codejam/contest/2437488/dashboard#s=p2">The Great Wall</a>):
<ul>
<li>Small Set: 934/1260 users (74%)</li>
<li>Large Set: 74/330 users (22%)</li>
</ul>
</li>
</ul>
<p>More information is on <a href="http://www.go-hero.net/jam/13/round/3">go-hero.net</a>.</p>
<h2>Consonants</h2>
<p>A solution from <a href="http://www.go-hero.net/jam/13/name/nip">nip</a>:</p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def solve(s, n):
    vowels = {'a', 'e', 'i', 'o', 'u'}
    nvalue = 0
    count = 0 # how many consecutive consonants
    pos = -1 # position of the last substring of n consonants
    for i, c in enumerate(s):
        if c in vowels:
            count = 0
        else:
            count += 1
        if count &gt;= n:
            pos = i + 2 - n
        if pos &gt;= 0:
            nvalue += pos
    return nvalue
 
if __name__ == &quot;__main__&quot;:
    testcases = input()
      
    for caseNr in xrange(1, testcases+1):
        name, n = raw_input().split(&quot; &quot;)
        print(&quot;Case #%i: %s&quot; % (caseNr, solve(name, int(n))))
</pre>
<h2>Pogo</h2>
<p>This is a very clever solution from xiaowuc1 (translated from Java to Python).</p>
<p>The idea is to first calculate the smallest number of jumps you need and then to walk backwards from the target to the origin.</p>
<p>How many jumps do you need?<br />
In the \(i\)-th round, you jump exactly \(i\) units. You have to cover a distance of at least \(|x|+|y|\) units to get from \((0|0)\) to \((x|y)\). This means you need the smallest \(n\) with \(\sum_{i=1}^n i = \frac{n^2 + n}{2} \geq |x|+|y|\). Reversing the direction of a jump changes the result by an even amount, so \(\frac{n^2 + n}{2}\) also has to have the same parity as \(|x|+|y|\). If it doesn&#8217;t, you need one or two extra jumps.<br />
You can calculate this with a simple loop (see code below).</p>
<p>Once you know the number of jumps, you can apply a greedy strategy: start at \((x|y)\), take the jumps in reverse order (longest first) and always move towards the origin along the axis on which you are currently farther away from it.</p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def calculateSteps(x, y):
    s = 0
    dist = abs(x) + abs(y)
    while (s**2 + s)/2 &lt; dist or ((s**2 + s)/2)%2 != dist%2:
        s += 1
    return s
 
def solve(x,y):
    &quot;&quot;&quot; starting at (0|0) and going i steps, 
        how can you reach (x|y)? &quot;&quot;&quot;   
    s = calculateSteps(x, y)
 
    solution = &quot;&quot;
    for i in range(s, 0, -1):
        if abs(x) &gt; abs(y):
            if x &gt; 0:
                solution += &quot;E&quot;
                x -= i
            else:
                solution += &quot;W&quot;
                x += i
        else:
            if y &gt; 0:
                solution += &quot;N&quot;
                y -= i
            else:
                solution += &quot;S&quot;
                y += i
    return solution[::-1]

if __name__ == &quot;__main__&quot;:
    testcases = input()
 
    for caseNr in xrange(1, testcases+1):
        x,y = raw_input().split(&quot; &quot;)
        x,y = int(x),int(y)
        print(&quot;Case #%i: %s&quot; % (caseNr, solve(x,y)))
</pre>
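<p>The step count can also be computed with a running sum instead of the closed formula. The following is only a minimal sketch in Python 3 (the code above is Python 2); the name <code>min_jumps</code> is made up for this illustration and is not part of the original solution:</p>

```python
def min_jumps(x, y):
    """Smallest n whose sum 1 + 2 + ... + n reaches |x| + |y| with equal parity."""
    dist = abs(x) + abs(y)
    n = 0
    total = 0  # running sum 1 + 2 + ... + n
    # keep jumping while we fall short or overshoot by an odd amount
    while dist > total or (total - dist) % 2 != 0:
        n += 1
        total += n
    return n


for target in [(0, 1), (1, 1), (3, 2), (-3, 4)]:
    print(target, min_jumps(*target))
```

<p>For \((3|2)\) the distance is 5. Three jumps cover 6 units and four jumps cover 10 units, both with the wrong parity, so five jumps are needed. This is a case with two extra jumps.</p>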
<h2>The Great Wall</h2>
<p>The following solution is too slow for the large input set, but it works fine for the small one:</p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from collections import defaultdict

def prepareTribes(tribes):
    tribeStack = []
    for tribe in tribes:
        for attackNumber in range(0, tribe[&quot;ni&quot;]):
            tribeStack.append({
                &quot;day&quot; :tribe[&quot;di&quot;]+attackNumber*tribe[&quot;delta_di&quot;],
                &quot;west&quot;:2*(tribe[&quot;wi&quot;]+attackNumber*tribe[&quot;delta_pi&quot;]),
                &quot;east&quot;:2*(tribe[&quot;ei&quot;]+attackNumber*tribe[&quot;delta_pi&quot;]),
                &quot;height&quot;:tribe[&quot;si&quot;]+attackNumber*tribe[&quot;delta_si&quot;]
            })
    return sorted(tribeStack, key=lambda tribe: tribe[&quot;day&quot;])

def runAttack(wall, tribe):
    increase = []
    for i in xrange(tribe[&quot;west&quot;], tribe[&quot;east&quot;] + 1):
        if wall[i] &lt; tribe[&quot;height&quot;]: # the wall is too low here
            increase.append({&quot;wallPos&quot; : i, &quot;height&quot; : tribe[&quot;height&quot;]})

    return increase

def solve(tribes):
    wall = defaultdict(int)
    tribeStack = prepareTribes(tribes)
    #for tribe in tribeStack:
    #    print tribe[&quot;day&quot;], &quot;[&quot; + str(tribe[&quot;west&quot;]) + &quot;,&quot; + str(tribe[&quot;east&quot;])+&quot;]&quot;, tribe[&quot;height&quot;]
    successes = 0
    increase = []
    for i, tribe in enumerate(tribeStack):
        increaseTmp = runAttack(wall, tribe)
        #print wall
        #print tribe
        if len(increaseTmp) &gt; 0:
            successes += 1

        increase += increaseTmp

        if i+1==len(tribeStack) or tribeStack[i+1][&quot;day&quot;] &gt; tribe[&quot;day&quot;]:
            # the day is over: raise the wall where attacks succeeded
            for el in increase:
                if wall[el[&quot;wallPos&quot;]] &lt; el[&quot;height&quot;]:
                    wall[el[&quot;wallPos&quot;]] = el[&quot;height&quot;]
            increase = []
    return successes

if __name__ == &quot;__main__&quot;:
    testcases = input()
      
    for caseNr in xrange(1, testcases+1):
        N = input() # Number of tribes attacking the wall
        tribes = []
        for tribe in range(N):
            di, ni, wi, ei, si, delta_di, delta_pi, delta_si = raw_input().split(&quot; &quot;)
            tribes.append({&quot;di&quot;:int(di), # the day of the tribe's first attack
            &quot;ni&quot;: int(ni), # the number of attacks from this tribe
            &quot;wi&quot;: int(wi), # the westmost 
            &quot;ei&quot;: int(ei), # and eastmost points respectively of the Wall attacked on the first attack
            &quot;si&quot;: int(si), # the strength of the first attack
            &quot;delta_di&quot;: int(delta_di), # the number of days between subsequent attacks by this tribe
            &quot;delta_pi&quot;: int(delta_pi), # the distance this tribe travels to the east between subsequent attacks (if this is negative, the tribe travels to the west)
            &quot;delta_si&quot;: int(delta_si) # the change in strength between subsequent attacks
            })
        print(&quot;Case #%i: %s&quot; % (caseNr, solve(tribes)))
</pre>
<p>By the way, nobody has solved the large input set of this one with Python! But here is a <a href="http://www.go-hero.net/jam/13/name/eatmore">Java solution</a>.</p>
<p>The post <a href="http://martin-thoma.com/google-code-jam-round-1c-2013/">Google Code Jam – Round 1C 2013</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/google-code-jam-round-1c-2013/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>How do hash functions work?</title>
		<link>http://martin-thoma.com/how-do-hash-functions-work/</link>
		<comments>http://martin-thoma.com/how-do-hash-functions-work/#comments</comments>
		<pubDate>Sat, 11 May 2013 18:07:07 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[datastructure]]></category>
		<category><![CDATA[game tree]]></category>
		<category><![CDATA[hash]]></category>
		<category><![CDATA[hashCode]]></category>
		<category><![CDATA[Programming]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=65281</guid>
		<description><![CDATA[<p>Everybody who has written a noticeable amount of Java code should know the method hashCode(). But most beginners have difficulties to understand the significance of this little method. The following article gives you one small example with some impressions how much hash functions influence execution time. Connect four Connect Four [...] is a two-player game [...]</p><p>The post <a href="http://martin-thoma.com/how-do-hash-functions-work/">How do hash functions work?</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>Everybody who has written a fair amount of Java code should know the method <code><a href="http://docs.oracle.com/javase/7/docs/api/java/lang/Object.html#hashCode()">hashCode</a>()</code>. But most beginners have difficulty understanding the significance of this little method. The following article walks through one small example and gives you an impression of how much hash functions influence execution time.</p>
<h2>Connect four</h2>
<blockquote><p>Connect Four [...] is a two-player game in which the players first choose a color and then take turns dropping colored discs from the top into a seven-column, six-row vertically-suspended grid. The pieces fall straight down, occupying the next available space within the column. The object of the game is to connect four of one&#8217;s own discs of the same color next to each other vertically, horizontally, or diagonally before your opponent.
</p></blockquote>
<p><small>Source: <a href="http://en.wikipedia.org/wiki/Connect_Four">Wikipedia</a></small></p>
<p>It looks like this:</p>
<div id="attachment_65311" class="wp-caption aligncenter" style="width: 330px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/connect-four.gif"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/connect-four.gif" alt="Connect Four" width="320" height="190" class="size-full wp-image-65311" /></a><p class="wp-caption-text">Connect Four<br />Source: <a href="http://commons.wikimedia.org/wiki/File:Connect_Four.gif">commons.wikimedia.org</a></p></div>
<h2>The task</h2>
<p>Imagine you would like to find a good strategy for where to drop your disc. A simple brute-force method is to create a so-called <a href="http://en.wikipedia.org/wiki/Game_tree">game tree</a>. This means you go through every possible move in every situation that could occur in the game, for both players.</p>
<p>This approach has generally two problems:</p>
<ol>
<li>You have to know how to go through each situation. For Connect Four it is easy: the players take turns placing their discs, and in every turn the current player has at most 7 possibilities. But it is impossible for games like <a href="http://en.wikipedia.org/wiki/Calvinball#Calvinball">Calvinball</a> or <a href="http://en.wikipedia.org/wiki/Mao_(card_game)">Mao</a>.</li>
<li>The game tree might be HUGE. In this case, you can have \(4,531,985,219,092 \approx 4.5 \cdot 10^{12}\) game situations (<a href="http://math.stackexchange.com/a/301128/6876">source</a>). Even if you needed only one bit for each situation, it would require 566.5 GB!</li>
</ol>
<p>Anyway, let&#8217;s say we want to store many unique game situations. Unique means that even if there are hundreds of possible paths to a given game situation, you store this game situation only once.</p>
<h2>Implementation</h2>
<p>First of all, I would like to mention that you can <a href="#How_is_this_realated_to_hash_functions">skip the source code</a>. I&#8217;ve only included it to make it easier to understand what I&#8217;m talking about.</p>
<p>Let&#8217;s say our game situation looks like this:</p>
<pre class="brush: cpp; title: ; notranslate">
struct gamesituation {
    /** What does the board currently look like? */
    char board[BOARD_WIDTH][BOARD_HEIGHT];

    /**
     * What are the next game situations that I can reach from this 
     * board? 
     * The next[i] means that the player dropped the disc at column i
     */
    int next[7];

    /* I could use a bitfield for this ... but it would make access
     * much more inconvenient. 
     */
    unsigned char isEmpty;  // Is this game situation already filled?
    unsigned char isFinished; // Is this game finished?
    unsigned char stalemate; // Was this game a stalemate?
    unsigned char winRed;   // Did red win?
    unsigned char winBlack; // Did black win?
};
</pre>
<p>You need to check whether a player has won:</p>
<pre class="brush: cpp; title: ; notranslate">
/*
 * Check if player has won by placing a disc on (x,y). 
 * with direction (xDir, yDir)
 * @return 1 iff RED won, -1 iff BLACK won and 0 if nobody won
 */
signed char hasPlayerWon(char board[BOARD_WIDTH][BOARD_HEIGHT], 
                  int x, int y, char xDir, char yDir) {
    char color = board[x][y];

    int tokensInRow = getTokensInRow(board, color, x, y, xDir, yDir)
              + getTokensInRow(board, color, x, y, -xDir, -yDir) - 1;

    if (tokensInRow &gt;= WINNING_NR) {
        if (color == RED) {
            return 1;
        } else if (color == BLACK) {
            return -1;
        } else {
            perror(&quot;this color doesn't / shouldn't exist\n&quot;);
            exit(1);
        }
    }

    return 0;
}

/* 
 * A new disc has been dropped. Check if this disc means that 
 * somebody won.
 * @return 1 iff RED won, -1 iff BLACK won, otherwise NOT_FINISHED
 */
int isBoardFinished(char board[BOARD_WIDTH][BOARD_HEIGHT], 
                    int x, int y) {
    signed char status;

    // check left-right
    status = hasPlayerWon(board, x, y, 1, 0);

    if (status != 0) {
        return status;
    }

    // top-down
    status = hasPlayerWon(board, x, y, 0, 1);

    if (status != 0) {
        return status;
    }

    // down-left to top-right
    status = hasPlayerWon(board, x, y, 1, 1);

    if (status != 0) {
        return status;
    }

    // top-left to down-right
    status = hasPlayerWon(board, x, y, -1, 1);

    if (status != 0) {
        return status;
    }

    return NOT_FINISHED;
}
</pre>
<p>If you need an explanation for this, you should read <a href="http://martin-thoma.com/check-x-in-a-row-for-board-games/" title="Check x-in-a-row for board games">this article</a>.</p>
<p>And you need a function that can mirror boards (to get rid of identical, but mirrored situations) and one that can compare boards:</p>
<pre class="brush: cpp; title: ; notranslate">
char isSameBoard(char a[BOARD_WIDTH][BOARD_HEIGHT], 
                 char b[BOARD_WIDTH][BOARD_HEIGHT]) {
    for (int x = 0; x &lt; BOARD_WIDTH; x++) {
        for (int y = 0; y &lt; BOARD_HEIGHT; y++) {
            if (a[x][y] != b[x][y]) {
                return FALSE;
            }
        }
    }

    return TRUE;
}

void mirrorBoard(char board[BOARD_WIDTH][BOARD_HEIGHT],
                 char newBoard[BOARD_WIDTH][BOARD_HEIGHT]) {
    for (int x = 0; x &lt; BOARD_WIDTH; x++) {
        for (int y = 0; y &lt; BOARD_HEIGHT; y++) {
            newBoard[BOARD_WIDTH - x - 1][y] = board[x][y];
        }
    }
}
</pre>
<p>You need a function that makes all possible moves for the players:</p>
<pre class="brush: cpp; title: ; notranslate">

/*
 * Make all possible turns that the player can make in this
 * game situation.
 */
void makeTurns(char board[BOARD_WIDTH][BOARD_HEIGHT], 
   char currentPlayer, unsigned int lastId, int recursion) {
    unsigned int insertID;
    int outcome;

    for (int column = 0; column &lt; BOARD_WIDTH; column++) {
        // add to column
        int height = BOARD_HEIGHT - 1;

        // the disc falls down
        while (height &gt;= 0 &amp;&amp; board[column][height] == EMPTY) {
            height--;
        }
        height++;

        // this column is full
        if (height == BOARD_HEIGHT) {
            continue;
        }

        // place disc
        board[column][height] = currentPlayer;

        if (didBoardAlreadyOccur(board)) {
            // I've already got to this situation
            insertID = getBoardIndex(board);
            savePreviousID(insertID, lastId, column);
        } else {
            char mirrored[BOARD_WIDTH][BOARD_HEIGHT];
            mirrorBoard(board, mirrored);

            if (didBoardAlreadyOccur(mirrored)) {
                // I've already got this situation, but mirrored
                // so take care of symmetry at this point
                mirroredCounter++;
                insertID = getBoardIndex(mirrored);
                savePreviousID(insertID, lastId, column);
            } else {
                registeredSituations++;

                if (registeredSituations == MAXIMUM_SITUATIONS) {
                    giveCurrentInformation();
                    exit(MAXIMUM_SITUATIONS_REACHED_EXIT_STATUS);
                }

                if (REGISTERED_MOD &gt; 0 &amp;&amp;
                    registeredSituations % REGISTERED_MOD == 0) {
                    giveCurrentInformation();
                }

                outcome = isBoardFinished(board, column, height);

                if (ABS(outcome) &lt;= 1) {
                    // the game is finished
                    insertID = getNewIndex(board);
                    storeToDatabase(insertID, board, TRUE, outcome);
                    savePreviousID(insertID, lastId, column);
                } else {
                    // Switch players
                    if (currentPlayer == RED) {
                        currentPlayer = BLACK;
                    } else {
                        currentPlayer = RED;
                    }

                    insertID = getNewIndex(board);
                    setBoard(insertID, board);
                    savePreviousID(insertID, lastId, column);
                    char copy[BOARD_WIDTH][BOARD_HEIGHT];

                    for (int x = 0; x &lt; BOARD_WIDTH; x++) {
                        for (int y = 0; y &lt; BOARD_HEIGHT; y++) {
                            copy[x][y] = board[x][y];
                        }
                    }

                    makeTurns(copy, currentPlayer, insertID, 
                              recursion + 1);
                }
            }
        }
    }
}
</pre>
<h2>How is this related to hash functions?</h2>
<p>You might have noticed a few functions that I haven&#8217;t explained yet:</p>
<ul>
<li><code>didBoardAlreadyOccur(board)</code>: Checks if a given board is already stored in the database.</li>
<li><code>getBoardIndex(board)</code>: Takes a board and returns a non-negative integer which is characteristic for the given board.</li>
<li><code>savePreviousID(insertID, lastId, column)</code>: Stores insertID in the database as a possible next situation for lastId.</li>
<li><code>setBoard(insertID, board)</code>: Inserts the board into the database at position insertID.</li>
</ul>
<p>How would you implement <code>didBoardAlreadyOccur(board)</code>? This function (or insertID) will be the slowest part of the code and will be called VERY often. So it needs to be as fast as possible.</p>
<h2>A hash function</h2>
<p>Most of the time you can create hash functions by mapping values to integers. In my case, I mapped the board &#8211; which is a two-dimensional char array &#8211; to one integer by thinking of it as one very long base-3 number. I think of a red disc as the digit 1, a black disc as the digit 2 and an empty field as 0:</p>
<pre class="brush: cpp; title: ; notranslate">
unsigned int charToInt(char x) {
    if (x == RED) {
        return 1;
    } else if (x == BLACK) {
        return 2;
    } else {
        return 0;
    }
}
</pre>
<p>When you want to get the board number, you can get it like this:<br />
<div id="attachment_65761" class="wp-caption aligncenter" style="width: 330px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/connect-four-to-number.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/connect-four-to-number.png" alt="Empty=0, red=1, yellow=2" width="320" height="380" class="size-full wp-image-65761" /></a><p class="wp-caption-text">Empty=0, red=1, yellow=2<br />The board number is 00000000000000000210000211210112212</p></div></p>
<p>For most game situations, this number will be far too big to store in an integer. Also, we would like to get an index for our array so that we know where to store this board. The simplest solution to this problem is to calculate <code>NUMBER % ARRAY_SIZE</code>:</p>
<pre class="brush: cpp; title: ; notranslate">unsigned int getFirstIndex(char board[BOARD_WIDTH][BOARD_HEIGHT]) {
    unsigned int index = 0;

    for (int x = 0; x &lt; BOARD_WIDTH; x++) {
        for (int y = 0; y &lt; BOARD_HEIGHT; y++) {
            index += charToInt(board[x][y]) * 
                     myPow(3, ((x + y * BOARD_WIDTH) % HASH_MODULO));
        }
    }

    index = index % MAXIMUM_SITUATIONS;
    return index;
}</pre>
<p>The function <code>getFirstIndex</code> maps a char array with BOARD_WIDTH * BOARD_HEIGHT = 7 * 6 = 42 elements to the integer interval [0, MAXIMUM_SITUATIONS - 1] = [0, 19999999]. Although I only use three values for the char array, there are \(3^{42} = 109418989131512359209 \approx 1.09 \cdot 10^{20}\) possible board numbers. Many of them can never occur in a game (e.g. two more red than black discs), but we still map a significantly larger space to [0, 19999999]. You can&#8217;t change that. You can probably find (much) better mappings, but as we know that there are \(4.5 \cdot 10^{12}\) game situations, you will always have the problem that the codomain of your hash function is much smaller than its domain. That&#8217;s a fundamental problem of hash functions.</p>
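<p>To make the idea concrete, here is a minimal sketch in Python, where integers can grow arbitrarily large. It reads a tiny board as a base-3 number and reduces it modulo the table size. The cell encoding (<code>" "</code>, <code>"r"</code>, <code>"b"</code>) and the names <code>board_number</code> and <code>table_index</code> are assumptions for this illustration, not part of the C code:</p>

```python
DIGIT = {" ": 0, "r": 1, "b": 2}  # empty, red, black (hypothetical encoding)


def board_number(cells):
    """Read the cells as the digits of one base-3 number."""
    number = 0
    for cell in cells:
        number = number * 3 + DIGIT[cell]
    return number


def table_index(cells, table_size):
    """Reduce the (possibly huge) board number to an array index."""
    return board_number(cells) % table_size


print(board_number("rb "))     # 1*9 + 2*3 + 0 = 15
print(table_index("rb ", 10))  # 15 % 10 = 5
print(table_index(" rb", 10))  # 0*9 + 1*3 + 2 = 5, so 5 % 10 = 5 as well
```

<p>With only 10 slots, the two different boards <code>"rb "</code> and <code>" rb"</code> already end up at the same index.</p>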
<p>This means that there will be different board situations that map to the same hash value. This is called a &#8220;hash collision&#8221;. When you use the hash value directly as an index for your board, you have to deal with hash collisions. Some solutions are:</p>
<ul>
<li>Ignoring the problem: That&#8217;s boring and not always possible (but simple).</li>
<li>Linear probing</li>
<li>Quadratic probing</li>
</ul>
<h2>Linear probing</h2>
<p>The idea of linear probing is very simple: </p>
<p>Inserting a new item:</p>
<ol>
<li>You look at the index \(i\) that your hash function gave you.</li>
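<li>If this slot is already occupied, you look at \(i+1\), then at \(i+2\) and so on (wrapping around at the end of the array) until you find a free slot.</li>
<li>Once you have looked at all slots of your array without finding a free one, you can&#8217;t insert the new item.</li>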
</ol>
<p>Searching for an already inserted item:</p>
<ol>
<li>You look at the index \(i\) that your hash function gave you.</li>
<li>If the slot at \(i\) is empty, you&#8217;re done: the item was never inserted.</li>
<li>If \(i\) is not the item you&#8217;ve searched for, you have to look at \(i+1\).</li>
<li>Keep looking at the next item until you find your searched item, you&#8217;ve looked at all items or you find an empty slot.</li>
</ol>
<p>Deleting is complicated. You have to look at all items that directly follow the deleted one (up to the next empty slot), remove them from your array and insert them again. That&#8217;s not good.</p>
<p>The problem of linear probing is <strong>clustering</strong>. When some hash values are close together, you get hash collisions sooner. You resolve the first collisions by inserting the value close to the slot where you originally wanted to store it, so one big cluster grows quite fast. When you want to insert an element that hashes into the cluster, you first have to search for the end of the cluster. That&#8217;s bad for performance.</p>
<p>An advantage of linear probing compared to quadratic probing is that you might get a better performance due to cache effects.</p>
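<p>The insert and search steps described above can be sketched in a few lines of Python. This is a simplified illustration, not the C implementation from the repository: the class name <code>LinearProbingSet</code> is made up, Python&#8217;s built-in <code>hash</code> stands in for the board hash, and deleting is left out:</p>

```python
EMPTY = None


class LinearProbingSet:
    """Fixed-size hash set that resolves collisions by linear probing."""

    def __init__(self, size):
        self.slots = [EMPTY] * size

    def _probe(self, item):
        """Yield the slot indices i, i+1, i+2, ... (wrapping around) for item."""
        start = hash(item) % len(self.slots)
        for offset in range(len(self.slots)):
            yield (start + offset) % len(self.slots)

    def insert(self, item):
        """Return True if item is stored afterwards, False if the table is full."""
        for index in self._probe(item):
            if self.slots[index] is EMPTY or self.slots[index] == item:
                self.slots[index] = item
                return True
        return False

    def contains(self, item):
        for index in self._probe(item):
            if self.slots[index] is EMPTY:
                return False  # an empty slot ends the search
            if self.slots[index] == item:
                return True
        return False  # looked at every slot


table = LinearProbingSet(5)
for number in (3, 8, 13):  # all three hash to slot 3
    table.insert(number)
print(table.slots)  # [13, None, None, 3, 8]
```

<p>The three colliding values end up next to each other (wrapping around the end of the array), which is exactly the clustering effect described above.</p>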
<h2>Quadratic probing</h2>
<p>The idea of quadratic probing is the same as for linear probing, but you try to fix the clustering-problem by using a clever way to search for a free spot:</p>
\(h_i(x) = \left(h(x) + (-1)^{i+1} \cdot \left\lceil\frac{i}{2}\right\rceil^2\right) \bmod~m\)
<p>where \(h\) is your hash function and \(i\) is your i-th try to find a free spot while you have \(m\) spots in total.</p>
<p>This one also suffers from clustering, but it&#8217;s not as bad as with linear probing.</p>
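<p>To see what the formula does, the following Python sketch lists the slots that are tried for \(i = 0, 1, \dots, m-1\). The function name <code>quadratic_probe_sequence</code> and the parameter <code>start</code> (which plays the role of \(h(x)\)) are made up for this illustration:</p>

```python
def quadratic_probe_sequence(start, m):
    """Return the m slots visited for tries i = 0, 1, ..., m - 1."""
    sequence = []
    for i in range(m):
        step = ((i + 1) // 2) ** 2      # ceil(i / 2) squared
        sign = 1 if i % 2 == 1 else -1  # (-1) ** (i + 1)
        sequence.append((start + sign * step) % m)
    return sequence


print(quadratic_probe_sequence(3, 7))  # [3, 4, 2, 0, 6, 5, 1]
```

<p>In this example with \(m = 7\), every slot is visited exactly once. As far as I know, this is guaranteed when \(m\) is a prime with \(m \bmod 4 = 3\); for other table sizes the sequence may skip slots.</p>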
<h2>Double hashing</h2>
<p>This solution could be the best one, but it is also the hardest one to implement correctly. You could find a free spot by using a second hash function \(h'\) like this:</p>
\(h_i(x) = (h(x)+h'(x)\cdot i) ~ \bmod ~ m\)
<p>However, you have to make sure that \(h'(x)\) is never 0 and that two different keys \(x \neq y\) collide in both hash functions only with probability \(Pr[h(x)=h(y) \land h'(x)=h'(y)] = \frac{1}{m^2}\).</p>
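<p>A small Python sketch of the resulting probe sequences. Both hash functions here are assumptions chosen for illustration (integer keys, a prime table size \(m\)); they are not the ones used in my Connect Four code:</p>

```python
def double_hash_sequence(key, m):
    """Return the m slots visited by h_i = (h + h2 * i) mod m."""
    h = key % m             # first hash function (assumption)
    h2 = 1 + key % (m - 1)  # second hash function, never 0 (assumption)
    return [(h + h2 * i) % m for i in range(m)]


print(double_hash_sequence(10, 7))  # [3, 1, 6, 4, 2, 0, 5]
print(double_hash_sequence(17, 7))  # [3, 2, 1, 0, 6, 5, 4]
```

<p>Both keys collide in the first slot, but the second hash function gives them different step sizes, so they don&#8217;t keep competing for the same slots afterwards. Because \(m\) is prime and the step size is never 0, each sequence visits every slot.</p>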
<h2>Performance</h2>
<p>You can use linear probing, quadratic probing and double hashing in my example and measure how many game situations get stored. The more game situations you can store in the same amount of time, the better:</p>
<div id="attachment_65871" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/connectfour-probing.png"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/connectfour-probing.png" alt="Linear probing, quadratic probing and double hashing for connect four" width="512" height="334" class="size-full wp-image-65871" /></a><p class="wp-caption-text">Linear probing, quadratic probing and double hashing for connect four</p></div>
<p>You can see that linear probing performs much worse than quadratic probing and double hashing. When you compare quadratic probing with double hashing, there does not seem to be a big difference. But note that my second hash function is almost the same as the first one. You could probably choose a better second hash function and get better results (suggestions are welcome).</p>
<h2>Why are hash functions important?</h2>
<p>Hash functions help you to map a large amount of data to a small space. They are important because they are an essential part of many data structures. The better they are, the faster the operations on those data structures work. Better means: faster to compute or fewer collisions.</p>
<p>Some data structures that rely on hash functions are:</p>
<ul>
<li><a href="http://docs.oracle.com/javase/7/docs/api/java/util/HashMap.html">HashMap</a> and <a href="http://docs.oracle.com/javase/7/docs/api/java/util/Hashtable.html">Hashtable</a> (→ <a href="http://stackoverflow.com/a/40878/562769">difference</a>)</li>
<li><a href="http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html">HashSet</a></li>
</ul>
<h2>Final notes</h2>
<p>Another way to resolve hash collisions is chaining: every slot holds a linked list of all items that were hashed to it. This means you will not suffer from clustering and you can insert in \(\mathcal{O}(1)\). But searching for an element is still in \(\mathcal{O}(n)\) in the worst case, where \(n\) is the number of elements that were already inserted.</p>
<h2>Resources</h2>
<p>You can find the <a href="https://github.com/MartinThoma/connect-four/tree/master/C">code at GitHub</a>.</p>
<p>The post <a href="http://martin-thoma.com/how-do-hash-functions-work/">How do hash functions work?</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/how-do-hash-functions-work/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Code Jam – Round 1B 2013</title>
		<link>http://martin-thoma.com/google-code-jam-round-1b-2013/</link>
		<comments>http://martin-thoma.com/google-code-jam-round-1b-2013/#comments</comments>
		<pubDate>Sun, 05 May 2013 14:15:32 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Google Code Jam]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=65421</guid>
		<description><![CDATA[<p>Problem A (Osmos): Small Set: 4668/7250 users (64%) Large Set: 3537/4578 users (77%) Problem B (Falling Diamonds): Small Set: 952/1882 users (51%) Large Set: 525/724 users (73%) Problem C (Garbled Email): Small Set: 444/896 users (50%) Large Set: 255/345 users (74%) More information is on go-hero.net. Osmos Falling Diamonds Once you&#8217;ve read the task, you [...]</p><p>The post <a href="http://martin-thoma.com/google-code-jam-round-1b-2013/">Google Code Jam – Round 1B 2013</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<ul>
<li>Problem A (<a href="https://code.google.com/codejam/contest/2434486/dashboard#s=p0">Osmos</a>):
<ul>
<li>Small Set: 4668/7250 users (64%)</li>
<li>Large Set: 3537/4578 users (77%)</li>
</ul>
</li>
<li>Problem B (<a href="https://code.google.com/codejam/contest/2434486/dashboard#s=p1">Falling Diamonds</a>):
<ul>
<li>Small Set: 952/1882 users (51%)</li>
<li>Large Set: 525/724 users (73%)</li>
</ul>
</li>
<li>Problem C (<a href="https://code.google.com/codejam/contest/2434486/dashboard#s=p2">Garbled Email</a>):
<ul>
<li>Small Set: 444/896 users (50%)</li>
<li>Large Set: 255/345 users (74%)</li>
</ul>
</li>
</ul>
<p>More information is on <a href="http://www.go-hero.net/jam/13/round/2">go-hero.net</a>.</p>
<h2>Osmos</h2>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

def howBigDoIget(A, motes):
    while len(motes) &gt; 0 and A &gt; min(motes):
        A += min(motes)
        motes.remove(min(motes))
    return A

def stepsNeededForNext(A, motes):
    m = min(motes)
    steps = 0
    if m &gt;= 1 and A == 1:
        return 10**12
    while A &lt;= m:
        A += (A-1)
        steps += 1
    return steps

def solve(A, motes):
    steps = 0
    A = howBigDoIget(A, motes)
    while len(motes) &gt; 0 and A &lt;= max(motes):
        if (stepsNeededForNext(A, motes) &gt;= len(motes)):
            steps += len(motes)
            return steps
        else:
            A += (A-1)
        A = howBigDoIget(A, motes)
        steps += 1
    return steps
 
if __name__ == &quot;__main__&quot;:
    testcases = input()
    for caseNr in xrange(1, testcases+1):
        A, N = map(int,raw_input().split(&quot; &quot;))
        motes = sorted(map(int,raw_input().split(&quot; &quot;)))
        copyed = motes[:]
        solution = solve(A, motes)
        if solution &gt; N:
            solution = N
        print(&quot;Case #%i: %s&quot; % (caseNr, solution))
</pre>
<h2>Falling Diamonds</h2>
<p>Once you&#8217;ve read the task, you should understand some very basic ideas:</p>
<ul>
<li>First of all, diamonds only fall at \(x=0\)!</li>
<li>If your target coordinates are \((x,y)\), you have the same output as for \((-x,y)\), as everything is symmetric.</li>
<li>You first have to fill a base for your diamond pyramid. I&#8217;ve colored the base in yellow in the images below.</li>
<li>When your target is above the ground, you can let the diamond slide down to calculate the size of the base.</li>
</ul>
<div id="attachment_65431" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/falling-diamonds-base.jpg"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/falling-diamonds-base.jpg" alt="A basis for diamonds" width="512" height="323" class="size-full wp-image-65431" /></a><p class="wp-caption-text">You have to get those yellow diamonds first, before you can get the orange one.</p></div>
<div id="attachment_65441" class="wp-caption aligncenter" style="width: 522px"><a href="http://martin-thoma.com/wp-content/uploads/2013/05/falling-diamonds-slide.jpg"><img src="http://martin-thoma.com/wp-content/uploads/2013/05/falling-diamonds-slide.jpg" alt="Let Diamonds slide down" width="512" height="254" class="size-full wp-image-65441" /></a><p class="wp-caption-text">Let Diamonds slide down</p></div>
<p>Note that you don&#8217;t have to calculate a probability for the yellow pyramids. You get those with a probability of 1.</p>
<p>What I forgot at first: You should also catch the case that you can fill up the next bigger pyramid. If this is possible, you can guarantee that you will reach your target \((x,y)\).</p>
<p>The rest is simple math. You have \(rest\) diamonds left after you&#8217;ve built the base (yellow). Then you need at least \(y+1\) of them to slide to the right side. The probability of getting exactly \(k\) hits in \(N\) tries, each with a probability of 50%, is \(\binom{N}{k} \cdot (\frac{1}{2})^N\). You want at least \(k\) hits, so you want \(\sum_{i=k}^N \binom{N}{i} \cdot (\frac{1}{2})^N\). Here, the number of tries is \(rest\) and \(k = y+1\).</p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

import gmpy

&quot;&quot;&quot; Calculate the binomial coefficient &quot;&quot;&quot;
def binomial(n, k):
    return gmpy.comb(n,k)

&quot;&quot;&quot; 
    @param N: Number of diamonds
    @param x,y: Target coordinate
    @return: probability that a diamond will be at coordinate (x,y) 
&quot;&quot;&quot;
def solve(N, x, y):
    if x == 0:
        n = y+1
        if N &gt;= (n*n+n)/2:
            return 1.0
        else:
            return 0.0

    # From this point, x != 0 is True

    xTmp = x + y # let target slide down

    n = xTmp-1
    baseDiamands = (n**2+n)/2

    # are there enough diamonds left after you've built the base?
    rest = N - baseDiamands
    if rest &lt;= 0:
        return 0.0

    # are there enough diamonds left so that you can guarantee that 
    # you will  fill up the next bigger pyramid at least to the 
    # target position?
    biggerBaseDiamonds = baseDiamands+n+2+y
    if N &gt;= biggerBaseDiamonds:
        return 1.0

    # some math:
    # bernoulli
    prob = 0.0
    hitsNeeded = y+1

    for k in range(hitsNeeded, rest+1):
        prob += binomial(rest,k)
    
    return prob/2**rest
 
if __name__ == &quot;__main__&quot;:
    testcases = input()
      
    for caseNr in xrange(1, testcases+1):
        N, x, y = map(int,raw_input().split(&quot; &quot;))
        print(&quot;Case #%i: %.9Lf&quot; % (caseNr, solve(N, abs(x), y)))
</pre>
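<p>To check the Bernoulli part on its own, here is a minimal Python 3 sketch of the sum \(\sum_{i=k}^N \binom{N}{i} \cdot (\frac{1}{2})^N\). It is not the submitted solution: the name <code>at_least_k_hits</code> is made up for this example, and it assumes Python 3.8+ for <code>math.comb</code> and uses exact fractions instead of gmpy.</p>

```python
from fractions import Fraction
from math import comb


def at_least_k_hits(n_tries, k):
    """Probability of at least k hits in n_tries fair 50/50 tries."""
    # count all outcomes with k, k+1, ..., n_tries hits
    favourable = sum(comb(n_tries, i) for i in range(k, n_tries + 1))
    # every single outcome has probability (1/2)^n_tries
    return Fraction(favourable, 2 ** n_tries)


print(float(at_least_k_hits(3, 2)))  # 0.5
```

<p>Working with <code>Fraction</code> keeps the result exact until the final conversion to a float, which avoids overflow for large binomial coefficients.</p>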
<p>The post <a href="http://martin-thoma.com/google-code-jam-round-1b-2013/">Google Code Jam – Round 1B 2013</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/google-code-jam-round-1b-2013/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Sicherheit-Klausur</title>
		<link>http://martin-thoma.com/sicherheit-klausur/</link>
		<comments>http://martin-thoma.com/sicherheit-klausur/#comments</comments>
		<pubDate>Mon, 29 Apr 2013 08:19:05 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[German posts]]></category>
		<category><![CDATA[Klausur]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=65051</guid>
		<description><![CDATA[<p>This article covers the lecture “Sicherheit” (security) at KIT. It serves as exam preparation. I attended the lectures given by Prof. Hofheinz in the summer semester of 2013. This article is, of course, still a work in progress. Material covered 25.04.2013 Caesar, Vigenere, One-Time-Pad VL 01 22.04.2013 Symmetric encryption; stream ciphers; block ciphers (DES, AES); modes of operation VL 02 25.04.2013 Linear cryptanalysis, differential [...]</p><p>The post <a href="http://martin-thoma.com/sicherheit-klausur/">Sicherheit-Klausur</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<div class="info">This article covers the lecture “Sicherheit” (security) at KIT. It serves as exam preparation. I attended the lectures given by Prof. Hofheinz in the summer semester of 2013.</div>
<p>This article is, of course, still a work in progress.</p>
<h2>Material covered</h2>
<table>
<tr>
<td>25.04.2013</td>
<td rowspan="2" style="border-bottom:1px solid black;">Caesar, Vigenere, One-Time-Pad</td>
</tr>
<tr>
<td style="border-bottom:1px solid black;"><a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_VL01.pdf">VL 01</a></td>
</tr>
<tr>
<td>22.04.2013</td>
<td rowspan="2" style="border-bottom:1px solid black;">Symmetric encryption; stream ciphers; block ciphers (DES, AES); <span class="hint" title="ECB, CBC, CFB, OFB">modes of operation</span></td>
</tr>
<tr>
<td style="border-bottom:1px solid black;"><a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_VL02.pdf">VL 02</a></td>
</tr>
<tr>
<td>25.04.2013</td>
<td rowspan="2" style="border-bottom:1px solid black;">Linear cryptanalysis, differential cryptanalysis, <a href="http://martin-thoma.com/semantische-sicherheit/" title="Semantische Sicherheit">semantic security</a>, IND-CPA, Feistel scheme</td>
</tr>
<tr>
<td style="border-bottom:1px solid black;"><a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_VL03.pdf">VL 03</a></td>
</tr>
<tr>
<td>29.04.2013</td>
<td rowspan="2" style="border-bottom:1px solid black;"><span class="hint" title="128-bit, 160-bit and 256-bit hashes are common">Hash functions</span>, collision resistance \(\Rightarrow\) one-wayness, <span class="hint" title="Influenced MD5, SHA-1, SHA-2. SHA-3 does not use MD.">Merkle-Damgård construction</span>, birthday attack, meet-in-the-middle attack</td>
</tr>
<tr>
<td style="border-bottom:1px solid black;"><a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_VL04.pdf">VL 04</a></td>
</tr>
<tr>
<td>06.05.2013</td>
<td rowspan="2" style="border-bottom:1px solid black;">Public-key encryption (idea, RSA, ElGamal); meet-in-the-middle attack for hash functions</td>
</tr>
<tr>
<td style="border-bottom:1px solid black;"><a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_VL05.pdf">VL 05</a></td>
</tr>
</table>
<p>If anything is missing here, feel free to let me know in the comments or by mail (info@martin-thoma.de). I&#8217;m curious whether I will manage to keep this up to date until the end.</p>
<h2>Definitions</h2>
<div class="definition">
A function \(H\) parametrized by \(k\) is <strong>collision resistant</strong> if every PPT (probabilistic polynomial-time) algorithm finds a collision with at most negligible probability.</p>
<p>That is, for every PPT algorithm \(\mathcal{A}\),</p>
\(Adv^{cr}_{H,\mathcal{A}}(k) := Pr[(X,X') \leftarrow \mathcal{A}(1^k): X \neq X' \land H_k(X) = H_k(X')]\)
<p>is negligible.
</p></div>
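<p>To make the notion of a collision concrete, here is a minimal Python sketch that finds one for a deliberately weakened hash: SHA-256 truncated to 2 bytes. The truncation and the names <code>truncated_hash</code> and <code>find_collision</code> are assumptions for this illustration only and do not come from the lecture. With only \(2^{16}\) possible hash values a collision shows up quickly, which is exactly what must not be feasible for a collision-resistant \(H_k\).</p>

```python
import hashlib
from itertools import count


def truncated_hash(data, n_bytes=2):
    """Toy hash: the first n_bytes of SHA-256 (far too short to be secure)."""
    return hashlib.sha256(data).digest()[:n_bytes]


def find_collision(n_bytes=2):
    """Return two different inputs with the same truncated hash."""
    seen = {}  # maps hash value to the first input that produced it
    for i in count():
        x = str(i).encode()
        h = truncated_hash(x, n_bytes)
        if h in seen:
            return seen[h], x
        seen[h] = x


x1, x2 = find_collision()
print(x1 != x2, truncated_hash(x1) == truncated_hash(x2))  # True True
```

<p>The search has to stop after at most \(2^{16}+1\) inputs because there are not more distinct 2-byte values. By the birthday bound one would expect a collision after a few hundred inputs already.</p>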
<div class="definition">
A function \(H\) parametrized by \(k\) is a <strong>one-way function</strong> with respect to the preimage distribution \(\mathcal{X}_k\) if every PPT algorithm finds, with at most negligible probability, a preimage of a given image \(H_k(X)\) whose preimage \(X\) was drawn from \(\mathcal{X}_k\).</p>
<p>That is, for every PPT algorithm \(\mathcal{A}\),</p>
\(Adv^{ow}_{H,\mathcal{A}}(k) := Pr[X' \leftarrow \mathcal{A}(1^k, H_k(X)): H_k(X) = H_k(X')]\)
<p>is negligible, where \(X \leftarrow \mathcal{X}_k\) is drawn at random.
</p></div>
<h2>Questions</h2>
<div class="question">
<span class="question">When is an encryption scheme IND-CPA secure?</span></p>
<div class="answer">
IND-CPA stands for “indistinguishability under chosen-plaintext attacks”. An encryption scheme is IND-CPA secure if and only if no efficient adversary \(\mathcal{A}\) can distinguish the ciphertexts of plaintexts it has chosen itself.
</div>
</div>
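<p>The answer above can be read as a game between a challenger and an adversary. The following Python sketch plays that game against a toy scheme. Everything in it is an assumption made for illustration: the deterministic XOR scheme <code>toy_encrypt</code>, the function names and the exact shape of the game are not taken from the lecture. It only shows why a deterministic scheme cannot be IND-CPA secure: the adversary simply re-encrypts one of its plaintexts.</p>

```python
import secrets


def toy_encrypt(key, message):
    """Hypothetical toy scheme: XOR with a fixed key (deterministic)."""
    return bytes(k ^ m for k, m in zip(key, message))


def ind_cpa_game(choose, guess, length=16):
    """Play one round; return True if the adversary guesses the hidden bit."""
    key = secrets.token_bytes(length)

    def oracle(message):
        # encryption oracle: the adversary may encrypt plaintexts of its choice
        return toy_encrypt(key, message)

    m0, m1 = choose(oracle, length)
    b = secrets.randbelow(2)         # hidden bit chosen by the challenger
    challenge = oracle((m0, m1)[b])  # encryption of m_b
    return guess(oracle, m0, m1, challenge) == b


def choose(oracle, length):
    # two different plaintexts of equal length
    return bytes(length), bytes([1]) * length


def guess(oracle, m0, m1, challenge):
    # deterministic encryption: re-encrypting m0 reveals which one was used
    return 0 if oracle(m0) == challenge else 1


wins = sum(ind_cpa_game(choose, guess) for _ in range(100))
print(wins)  # 100: this adversary always wins against the toy scheme
```

<p>For an IND-CPA-secure scheme, every efficient adversary would win only negligibly more often than half of the rounds.</p>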
<h2>Material</h2>
<ul>
<li><a href="http://www.iks.kit.edu/index.php?id=sic-sose13">Lecture website</a></li>
<li><a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_vorlaeufiges_Skript.pdf">Lecture notes</a></li>
</ul>
<h2>Exam structure</h2>
<p>?</p>
<h2>Exercises</h2>
<p>There are exercise sheets on the lecture website, but no exercise certificate (Übungsschein), no hand-ins and no bonus points.</p>
<h2>Dates and exam procedure</h2>
<p><strong>Date</strong>: Friday, 26 July 2013, from 14:00 to 16:00<br />
<strong>Location</strong>: not yet determined (as of 29.04.2013)<br />
<strong>Points</strong>: ?<br />
<strong>Passing threshold</strong>: ?<br />
<strong>Exercise certificate</strong>: No<br />
<strong>Bonus points</strong>: No</p>
<h2>Don&#8217;t forget</h2>
<ul>
<li>Student ID</li>
<li>Ballpoint pen</li>
</ul>
<h2>Results</h2>
<p>Not out yet (as of 29.04.2013)</p>
<p>The post <a href="http://martin-thoma.com/sicherheit-klausur/">Sicherheit-Klausur</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/sicherheit-klausur/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantische Sicherheit</title>
		<link>http://martin-thoma.com/semantische-sicherheit/</link>
		<comments>http://martin-thoma.com/semantische-sicherheit/#comments</comments>
		<pubDate>Sun, 28 Apr 2013 13:06:53 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[German posts]]></category>
		<category><![CDATA[IT-Security]]></category>
		<category><![CDATA[Theoretical computer science]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=64801</guid>
		<description><![CDATA[<p>In the lecture of 25.04.2013, Prof. Hofheinz said that semantic security practically cannot be proven, because one would first have to prove P ≠ NP. I will now try to explain why that is. One-way functions and P ≠ NP Let f: X → Y be a function. f is called a one-way function if and only if for all x ∈ X: y := f(x) can be computed in polynomial time For the computation of a preimage [...]</p><p>The post <a href="http://martin-thoma.com/semantische-sicherheit/">Semantische Sicherheit</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<p>In the <a href="http://www.iks.kit.edu/fileadmin/User/Lectures/Sicherheit/SoSe13/Sicherheit_VL03.pdf">lecture of 25.04.2013</a>, Prof. Hofheinz said that semantic security practically cannot be proven, because one would first have to prove \(\mathcal{P} \neq \mathcal{NP}\). I will now try to explain why that is.</p>
<h2>One-way functions and \(\mathcal{P} \neq \mathcal{NP}\)</h2>
<div class="definition">
Let \(f:X \rightarrow Y\) be a function.<br />
\(f\) is called a one-way function if and only if for all \(x \in X\):</p>
<ul>
<li>\(y := f(x)\) can be computed in polynomial time</li>
<li>No randomized algorithm that runs in polynomial time exists for computing a preimage \(x\) from \(y\).</li>
</ul>
</div>
<p>Claim: if a one-way function \(f\) exists, then \(\mathcal{P} \neq \mathcal{NP}\).</p>
<p>Why?</p>
<p>Well, suppose there is a one-way function \(f\). Then let the formal language \(L_f\) be defined by:</p>
\(L_f := \{(\bar x, y) | \exists x: \bar x \text{ is a prefix of } x \text{ and } y = f(x)\}\)
<p>Then \(L_f \notin \mathcal{P}\), since otherwise the corresponding \(x\) for a given \(y\) could be determined in polynomial time (how else would you check whether \(\bar x\) is a prefix of \(x\)?)</p>
<p>If this reasoning is not enough for you, here is a proof by Prof. Hofheinz (thanks!)</p>
<p><strong>Claim:</strong> \(L_f \notin \mathcal{P}\)<br />
<strong>Proof:</strong> by contradiction<br />
<u>Assumption:</u> \(L_f \in \mathcal{P}\)<br />
\(\Rightarrow\) There is a polynomial-time algorithm \(\mathcal{A}\) for \(L_f\) that, on input \((\bar x, y)\), decides whether an \(x\) with \(f(x)=y\) and prefix \(\bar x\) exists. Then we can build an algorithm \(\mathcal{B}\) from \(\mathcal{A}\) that inverts \(f\).</p>
<p>Given \(y\), \(\mathcal{B}\) proceeds as follows:<br />
\(\mathcal{B}\) calls \(\mathcal{A}(0,y)\) and thereby learns whether a preimage \(x\) of \(y\) with first bit \(0\) exists. If so, \(\mathcal{B}\) calls \(\mathcal{A}(00,y)\); if not, \(\mathcal{B}\) calls \(\mathcal{A}(10,y)\), and so on.</p>
<p>In this way a preimage \(x\) is determined bit by bit. Such a \(\mathcal{B}\) therefore finds preimages efficiently, which contradicts the one-wayness assumption on \(f \blacksquare\)</p>
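<p>The construction of \(\mathcal{B}\) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <code>toy_f</code> is a made-up function on 4-bit strings (certainly not one-way), and <code>make_prefix_oracle</code> is a brute-force stand-in for the hypothetical polynomial-time algorithm \(\mathcal{A}\). The part that mirrors the proof is <code>invert</code>, which needs only one oracle call per bit plus one initial call.</p>

```python
from itertools import product


def toy_f(x):
    """Made-up stand-in for f on bit strings (not one-way at all)."""
    return pow(int(x, 2), 2, 23)


def make_prefix_oracle(f, n_bits):
    """Brute-force stand-in for A: decides whether (prefix, y) is in L_f."""
    table = [("".join(bits), f("".join(bits)))
             for bits in product("01", repeat=n_bits)]

    def oracle(prefix, y):
        return any(x.startswith(prefix) and fx == y for x, fx in table)

    return oracle


def invert(oracle, y, n_bits):
    """Algorithm B: recover a preimage of y bit by bit using the oracle."""
    if not oracle("", y):
        return None  # y has no preimage at all
    prefix = ""
    for _ in range(n_bits):
        # if no preimage continues with 0, one must continue with 1
        prefix += "0" if oracle(prefix + "0", y) else "1"
    return prefix


N_BITS = 4
oracle = make_prefix_oracle(toy_f, N_BITS)
y = toy_f("1011")
x = invert(oracle, y, N_BITS)
print(x, toy_f(x) == y)  # 1011 True
```

<p>In general the recovered string does not have to be the original input; it is only guaranteed to be some preimage of \(y\), which is all that inverting \(f\) requires.</p>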
<p>But: if \(x\) is given, it is easy to check that \(y= f(x)\) holds and also whether \(\bar x\) is a prefix of \(x\). Hence \(L_f \in \mathcal{NP}\).</p>
<p>Therefore \(L_f \in \mathcal{NP} \setminus \mathcal{P}\).<br />
But if \(\mathcal{NP} \setminus \mathcal{P} \neq \emptyset\), then in particular \(\mathcal{P} \neq \mathcal{NP}\).</p>
<p>At this point it should be clear that a one-way function as defined above can only exist if \(\mathcal{P} \neq \mathcal{NP}\) holds.</p>
<h2>Semantic security</h2>
<p>Wikipedia gives the following description of semantic security:</p>
<blockquote><p>An encryption scheme is semantically secure if every adversary can derive any information that it can derive about the message from a ciphertext even when it only knows the length of the ciphertext. A ciphertext thus reveals nothing about a message except its length.</p></blockquote>
<p>Prof. Hofheinz gave the following informal definition of semantic security in the lecture:</p>
<div class="definition">
A symmetric encryption scheme is semantically secure if, for every distribution of \(M\), every function \(f\) and every efficient algorithm \(\mathcal{A}\), there is an efficient algorithm \(\mathcal{B}\) such that</p>
\(Pr \left [\mathcal{A}^{\text{Enc}(K, \cdot)}(\text{Enc}(K, M)) = f(M) \right ] - Pr [\mathcal{B}(\varepsilon) = f(M)]\)
<p>is small.
</p></div>
<p>Here, </p>
<ul>
<li>\(M\) is a message,</li>
<li>\(\text{Enc}(K, M)\) is the encryption of a concrete message \(M\) with the key \(K\),</li>
<li>\(\varepsilon\) is trivial information (I believe this is, for example, the length of the ciphertext) and</li>
<li>\(f\) extracts arbitrary information from the plaintext</li>
</ul>
<p>The first probability describes the chance of obtaining information of the kind \(f\) about the plaintext \(M\) from the ciphertext.<br />
The second probability describes the chance of obtaining information about a message “out of nothing”. This is meant to eliminate trivial information. Taken together, the difference gives the probability of obtaining non-trivial information from an encrypted message. “Efficient” presumably means in polynomial time.</p>
<p>If semantically secure schemes that can be used multiple times exist, then such a scheme can be used as a one-way function. If a one-way function exists, then \(\mathcal{P} \neq \mathcal{NP}\). So it follows:</p>
<p>If semantically secure schemes that can be used multiple times exist, then \(\mathcal{P} \neq \mathcal{NP}\). However, this is one of the <a href="http://de.wikipedia.org/wiki/Millennium-Probleme">Millennium Prize Problems</a> and has not been settled yet.</p>
<p>The post <a href="http://martin-thoma.com/semantische-sicherheit/">Semantische Sicherheit</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/semantische-sicherheit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Code Jam – Round 1A 2013</title>
		<link>http://martin-thoma.com/google-code-jam-round-1a-2013/</link>
		<comments>http://martin-thoma.com/google-code-jam-round-1a-2013/#comments</comments>
		<pubDate>Sat, 27 Apr 2013 05:15:17 +0000</pubDate>
		<dc:creator>Martin Thoma</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[Google Code Jam]]></category>
		<category><![CDATA[itertools]]></category>
		<category><![CDATA[NumPy]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://martin-thoma.com/?p=64521</guid>
		<description><![CDATA[<p>Problem A (Bullseye): Small Set: 5856/6195 users (95%) Large Set: 1806/4795 users (38%) Problem B (Manage your Energy): Small Set: 2323/3789 users (61%) Large Set: 456/1133 users (40%) Problem C (Good Luck): Small Set: 1366/1774 users (77%) Large Set: 31/605 users (5%) More information might soon be on go-hero.net. I&#8217;m too slow for Google Code [...]</p><p>The post <a href="http://martin-thoma.com/google-code-jam-round-1a-2013/">Google Code Jam – Round 1A 2013</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></description>
				<content:encoded><![CDATA[<ul>
<li>Problem A (<a href="https://code.google.com/codejam/contest/2418487/dashboard#s=p0">Bullseye</a>):
<ul>
<li>Small Set: 5856/6195 users (95%)</li>
<li>Large Set: 1806/4795 users (38%)</li>
</ul>
</li>
<li>Problem B (<a href="https://code.google.com/codejam/contest/2418487/dashboard#s=p1">Manage your Energy</a>):
<ul>
<li>Small Set: 2323/3789 users (61%)</li>
<li>Large Set: 456/1133 users (40%)</li>
</ul>
</li>
<li>Problem C (<a href="https://code.google.com/codejam/contest/2418487/dashboard#s=p2">Good Luck</a>):
<ul>
<li>Small Set: 1366/1774 users (77%)</li>
<li>Large Set: 31/605 users (5%)</li>
</ul>
</li>
</ul>
<p>More information might soon be on <a href="http://www.go-hero.net/jam/13/">go-hero.net</a>.</p>
<p>I&#8217;m too slow for Google Code Jam *sigh*. Nevertheless, here are my solutions:</p>
<h2>Bullseye</h2>
<h3>Small</h3>
<pre class="brush: php; title: ; notranslate">
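&lt;?php

// Paint rings from the inside out until the paint runs out.
function solve($r, $t) {
    $circles = 0;
    while($t &gt;= 0) {
        $circles++;
        // paint needed for the ring with inner radius $r (area in units of pi)
        $t -= ($r+1)*($r+1)-$r*$r;
        $r += 2;
    }
    // the last ring made $t negative, so it does not count
    return $circles - 1;
}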

$fp = fopen ($argv[1], 'r');
$testcases = fgets ($fp);
$caseNr=0;
while($line = fgets ($fp)) {
    $caseNr++;
    $a = explode(' ', $line);
    $r = $a[0];
    $t = $a[1];
    echo &quot;Case #$caseNr: &quot;.solve($r, $t).&quot;\n&quot;;

}
?&gt;
</pre>
<h3>Large</h3>
<p>*Argh* I&#8217;ve copied the wrong equation from my pad to my computer </p>
<p>You basically have to solve this:<br />
\(\begin{align}
t - \sum_{i=0}^x ((r+1+2i)^2 - (r+2i)^2) &#038;\geq 0\\
\Leftrightarrow t - (x+1)(2x+2r+1) &#038;\geq 0 \\
\Leftrightarrow (-2)x^2 - (2r+3)x + (t-2r-1) &#038;\geq 0
\end{align}\)</p>
<p>The left-hand side is zero for:<br />
\(\begin{align}
x_{1,2} &#038;= \frac{1}{-4} \cdot ((2r+3) \pm \sqrt{(2r+3)^2-4(-2)(t-2r-1)}) \\
&#038;= -\frac{1}{4} \cdot (2r+3 \pm \sqrt{4r^2+12r+9+8(t-2r-1)})\\
&#038;= -\frac{1}{4} \cdot (2r+3 \pm \sqrt{4r^2+12r+9+8t-16r-8})\\
&#038;= -\frac{1}{4} \cdot (2r+3 \pm \sqrt{4r^2-4r+1+8t})\\
&#038;= \frac{1}{4} \cdot (-2r-3 \pm \sqrt{(2r-1)^2+8t})
\end{align}\)</p>
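<p>You also know that \(1 \leq r\) and that \(x \geq 0\) has to be an integer. So only the non-negative root matters, and you have to round it down to the largest integer \(x\) that still satisfies the inequality; you can then draw \(x+1\) rings.</p>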
<p>Did you know that Python has (in numpy) a function to calculate the roots of a polynomial, for example of a quadratic equation? See <a href="http://docs.scipy.org/doc/numpy/reference/generated/numpy.roots.html">numpy.roots</a> for reference.</p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-
 
from numpy import ceil, roots
 
def check(r, t, x):
    return t-(x+1)*(2*r+2*x+1) &gt;= 0
 
def solveFast(r, t):
    myRoots = roots((2,2*r+3,2*r+1-t))
    for i in xrange(2):
        if myRoots[i] &gt;= 0:
            answer = int(ceil(myRoots[i])) + 1
            while not check(r, t, answer):
                answer -= 1
            return answer + 1
  
if __name__ == &quot;__main__&quot;:
    testcases = input()
       
    for caseNr in xrange(1, testcases+1):
        line = raw_input()
        r, t = map(int, line.split(' '))
        print(&quot;Case #%i: %s&quot; % (caseNr, solveFast(r, t)))
</pre>
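<p>As an alternative to floating-point roots, here is a sketch in Python 3 that stays in integer arithmetic. It is an assumption-laden illustration, not the submitted solution: the name <code>rings</code> is made up, and it needs Python 3.8+ for <code>math.isqrt</code>. It takes the non-negative root of \(2x^2+(2r+3)x+(2r+1-t)=0\), the same polynomial that is passed to <code>numpy.roots</code> above, and rounds it down.</p>

```python
from math import isqrt


def rings(r, t):
    """Number of rings that can be drawn with t ml of paint, starting at radius r."""
    # discriminant of 2x^2 + (2r+3)x + (2r+1-t), which equals (2r-1)^2 + 8t
    discriminant = (2 * r - 1) ** 2 + 8 * t
    # largest integer x that does not exceed the non-negative root
    x = (isqrt(discriminant) - (2 * r + 3)) // 4
    # x is the index of the last complete ring, so x + 1 rings fit
    return x + 1


print(rings(1, 9))   # 1
print(rings(1, 10))  # 2
print(rings(3, 40))  # 3
```

<p>Because <code>isqrt</code> and <code>//</code> work on arbitrarily large integers, no correction loop is needed for the large data set.</p>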
<h2>Good Luck</h2>
<p>This one solves at least the first test case, but not the second one.</p>
<p>I love <a href="http://docs.python.org/2/library/itertools.html">itertools</a> <img src='http://martin-thoma.com/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<pre class="brush: python; title: ; notranslate">
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from math import factorial
from itertools import combinations_with_replacement
import pprint
from copy import deepcopy

def mul(integers):
    s = 1
    for p in integers:
        s *= p
    return s

def merge(candidates, block):
    newCandidates = deepcopy(candidates)
    for el in set(block):
        diff = block.count(el) - newCandidates.count(el)
        for i in xrange(diff):
            newCandidates.append(el)
    return newCandidates

def canBeIn(b, candidates, N):
    merged = merge(candidates, b)
    return not (len(merged) &gt; N)

def solve(N, M, products, productToBuildungs):
    &quot;&quot;&quot; 
        N: number of numbers in total that got randomly picked
        M: A_i in [2, M]
    &quot;&quot;&quot;
    candidates = []

    # Is there a simple answer?
    for p in products:
        if productToBuildungs[p][0] == 1:
            candidates = merge(candidates, productToBuildungs[p][1][0])
            if len(candidates) == N:
                return candidates

    for p in products:
        pos = filter(lambda b: canBeIn(b, candidates, N), productToBuildungs[p][1])
        if len(pos) == 1:
            candidates = merge(candidates, pos[0])

    while len(candidates) &lt; N:
        candidates.append(2)

    return candidates

if __name__ == &quot;__main__&quot;:
    testcases = input()
      
    for caseNr in xrange(1, testcases+1):
        print(&quot;Case #%i:&quot; % caseNr)
        line = raw_input()
        arr = line.split(' ')
        R = int(arr[0])
        N = int(arr[1])
        M = int(arr[2])
        K = int(arr[3])

        # which products can I get
        productToBuildungs = {}
        for r in xrange(0, N+1):
            for product in combinations_with_replacement(range(2,M+1),r):
                s = mul(product)
                if s not in productToBuildungs:
                    productToBuildungs[s] = [1, [list(product)]]
                else:
                    productToBuildungs[s][0] += 1
                    productToBuildungs[s][1].append(list(product))

        for r in xrange(R):
            products = [int(el) for el in raw_input().split(' ') if int(el) != 1]
            print(''.join(map(str, sorted(solve(N, M, products, productToBuildungs)))))
</pre>
<p>The post <a href="http://martin-thoma.com/google-code-jam-round-1a-2013/">Google Code Jam – Round 1A 2013</a> appeared first on <a href="http://martin-thoma.com">Martin Thoma</a>.</p>]]></content:encoded>
			<wfw:commentRss>http://martin-thoma.com/google-code-jam-round-1a-2013/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
