<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>I say things</title>
	<atom:link href="http://fseoane.net/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://fseoane.net/blog</link>
	<description>mostly about programming, machine learning and such</description>
	<lastBuildDate>Mon, 14 May 2012 22:57:23 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>line-by-line memory usage of a Python program</title>
		<link>http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/</link>
		<comments>http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/#comments</comments>
		<pubDate>Tue, 24 Apr 2012 05:04:46 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=1164</guid>
		<description><![CDATA[My newest project is a Python library for monitoring memory consumption of arbitrary processes, and one of its most useful features is the line-by-line analysis of memory usage for Python code. I wrote a basic prototype six months ago after being surprised by the lack of related tools. I wanted to plot memory consumption of [...]]]></description>
			<content:encoded><![CDATA[<p>My newest project is a Python library for monitoring memory consumption of arbitrary processes, and one of its most useful features is the line-by-line analysis of memory usage for Python code.</p>
<p>I wrote a basic prototype six months ago after being surprised by the lack of related tools. I wanted to <a href="http://fseoane.net/blog/2011/qr_multiply-function-in-scipy-linalg/">plot memory consumption</a> of a couple of Python functions but did not find a Python module to do the job. I came to the conclusion that there is no standard way to get the memory usage of the Python interpreter from within Python, so I resorted to reading it from <code>/proc/$PID/statm</code>. From there on I realized that, once the memory usage can be fetched, making a line-by-line report wouldn&#8217;t be hard.</p>
<p>Back to today. I&#8217;ve been using the line-by-line memory monitoring to diagnose poor memory management (hidden temporaries, unused allocations, etc.) for some time. It seems to work on two different computers, so, full of confidence as I am, I&#8217;ll write a blog post about it &#8230;</p>
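<p>As a sketch of the <code>/proc</code> approach (Linux only, and not memory_profiler&#8217;s actual code): the <code>statm</code> file lists whitespace-separated page counts (size, resident, shared, text, lib, data, dt), and the second field times the page size gives the resident set size. The helper names <code>parse_statm</code> and <code>memory_usage_mb</code> are hypothetical; parsing is split from file reading so the arithmetic can be checked without a <code>/proc</code> filesystem.</p>

```python
import os


def parse_statm(text, page_size):
    """Resident set size, in MB, from the contents of a statm file.

    /proc/PID/statm holds whitespace-separated counts in pages:
    size resident shared text lib data dt. The second field is the RSS.
    """
    resident_pages = int(text.split()[1])
    return resident_pages * page_size / float(1024 ** 2)


def memory_usage_mb(pid="self"):
    """Current RSS of a process in MB (Linux only: reads /proc)."""
    with open("/proc/%s/statm" % pid) as f:
        return parse_statm(f.read(), os.sysconf("SC_PAGE_SIZE"))


if __name__ == "__main__":
    # 256 resident pages of 4096 bytes each is exactly 1 MB
    print(parse_statm("1000 256 100 10 0 500 0", 4096))
    if os.path.exists("/proc/self/statm"):
        print("current RSS: %.2f MB" % memory_usage_mb())
```

<p>Reading the file for another process only requires passing its PID instead of <code>"self"</code>, which is what makes monitoring arbitrary processes possible with this technique.</p>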
<h2>How to use it?</h2>
<p>The easiest way to get it is to install from the Python Package Index:</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; $ easy_install <span style="color: #660033;">-U</span> memory_profiler <span style="color: #666666; font-style: italic;"># pip install -U memory_profiler</span></div></div>
<p>but other options include fetching the latest version from <a href="https://github.com/fabianp/memory_profiler">GitHub</a>, or, since it consists of a single file, dropping it into your current working directory or anywhere else on your PYTHONPATH.</p>
<p>The next step is to write some Python code to profile. It can be just about any function, but for the purpose of this blog post I&#8217;ll create a function my_func() that mostly allocates memory and save it to a file named example.py:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<br />
@<span style="color: #dc143c;">profile</span><br />
<span style="color: #ff7700;font-weight:bold;">def</span> my_func<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; a = np.<span style="color: black;">zeros</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">100</span>, <span style="color: #ff4500;">100</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; b = np.<span style="color: black;">zeros</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">1000</span>, <span style="color: #ff4500;">1000</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; c = np.<span style="color: black;">zeros</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">10000</span>, <span style="color: #ff4500;">1000</span><span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> a, b, c<br />
<br />
<span style="color: #ff7700;font-weight:bold;">if</span> __name__ == <span style="color: #483d8b;">'__main__'</span>:<br />
&nbsp; &nbsp; my_func<span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></div></div>
<p>Note that I&#8217;ve decorated the function with @profile. This tells the profiler to look into the function my_func and record the memory consumption for each line.</p>
<h2>Wake up the cookie monster</h2>
<p>To start profiling and print the results to stdout, run the script as usual but pass the options &#8220;-m memory_profiler -l -v&#8221; to the Python interpreter.</p>
<div class="codecolorer-container bash default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="bash codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ python <span style="color: #660033;">-m</span> memory_profiler <span style="color: #660033;">-l</span> <span style="color: #660033;">-v</span> example.py<br />
Line <span style="color: #666666; font-style: italic;"># &nbsp; &nbsp;Mem usage &nbsp; Line Contents</span><br />
===================================<br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000;">3</span> &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #000000; font-weight: bold;">@</span>profile<br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000;">4</span> &nbsp; &nbsp; <span style="color: #000000;">13.68</span> MB &nbsp; def my_func<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000;">5</span> &nbsp; &nbsp; <span style="color: #000000;">13.77</span> MB &nbsp; &nbsp; &nbsp; a = np.zeros<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000;">100</span>, <span style="color: #000000;">100</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000;">6</span> &nbsp; &nbsp; <span style="color: #000000;">21.40</span> MB &nbsp; &nbsp; &nbsp; b = np.zeros<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000;">1000</span>, <span style="color: #000000;">1000</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000;">7</span> &nbsp; &nbsp; <span style="color: #000000;">97.70</span> MB &nbsp; &nbsp; &nbsp; c = np.zeros<span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #7a0874; font-weight: bold;">&#40;</span><span style="color: #000000;">10000</span>, <span style="color: #000000;">1000</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><span style="color: #7a0874; font-weight: bold;">&#41;</span><br />
&nbsp; &nbsp; &nbsp;<span style="color: #000000;">8</span> &nbsp; &nbsp; <span style="color: #000000;">97.70</span> MB &nbsp; &nbsp; &nbsp; <span style="color: #7a0874; font-weight: bold;">return</span> a, b, c</div></div>
<p>Voilà! Each line is prefixed by the memory usage (in MB) of the Python interpreter after that line has been executed.</p>
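<p>To illustrate how a line-by-line report can be produced in principle, here is a minimal sketch assuming Python 3.4 or later. It is not memory_profiler&#8217;s implementation: it swaps in the standard library&#8217;s <code>tracemalloc</code> (which counts only allocations made through Python&#8217;s allocators, not the whole-process RSS shown above) and hooks <code>sys.settrace</code> &#8220;line&#8221; events. The decorator name <code>profile_lines</code> and its <code>report</code> attribute are hypothetical.</p>

```python
import sys
import tracemalloc
from functools import wraps


def profile_lines(func):
    """Hypothetical decorator: memory (MB) traced after each line of func.

    Results go to wrapper.report as {line_number: MB}; later calls
    overwrite the entries for the same lines.
    """
    report = {}
    code = func.__code__

    @wraps(func)
    def wrapper(*args, **kwargs):
        state = {"line": None}

        def tracer(frame, event, arg):
            if frame.f_code is not code:
                return None  # do not trace other functions
            if event in ("line", "return") and state["line"] is not None:
                # A 'line' event fires *before* its line runs, so memory
                # measured now reflects the end of the previous line.
                current, _peak = tracemalloc.get_traced_memory()
                report[state["line"]] = current / float(1024 ** 2)
            if event == "line":
                state["line"] = frame.f_lineno
            return tracer

        started_here = not tracemalloc.is_tracing()
        if started_here:
            tracemalloc.start()
        previous_trace = sys.gettrace()
        sys.settrace(tracer)
        try:
            return func(*args, **kwargs)
        finally:
            sys.settrace(previous_trace)
            if started_here:
                tracemalloc.stop()

    wrapper.report = report
    return wrapper


@profile_lines
def my_func():
    a = [0] * (10 ** 5)
    b = [0] * (10 ** 6)
    return a, b


if __name__ == "__main__":
    my_func()
    for lineno in sorted(my_func.report):
        print("%6d %8.2f MB" % (lineno, my_func.report[lineno]))
```

<p>The design choice worth noting is the off-by-one: because Python reports a line just before executing it, each measurement is attributed to the previously executed line, and the &#8220;return&#8221; event closes out the last one.</p>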
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2012/line-by-line-report-of-memory-usage/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Low rank approximation</title>
		<link>http://fseoane.net/blog/2011/low-rank-approximation/</link>
		<comments>http://fseoane.net/blog/2011/low-rank-approximation/#comments</comments>
		<pubDate>Sun, 06 Nov 2011 10:05:09 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=1128</guid>
		<description><![CDATA[A little experiment to see what low rank approximation looks like. These are the best rank-k approximations (in the Frobenius norm) to a natural image of rank 512, for increasing values of k. Python code can be found here. GIF animation made using ImageMagick&#8217;s convert script.]]></description>
			<content:encoded><![CDATA[<p>A little experiment to see what low rank approximation looks like. These are the best rank-k approximations (in the Frobenius norm) to a natural image of rank 512, for increasing values of k.</p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/11/animation1.gif"><img src="http://fseoane.net/blog/wp-content/uploads/2011/11/animation1.gif" alt="" title="Low-rank approximation for the Lena Image" width="600" height="400" class="aligncenter size-full wp-image-1156" /></a></p>
<p>Python code can be found <a href="https://gist.github.com/1342033">here</a>. GIF animation made using ImageMagick&#8217;s convert script.</p>
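<p>The best rank-k approximation in the Frobenius norm is given by the truncated SVD (the Eckart&#8211;Young theorem). The sketch below is not the linked gist; it uses a random matrix as a stand-in for the image, and the function name <code>low_rank_approx</code> is hypothetical. It also prints the approximation error next to the norm of the discarded singular values, which should agree.</p>

```python
import numpy as np


def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm (Eckart-Young).

    Keeps the k largest singular values of A and their singular vectors.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Scaling the first k columns of U by s[:k] is U_k diag(s_k)
    return np.dot(U[:, :k] * s[:k], Vt[:k])


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    A = rng.randn(60, 40)  # stand-in for a grayscale image
    s = np.linalg.svd(A, compute_uv=False)
    for k in (1, 5, 20, 40):
        err = np.linalg.norm(A - low_rank_approx(A, k), "fro")
        # The error equals the norm of the discarded singular values.
        print(k, err, np.sqrt(np.sum(s[k:] ** 2)))
```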
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/low-rank-approximation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>qr_multiply function in scipy.linalg</title>
		<link>http://fseoane.net/blog/2011/qr_multiply-function-in-scipy-linalg/</link>
		<comments>http://fseoane.net/blog/2011/qr_multiply-function-in-scipy-linalg/#comments</comments>
		<pubDate>Fri, 14 Oct 2011 14:44:10 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[scipy]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=1062</guid>
		<description><![CDATA[In scipy&#8217;s development version there&#8217;s a new function closely related to the QR-decomposition of a matrix and to the least-squares solution of a linear system. What this function does is to compute the QR-decomposition of a matrix and then multiply the resulting orthogonal factor by another arbitrary matrix. In pseudocode: def qr_multiply&#40;X, Y&#41;: &#160; &#160; [...]]]></description>
			<content:encoded><![CDATA[<p>In scipy&#8217;s development version there&#8217;s a new function closely related to the <a href="http://en.wikipedia.org/wiki/QR_decomposition">QR-decomposition</a> of a matrix and to the least-squares solution of a linear system.</p>
<p>What this function does is compute the QR decomposition of a matrix and then multiply the resulting orthogonal factor by another arbitrary matrix. A naive implementation looks like this:</p>
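<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<span style="color: #ff7700;font-weight:bold;">from</span> scipy <span style="color: #ff7700;font-weight:bold;">import</span> linalg<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> qr_multiply<span style="color: black;">&#40;</span>X, Y<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; Q, R = linalg.<span style="color: black;">qr</span><span style="color: black;">&#40;</span>X<span style="color: black;">&#41;</span> &nbsp;<span style="color: #808080; font-style: italic;"># Q is formed explicitly</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>Q.<span style="color: black;">T</span>, Y<span style="color: black;">&#41;</span></div></div>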
<p>but unlike this naive implementation, <code>qr_multiply</code> is able to do all this <b>without</b> explicitly computing the orthogonal Q matrix, saving both memory and time. The following picture shows memory consumption over time while running this computation on a 1,000 x 1,000 matrix X and a vector Y (full code can be found <a href="https://gist.github.com/1287168">here</a>):</p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/10/qr_multiply1.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/10/qr_multiply1-300x225.png" alt="" title="Memory usage for a QR multiplication" width="300" height="225" class="aligncenter size-medium wp-image-1076" /></a></p>
<p>It can be seen that not only is <code>qr_multiply</code> almost twice as fast as the naive approach, but its memory consumption is also significantly lower, since the orthogonal factor is never explicitly computed.</p>
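<p>A small correctness check, as a sketch assuming a SciPy release that ships <code>scipy.linalg.qr_multiply</code>: in <code>mode='right'</code> the function returns <code>c Q</code> (together with R), and for a 1-D vector <code>y</code> that product is the same vector as <code>Q^T y</code>, so it can be compared directly with the naive computation. This does not reproduce the timing or memory measurements from the linked gist.</p>

```python
import numpy as np
from scipy import linalg

rng = np.random.RandomState(0)
X = rng.randn(1000, 1000)
y = rng.randn(1000)

# Naive version: form the orthogonal factor Q explicitly, then compute Q^T y.
Q, R = linalg.qr(X)
naive = np.dot(Q.T, y)

# qr_multiply with mode='right' returns y Q (plus R) without forming Q;
# for a 1-D y, y Q is the same vector as Q^T y.
cq, R2 = linalg.qr_multiply(X, y, mode='right')

print(np.allclose(naive, cq))
```

<p>Both routes start from the same Householder factorization, which is why the results agree to rounding error; the saving comes from applying the reflectors to <code>y</code> instead of first assembling the dense Q.</p>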
<p>Credit for implementing the qr_multiply function goes to <a href="https://github.com/tecki">Martin Teichmann</a>. </p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/qr_multiply-function-in-scipy-linalg/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>scikit-learn 0.9</title>
		<link>http://fseoane.net/blog/2011/scikit-learn-0-9/</link>
		<comments>http://fseoane.net/blog/2011/scikit-learn-0-9/#comments</comments>
		<pubDate>Sun, 02 Oct 2011 09:19:57 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=1048</guid>
		<description><![CDATA[Last week we released a new version of scikit-learn. The Changelog is particularly impressive, yet personally this release is important for other reasons. This will probably be my last release as a paid engineer. I&#8217;m starting a PhD next month, and although I plan to continue contributing to the project and make a few more [...]]]></description>
			<content:encoded><![CDATA[<p>Last week we released a new version of scikit-learn. The <a href="http://scikit-learn.sourceforge.net/stable/whats_new.html">Changelog is particularly impressive</a>, yet personally this release is important for other reasons. </p>
<p>This will probably be my last release as a paid engineer. I&#8217;m starting a PhD next month, and although I plan to continue contributing to the project and make a few more releases, I will certainly have less time to devote to it. Luckily, I received a lot of help from the community while preparing the release, from writing the Changelog to building the Windows binaries, so I expect the transition to go smoothly.</p>
<p>Almost two years have elapsed since the first 0.1 release. During this time, we did a lot of refactoring and broke the API several times. However, concerns about API stability raised both at the EuroScipy conference and on the mailing list made me realize that we need to provide an API that does not break with every release, and to do so in a way that keeps the project fun for developers.</p>
<p>That&#8217;s why I&#8217;m extremely glad to see that, although this release contains many changes, they have been made in a more organized manner. Yes, we&#8217;ve broken the API once again, but now there&#8217;s a compatibility layer that ensures that code written for 0.8 will continue working with the new release.</p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/scikit-learn-0-9/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Reworked example gallery for scikit-learn</title>
		<link>http://fseoane.net/blog/2011/reworked-example-gallery-for-scikit-learn/</link>
		<comments>http://fseoane.net/blog/2011/reworked-example-gallery-for-scikit-learn/#comments</comments>
		<pubDate>Sun, 04 Sep 2011 18:09:02 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=1017</guid>
		<description><![CDATA[I&#8217;ve lately been working on improving the scikit-learn example gallery so that it also shows a small thumbnail of the plotted result. Here is what the gallery looks like now And the real thing should be already displayed in the development documentation. The next thing is to add a static image to those that don&#8217;t generate any [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve lately been working on improving the scikit-learn example gallery so that it also shows a small thumbnail of the plotted result. Here is what the gallery looks like now:</p>
<p><a href="http://scikit-learn.sourceforge.net/dev/auto_examples/index.html"><img src="http://fseoane.net/blog/wp-content/uploads/2011/09/Screenshot-Examples-—-scikit-learn-v0.9-git-documentation-Google-Chrome.png" alt="" title="Screenshot-Examples" width="600" /></a></p>
<p><br/></p>
<p>The real thing should already be displayed in the <a href="http://scikit-learn.sourceforge.net/dev/auto_examples/index.html">development documentation</a>. The next step is to add a static image to the examples that don&#8217;t generate any plot: examples such as the <a href="http://scikit-learn.sourceforge.net/dev/auto_examples/applications/svm_gui.html">SVM GUI</a> should also have an image to display.</p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/reworked-example-gallery-for-scikit-learn/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>scikit-learn’s EuroScipy 2011 coding sprint &#8212; day two</title>
		<link>http://fseoane.net/blog/2011/scikit-learn%e2%80%99s-euroscipy-2011-coding-sprint-day-two/</link>
		<comments>http://fseoane.net/blog/2011/scikit-learn%e2%80%99s-euroscipy-2011-coding-sprint-day-two/#comments</comments>
		<pubDate>Wed, 24 Aug 2011 22:33:10 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=999</guid>
		<description><![CDATA[Today&#8217;s coding sprint was a bit more crowded, with some notable scipy hackers such as Ralph Gommers, Stefan van der Walt, David Cournapeau and Fernando Perez from IPython joining in. Here is what got done: &#8211; We merged Jake&#8217;s new BallTree code. This is a pure Cython implementation of a nearest-neighbor search similar to the KDTree [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/all.jpg"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/all-300x225.jpg" alt="" title="scikit-learn coding sprint" width="300" height="225" class="aligncenter size-medium wp-image-1001" /></a></p>
<p>Today&#8217;s coding sprint was a bit more crowded, with some notable scipy hackers such as Ralph Gommers, <a href="http://mentat.za.net/">Stefan van der Walt</a>, <a href="http://cournape.wordpress.com/">David Cournapeau</a> and <a href="http://blog.fperez.org/">Fernando Perez</a> from IPython joining in. Here is what got done:</p>
<p>  &#8211; We merged <a href="http://www.astro.washington.edu/users/vanderplas/">Jake</a>&#8217;s new BallTree code. This is a pure Cython implementation of a nearest-neighbor search similar to the KDTree class in scipy.spatial, but much faster. The code looks awesome and it&#8217;s a big speedup compared to the older code.</p>
<p>  &#8211; Vlad is ready to merge his <a href="https://github.com/scikit-learn/scikit-learn/pull/221">dictionary learning code</a>, which should happen in the coming days.</p>
<p>  &#8211; Initial support for Python 3. scikit-learn should now at least build and import cleanly under Python 3.</p>
<p>  &#8211; Some bug fixes in the Pipeline object and in docstrings.</p>
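<p>The BallTree code itself is not reproduced here. As a sketch of the kind of k-nearest-neighbor query such a tree answers, the snippet below swaps in SciPy&#8217;s <code>scipy.spatial.cKDTree</code> (a compiled KD-tree, not scikit-learn&#8217;s BallTree) and checks its answers against a brute-force search, which is what a tree structure is meant to speed up. The data are random points chosen purely for illustration.</p>

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.RandomState(0)
points = rng.rand(2000, 3)   # reference set
queries = rng.rand(5, 3)     # points whose neighbors we want

tree = cKDTree(points)
dist, ind = tree.query(queries, k=3)  # 3 nearest neighbors per query

# Brute force: all squared distances, shape (n_queries, n_points)
d2 = ((queries[:, np.newaxis, :] - points[np.newaxis, :, :]) ** 2).sum(axis=-1)
brute_ind = np.argsort(d2, axis=1)[:, :3]

print(np.array_equal(ind, brute_ind))
```

<p>The brute-force version costs one distance per (query, point) pair, while a tree prunes whole regions of space, which is where speedups like the one mentioned above come from.</p>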
<p>So this was the end of the scikit-learn sprint, but EuroScipy has just begun. See you tomorrow at the conference (follow the signs)!</p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0093.jpg"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0093-202x300.jpg" alt="" title="IMG_0093" width="202" height="300" class="alignleft size-medium wp-image-1003" /></a></p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0092.jpg"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0092-189x300.jpg" alt="" title="yannick" width="189" height="300" class="alignleft size-medium wp-image-1004" /></a></p>
<div style="clear: both">
&#8211;
</div>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/scikit-learn%e2%80%99s-euroscipy-2011-coding-sprint-day-two/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>scikit-learn&#8217;s EuroScipy 2011 coding sprint  &#8212; day one</title>
		<link>http://fseoane.net/blog/2011/scikit-learns-euroscipy-2011-coding-sprint-day-one/</link>
		<comments>http://fseoane.net/blog/2011/scikit-learns-euroscipy-2011-coding-sprint-day-one/#comments</comments>
		<pubDate>Tue, 23 Aug 2011 19:38:09 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=974</guid>
		<description><![CDATA[As a warm-up for the upcoming EuroScipy conference, some of the scikit-learn developers decided to gather and work together for a couple of days. Today was the first day and there were only a handful of us, as the real kickoff is expected tomorrow. Some interesting coding happened, although most of us were still preparing [...]]]></description>
			<content:encoded><![CDATA[<p>As a warm-up for the upcoming <a href="http://www.euroscipy.org/conference/euroscipy2011">EuroScipy conference</a>, some of the <a href="http://scikit-learn.sf.net">scikit-learn</a> developers decided to gather and work together for a couple of days.</p>
<p>Today was the first day and there were only a handful of us, as the real kickoff is expected tomorrow. Some interesting coding happened, although most of us were still preparing material for the EuroScipy tutorials &#8230;</p>
<p>    &#8211; API changes: removed keyword parameters from the <i>fit</i> method and added a <i>set_params</i> method (<a href="https://github.com/scikit-learn/scikit-learn/pull/306">pull request</a>).</p>
<p>    &#8211; Some bug fixes in NuSVR (<a href="https://github.com/scikit-learn/scikit-learn/pull/315">pull request</a>).</p>
<p>    &#8211; Review of <a href="http://vene.ro">Vlad</a>&#8217;s code, developed during his Summer of Code program.</p>
<p>    &#8211; Lots of discussion about algorithms, code, APIs and the buildbot dance!</p>
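<p>To make the API change concrete, here is a sketch with a hypothetical <code>Estimator</code> class (not scikit-learn&#8217;s code): parameters live in the constructor, <code>set_params</code> updates them and returns <code>self</code>, and <code>fit</code> only takes data. The deprecation path for old-style <code>fit(X, y, alpha=...)</code> calls is an assumption about one way a compatibility layer can work, not a description of how scikit-learn implemented it.</p>

```python
import warnings


class Estimator(object):
    """Hypothetical estimator: parameters belong to the constructor."""

    _param_names = ("alpha",)

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def set_params(self, **params):
        """Update constructor parameters; return self so calls can chain."""
        for name, value in params.items():
            if name not in self._param_names:
                raise ValueError("Invalid parameter %r" % name)
            setattr(self, name, value)
        return self

    def fit(self, X, y, **params):
        if params:
            # Old-style call such as fit(X, y, alpha=0.1): still accepted,
            # but forwarded to set_params with a deprecation warning.
            warnings.warn("Passing parameters to fit is deprecated; "
                          "set them in the constructor or with set_params.",
                          DeprecationWarning)
            self.set_params(**params)
        # A real estimator would learn from X and y here.
        self.fitted_alpha_ = self.alpha
        return self


if __name__ == "__main__":
    est = Estimator().set_params(alpha=0.5).fit([[0.0]], [0.0])  # new style
    print(est.fitted_alpha_)
```

<p>Keeping every parameter in the constructor means an estimator can be fully described, copied or reconfigured without touching <code>fit</code>, which is what makes generic tools such as grid searches straightforward to write.</p>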
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0076.jpg"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0076-150x150.jpg" alt="" title="varokoo" width="150" height="150" class="alignleft size-thumbnail wp-image-977" /></a></p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/Picture-3.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/Picture-3-150x150.png" alt="" title="Olivier Grisel" width="150" height="150" class="alignleft size-thumbnail wp-image-979" /></a></p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0074.jpg"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/IMG_0074-150x150.jpg" alt="" title="Vlad" width="150" height="150" class="alignleft size-thumbnail wp-image-982" /></a></p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/Picture-5.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/Picture-5-150x150.png" alt="" title="Me and Jean" width="150" height="150" class="alignleft size-thumbnail wp-image-986" /></a></p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/08/emanuelle.jpg"><img src="http://fseoane.net/blog/wp-content/uploads/2011/08/emanuelle-150x150.jpg" alt="" title="emanuelle" width="150" height="150" class="alignleft size-thumbnail wp-image-990" /></a></p>
<div style="clear: both">
&#8211;
</div>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/scikit-learns-euroscipy-2011-coding-sprint-day-one/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Ridge regression path</title>
		<link>http://fseoane.net/blog/2011/ridge-regression-path/</link>
		<comments>http://fseoane.net/blog/2011/ridge-regression-path/#comments</comments>
		<pubDate>Tue, 12 Jul 2011 07:21:08 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[scikit-learn]]></category>
		<category><![CDATA[scipy]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=939</guid>
		<description><![CDATA[Ridge coefficients for multiple values of the regularization parameter can be elegantly computed by updating the thin SVD decomposition of the design matrix: import numpy as np from scipy import linalg def ridge&#40;A, b, alphas&#41;: &#160; &#160; &#34;&#34;&#34;Return coefficients for regularized least squares &#160; &#160; &#160; &#160; &#160; &#160; &#160; &#160;min &#124;&#124;A x - b&#124;&#124; [...]]]></description>
			<content:encoded><![CDATA[<p>Ridge coefficients for multiple values of the regularization parameter can be elegantly computed by updating the <i>thin</i> SVD decomposition of the design matrix:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<span style="color: #ff7700;font-weight:bold;">from</span> scipy <span style="color: #ff7700;font-weight:bold;">import</span> linalg<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> ridge<span style="color: black;">&#40;</span>A, b, alphas<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">&quot;&quot;&quot;Return coefficients for regularized least squares <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;min ||A x - b||^2 + alpha ||x||^2<br />
&nbsp; &nbsp; &quot;&quot;&quot;</span> <br />
&nbsp; &nbsp; U, s, V = linalg.<span style="color: black;">svd</span><span style="color: black;">&#40;</span>A, full_matrices=<span style="color: #008000;">False</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; d = np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>U.<span style="color: black;">T</span>, b<span style="color: black;">&#41;</span> / <span style="color: black;">&#40;</span>s + alphas<span style="color: black;">&#91;</span>:, np.<span style="color: black;">newaxis</span><span style="color: black;">&#93;</span> / s<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>d, V<span style="color: black;">&#41;</span></div></div>
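<p>The closed form behind the code above follows from standard ridge algebra. With the thin SVD <code>A = U diag(s) V^T</code> (where <code>V^T</code> is the third output of <code>linalg.svd</code>, named <code>V</code> in the code), for any alpha greater than zero:</p>

```latex
\hat{x}(\alpha) = \arg\min_x \|Ax - b\|_2^2 + \alpha \|x\|_2^2
               = (A^T A + \alpha I)^{-1} A^T b

A^T b = V \,\mathrm{diag}(s)\, U^T b, \qquad
(A^T A + \alpha I)\, V = V \,\mathrm{diag}(s_i^2 + \alpha)

\hat{x}(\alpha) = V \,\mathrm{diag}\!\left(\frac{s_i}{s_i^2 + \alpha}\right) U^T b
               = V \,\mathrm{diag}\!\left(\frac{1}{s_i + \alpha / s_i}\right) U^T b
```

<p>Only the diagonal factor depends on alpha, so one SVD serves every value of the regularization parameter; the last form is exactly the per-alpha row <code>d</code> computed in the code.</p>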
<p>For a concrete problem it can then be used to efficiently compute its <i>path</i>, that is, to plot the coefficients as a function of the regularization parameter.</p>
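<p>A quick numerical check of the path computation, as a sketch on random data: the function below is a slight rewrite of the <code>ridge</code> function above (the third SVD output is named <code>Vt</code>, and the factor is written as <code>s / (s**2 + alpha)</code>, which is algebraically identical but avoids dividing by a zero singular value). Every row of the result is compared with a direct solve of the normal equations.</p>

```python
import numpy as np
from scipy import linalg


def ridge(A, b, alphas):
    """Coefficients minimizing ||A x - b||^2 + alpha ||x||^2, one row per alpha."""
    U, s, Vt = linalg.svd(A, full_matrices=False)
    # s / (s**2 + alpha) == 1 / (s + alpha / s), without dividing by s
    d = np.dot(U.T, b) * (s / (s ** 2 + alphas[:, np.newaxis]))
    return np.dot(d, Vt)


rng = np.random.RandomState(0)
A = rng.randn(50, 10)
b = rng.randn(50)
alphas = np.logspace(-3, 3, 20)

coefs = ridge(A, b, alphas)  # shape (20, 10): one coefficient vector per alpha

# Check one alpha against the normal equations (A^T A + alpha I) x = A^T b
direct = linalg.solve(np.dot(A.T, A) + alphas[5] * np.eye(10), np.dot(A.T, b))
print(np.allclose(coefs[5], direct))
```

<p>To draw the path, plotting <code>coefs</code> against <code>alphas</code> on a logarithmic x axis (for example with matplotlib&#8217;s <code>semilogx</code>) gives one curve per coefficient.</p>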
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/07/ridge_nocv.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/07/ridge_nocv.png" alt="" title="Ridge path" width="550" class="aligncenter size-full wp-image-950" /></a></p>
<p>A variant of this algorithm can then be used to compute the optimal regularization parameter in the sense of leave-one-out cross-validation; it is implemented in scikit-learn&#8217;s <a href="http://scikit-learn.sourceforge.net/dev/modules/linear_model.html#generalized-cross-validation">RidgeCV</a> (on which Mathieu Blondel has written an <a href="http://www.mblondel.org/journal/2011/02/09/regularized-least-squares/">excellent post</a>). This optimal parameter is denoted by a vertical dotted line in the following picture; the full code can be found <a href="https://gist.github.com/1076844">here</a>.</p>
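<p>As a sketch of how the same SVD gives leave-one-out errors without refitting (textbook ridge identity, without an intercept; scikit-learn&#8217;s RidgeCV handles more cases and is not reproduced here): with the hat matrix <code>H = U diag(s^2 / (s^2 + alpha)) U^T</code>, the leave-one-out residual of sample i is <code>(b_i - (H b)_i) / (1 - H_ii)</code>. The function names are hypothetical, and a brute-force refit is included only to check the shortcut.</p>

```python
import numpy as np
from scipy import linalg


def ridge_loo_errors(A, b, alphas):
    """Mean squared leave-one-out error of ridge (no intercept) per alpha.

    One thin SVD A = U diag(s) Vt serves every alpha, via
    loo_residual_i = (b_i - fitted_i) / (1 - H_ii).
    """
    U, s, Vt = linalg.svd(A, full_matrices=False)
    Utb = np.dot(U.T, b)
    errors = []
    for alpha in alphas:
        w = s ** 2 / (s ** 2 + alpha)   # shrinkage factors
        fitted = np.dot(U, w * Utb)     # H b
        h = np.dot(U ** 2, w)           # diagonal of H
        errors.append(np.mean(((b - fitted) / (1.0 - h)) ** 2))
    return np.array(errors)


def brute_force_loo(A, b, alpha):
    """Same quantity by refitting n times, for checking only."""
    n, p = A.shape
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i
        Ai, bi = A[keep], b[keep]
        x = linalg.solve(np.dot(Ai.T, Ai) + alpha * np.eye(p), np.dot(Ai.T, bi))
        sq_errors.append((b[i] - np.dot(A[i], x)) ** 2)
    return np.mean(sq_errors)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    A = rng.randn(30, 5)
    b = np.dot(A, rng.randn(5)) + 0.5 * rng.randn(30)
    alphas = np.logspace(-3, 3, 13)
    errors = ridge_loo_errors(A, b, alphas)
    print("best alpha:", alphas[np.argmin(errors)])
    print(np.isclose(errors[6], brute_force_loo(A, b, alphas[6])))
```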
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/07/ridge.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/07/ridge.png" alt="" title="Ridge path" width="550" class="aligncenter size-full wp-image-940" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/ridge-regression-path/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>LLE comes in different flavours</title>
		<link>http://fseoane.net/blog/2011/lle-comes-in-different-flavours/</link>
		<comments>http://fseoane.net/blog/2011/lle-comes-in-different-flavours/#comments</comments>
		<pubDate>Thu, 30 Jun 2011 14:22:04 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[manifold learning]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=882</guid>
		<description><![CDATA[I haven&#8217;t worked on the manifold module since last time, yet thanks to Jake VanderPlas there are some cool features I can talk about. First off, the ARPACK backend is finally working and gives a speedup over the LOBPCG + PyAMG approach. The key is to use ARPACK&#8217;s shift-invert mode instead of the regular [...]]]></description>
			<content:encoded><![CDATA[<p>I haven&#8217;t worked in the manifold module since <a href="http://fseoane.net/blog/2011/manifold-learning-in-scikit-learn/">last time</a>, yet thanks to <a href="http://www.astro.washington.edu/users/vanderplas/">Jake VanderPlas</a> there are some cool features I can talk about.</p>
<p>First of, the ARPACK backend is finally working and gives factor one speedup over the <a href="http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/">lobcpg + PyAMG approach</a>. The key is to use ARPACK&#8217;s shift-invert mode instead of the regular mode, a subtle change that drove me crazy for weeks and that Jake spotted by comparing it to his <a href="https://github.com/jakevdp/pyLLE">C++ LLE implementation</a>.</p>
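<p>LLE needs the eigenvectors for the smallest eigenvalues of a sparse symmetric matrix (the first one, with eigenvalue close to zero, is discarded). As a sketch of why shift-invert helps: with <code>sigma=0</code>, <code>scipy.sparse.linalg.eigsh</code> lets ARPACK iterate with the inverse of the matrix, turning the bottom of the spectrum into its top, where ARPACK converges fastest. The 1-D Laplacian below is a stand-in chosen only because its eigenvalues are known in closed form; it is not the LLE matrix, and since the LLE matrix is singular in exact arithmetic, scikit-learn&#8217;s actual solver settings may differ.</p>

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

n = 500
# Stand-in matrix: the 1-D Dirichlet Laplacian, sparse and symmetric positive
# definite, whose eigenvalues are 2 - 2 cos(j pi / (n + 1)).
off = -np.ones(n - 1)
M = sparse.diags([off, 2.0 * np.ones(n), off], [-1, 0, 1], format="csc")

# Shift-invert mode: with sigma=0 ARPACK works with M^-1, whose largest
# eigenvalues are the reciprocals of the smallest eigenvalues of M.
vals, vecs = eigsh(M, k=4, sigma=0.0)
vals = np.sort(vals)

# Regular-mode alternative, typically much slower to converge here:
# eigsh(M, k=4, which="SM")

exact = 2.0 - 2.0 * np.cos(np.arange(1, 5) * np.pi / (n + 1))
print(np.allclose(vals, exact))
```

<p>The price of shift-invert is one sparse factorization of the matrix up front; in exchange, the iteration only has to separate well-spaced large eigenvalues instead of tightly clustered small ones.</p>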
<p>More importantly, some variants of Locally Linear Embedding (LLE) have been added to the module: <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.70.382">Modified LLE</a>, <a href="http://www-stat.stanford.edu/~donoho/Reports/2003/HessianEigenmaps.pdf">Hessian LLE</a> and <a href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.4.3693">LTSA</a>. These seem to generate better solutions than the classical LLE with timings that are not far apart. All the LLE variants currently implemented can be seen in <a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_compare_methods.html">this example</a>, where they are applied to an S-shaped dataset.</p>
<p><a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_compare_methods.html"><img src="http://fseoane.net/blog/wp-content/uploads/2011/06/manifold_methods.png" alt="" title="Variants of Locally Linear Embedding on the S-curve" width="600" class="aligncenter wp-image-885" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/lle-comes-in-different-flavours/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Manifold learning in scikit-learn</title>
		<link>http://fseoane.net/blog/2011/manifold-learning-in-scikit-learn/</link>
		<comments>http://fseoane.net/blog/2011/manifold-learning-in-scikit-learn/#comments</comments>
		<pubDate>Tue, 07 Jun 2011 07:19:49 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=793</guid>
		<description><![CDATA[The manifold module in scikit-learn is slowly progressing: the locally linear embedding implementation was finally merged along with some documentation. At about the same time but in a different timezone, Jake VanderPlas began coding other manifold learning methods and back in Paris Olivier Grisel made my digits example a lot nicer by adding the embedding [...]]]></description>
			<content:encoded><![CDATA[<p>The manifold module in <a href="http://scikit-learn.sf.net">scikit-learn</a> is slowly progressing: the <a href="http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/">locally linear embedding</a> implementation was finally merged along with <a href="http://scikit-learn.sourceforge.net/dev/modules/manifold.html">some documentation</a>. At about the same time but in a different timezone, <a href="http://www.astro.washington.edu/users/vanderplas/">Jake VanderPlas</a> began coding <a href="https://github.com/jakevdp/scikit-learn/compare/master...manifold">other manifold learning methods</a>, and back in Paris <a href="http://twitter.com/ogrisel">Olivier Grisel</a> made <a href="http://fseoane.net/blog/2011/handwritten-digits-and-locally-linear-embedding/">my digits example</a> a <a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_lle_digits.html">lot nicer</a> by adding the embeddings obtained with other dimensionality reduction techniques from scikit-learn:</p>
<p><a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_lle_digits.html"><img src="http://fseoane.net/blog/wp-content/uploads/2011/06/plot_lle_digits_4-300x225.png" alt="" title="plot_lle_digits_4" width="300" height="225" class="aligncenter size-medium wp-image-803" /></a><br />
<a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_lle_digits.html"><img src="http://fseoane.net/blog/wp-content/uploads/2011/06/plot_lle_digits_3-300x225.png" alt="" title="plot_lle_digits_3" width="300" height="225" class="aligncenter size-medium wp-image-802" /></a><br />
<a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_lle_digits.html"><img src="http://fseoane.net/blog/wp-content/uploads/2011/06/plot_lle_digits_2-300x225.png" alt="" title="plot_lle_digits_2" width="300" height="225" class="aligncenter size-medium wp-image-801" /></a><br />
 <a href="http://scikit-learn.sourceforge.net/dev/auto_examples/manifold/plot_lle_digits.html"><img src="http://fseoane.net/blog/wp-content/uploads/2011/06/plot_lle_digits_1-300x225.png" alt="" title="plot_lle_digits_1" width="300" height="225" class="aligncenter size-medium wp-image-800" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/manifold-learning-in-scikit-learn/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Handwritten digits and Locally Linear Embedding</title>
		<link>http://fseoane.net/blog/2011/handwritten-digits-and-locally-linear-embedding/</link>
		<comments>http://fseoane.net/blog/2011/handwritten-digits-and-locally-linear-embedding/#comments</comments>
		<pubDate>Wed, 04 May 2011 08:46:47 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=755</guid>
		<description><![CDATA[I decided to test my new Locally Linear Embedding (LLE) implementation against a real dataset. At first I didn&#8217;t think this would turn out very well, since LLE seems to be somewhat fragile, yielding very different results for small changes in parameters such as number of neighbors or tolerance, but as it turns out, results [...]]]></description>
			<content:encoded><![CDATA[<p>I decided to test my <a href="http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/">new Locally Linear Embedding (LLE)</a> implementation against a real dataset. At first I didn&#8217;t think this would turn out very well, since LLE seems to be somewhat fragile, yielding very different results for small changes in parameters such as number of neighbors or tolerance, but as it turns out, results are not bad at all.</p>
<p>The idea is to take a handwritten digit, stored as an 8&#215;8 pixel image, and flatten it into an array of 8&#215;8 = 64 floating-point values.</p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/05/digits_transformation1.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/05/digits_transformation1.png" alt="" title="digits_transformation" width="350" class="aligncenter size-full wp-image-786" /></a></p>
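<p>In NumPy the flattening step is a single reshape. The sketch below uses random integers as a hypothetical stand-in for real digits; scikit-learn's <code>load_digits()</code> already exposes both forms, as <code>images</code> (8x8 arrays) and <code>data</code> (64 columns).</p>

```python
import numpy as np

# hypothetical stand-in for a stack of 8x8 grayscale digits
# (pixel intensities 0-16, as in the scikit-learn digits dataset)
rng = np.random.RandomState(0)
images = rng.randint(0, 17, size=(5, 8, 8)).astype(np.float64)

# flatten each image row by row: pixel (r, c) ends up at index 8*r + c
X = images.reshape(len(images), -1)
```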
<p>Each handwritten digit can then be seen as a point in a 64-dimensional space. Of course, visualizing 64-dimensional spaces is not easy, and that&#8217;s where <a href="http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/">Locally Linear Embedding</a> comes in handy. We&#8217;ll use this method to reduce the dimension from 64 to 2 in the hope of preserving most of the underlying manifold structure. The following is a plot of the handwritten digits {0, 1, 2, 3, 4} after performing locally linear embedding. As you can see, some groups are nicely clustered: notably, the 0 is isolated, while others like {4, 5} are closer together, precisely those that are more similar.</p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/05/Picture-1.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/05/Picture-1.png" alt="" title="Digitst and Locally Linear Embedding" width="500" class="aligncenter size-full wp-image-757" /></a></p>
<p>Source code for this example <a href="https://gist.github.com/954815">can be found here</a>, but it relies on my manifold branch of scikit-learn.</p>
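<p>For readers without that branch, here is a minimal dense NumPy sketch of classic LLE. It is not the code in the gist: neighbors are found by brute force, the regularization constant <code>reg</code> is an assumed value, and the eigenproblem is solved densely, so it is only suitable for small datasets. Step 1 computes barycentric weights that reconstruct each point from its neighbors; step 2 takes the bottom eigenvectors of (I - W)^T (I - W), skipping the constant one.</p>

```python
import numpy as np


def lle(X, n_neighbors, n_components, reg=1e-3):
    """Classic locally linear embedding with dense linear algebra (toy sizes)."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # k nearest neighbors, skipping column 0 (the point itself)
    neighbors = np.argsort(dist, axis=1)[:, 1:n_neighbors + 1]

    # step 1: barycentric weights that best reconstruct each point
    W = np.zeros((n, n))
    for i, idx in enumerate(neighbors):
        Z = X[idx] - X[i]
        C = Z @ Z.T
        # regularize the local Gram matrix (singular when k exceeds the input dimension)
        C += np.eye(n_neighbors) * reg * np.trace(C)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, idx] = w / w.sum()  # weights sum to one

    # step 2: bottom eigenvectors of M = (I - W)^T (I - W); skip the constant one
    I_W = np.eye(n) - W
    _, eigvecs = np.linalg.eigh(I_W.T @ I_W)
    return eigvecs[:, 1:n_components + 1]


# usage on a toy 1-D arc in 3-D (for the digits, X would have 64 columns)
t = np.linspace(0, np.pi, 80)
X = np.column_stack([np.cos(t), np.sin(t), 0.3 * t])
Y = lle(X, n_neighbors=8, n_components=1)
```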
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/handwritten-digits-and-locally-linear-embedding/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Low-level routines for Support Vector Machines</title>
		<link>http://fseoane.net/blog/2011/low-level-routines-for-support-vector-machines/</link>
		<comments>http://fseoane.net/blog/2011/low-level-routines-for-support-vector-machines/#comments</comments>
		<pubDate>Wed, 27 Apr 2011 13:27:17 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=546</guid>
		<description><![CDATA[I&#8217;ve been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages an efficient use of these libraries for expert users. These are methods that have lower overhead than the object-oriented interface as they are closer to the C implementation, but do not [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages an efficient use of these libraries for expert users.</p>
<p>These are methods that have lower overhead than the <a href="http://scikit-learn.sourceforge.net/modules/svm.html">object-oriented interface</a>, as they are closer to the C implementation, but their interface is less polished. Here, all parameters are expected to be of the correct type, and passing one of the wrong type makes the function exit immediately with a ValueError. For instance, input data is expected to be of type float64, even for class labels!</p>
<p>Another peculiarity of these methods is that they only take and return numpy arrays: no custom objects, just arrays. That looks something like:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<span style="color: #ff7700;font-weight:bold;">from</span> scikits.<span style="color: black;">learn</span> <span style="color: #ff7700;font-weight:bold;">import</span> svm, datasets<br />
<br />
iris = datasets.<span style="color: black;">load_iris</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
iris.<span style="color: black;">target</span> = iris.<span style="color: black;">target</span>.<span style="color: black;">astype</span><span style="color: black;">&#40;</span>np.<span style="color: black;">float64</span><span style="color: black;">&#41;</span><br />
<br />
learned_params = svm.<span style="color: black;">libsvm</span>.<span style="color: black;">fit</span><span style="color: black;">&#40;</span>iris.<span style="color: black;">data</span>, iris.<span style="color: black;">target</span><span style="color: black;">&#41;</span><br />
pred = svm.<span style="color: black;">libsvm</span>.<span style="color: black;">predict</span><span style="color: black;">&#40;</span>iris.<span style="color: black;">data</span>, <span style="color: #66cc66;">*</span>learned_params<span style="color: black;">&#41;</span></div></div>
<p>Here, I used the fact that the parameters returned by <a href="http://scikit-learn.sourceforge.net/dev/modules/generated/scikits.learn.svm.libsvm.fit.html">libsvm.fit</a> can simply be passed to <a href="http://scikit-learn.sourceforge.net/dev/modules/generated/scikits.learn.svm.libsvm.predict.html">libsvm.predict</a>. However, any other parameters you set must be passed manually to both methods.</p>
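<p>The "arrays in, arrays out" convention is easy to mimic. The sketch below is hypothetical and unrelated to libsvm: a toy nearest-centroid classifier whose <code>fit</code> returns a tuple of arrays that <code>predict</code> receives through <code>*params</code>, and which, like the low-level bindings described above, rejects anything that is not float64 with a ValueError.</p>

```python
import numpy as np


def fit(X, y):
    """Hypothetical low-level fit: float64 arrays in, tuple of arrays out."""
    if X.dtype != np.float64 or y.dtype != np.float64:
        raise ValueError("X and y must be float64 arrays")
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(X, classes, centroids):
    """Hypothetical low-level predict: takes the arrays returned by fit()."""
    if X.dtype != np.float64:
        raise ValueError("X must be a float64 array")
    # squared distance from every sample to every centroid
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[np.argmin(dist, axis=1)]


X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])  # float64 labels, as with libsvm.fit
params = fit(X, y)
pred = predict(X, *params)
```

<p>Keeping the learned state in plain arrays means there is no object to pickle or rebuild, at the cost of having to pass every extra setting explicitly to both calls.</p>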
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/low-level-routines-for-support-vector-machines/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>new get_blas_funcs in scipy.linalg</title>
		<link>http://fseoane.net/blog/2011/new-get_blas_funcs-in-scipy-linalg/</link>
		<comments>http://fseoane.net/blog/2011/new-get_blas_funcs-in-scipy-linalg/#comments</comments>
		<pubDate>Sat, 23 Apr 2011 16:24:17 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scipy]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=587</guid>
		<description><![CDATA[Today some changes I made to the function scipy.linalg.get_blas_funcs() got merged. The main enhancement is that get_blas_funcs() now also accepts a single string and a dtype as input parameters, so that fetching the BLAS function for a specific type becomes more natural. For example, fetching the gemm routine for single-precision complex numbers now looks like [...]]]></description>
			<content:encoded><![CDATA[<p>Today some changes I made to the function scipy.linalg.get_blas_funcs() got merged. The main enhancement is that get_blas_funcs() now also accepts a single string and a dtype as input parameters, so that fetching the BLAS function for a specific type becomes more natural. </p>
<p>For example, fetching the gemm routine for single-precision complex numbers now looks like this:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<span style="color: #ff7700;font-weight:bold;">import</span> scipy.<span style="color: black;">linalg</span><br />
<br />
gemm = scipy.<span style="color: black;">linalg</span>.<span style="color: black;">get_blas_funcs</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'gemm'</span>, dtype=np.<span style="color: black;">complex64</span><span style="color: black;">&#41;</span></div></div>
<p>compared to the clumsy old syntax:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">X = np.<span style="color: black;">empty</span><span style="color: black;">&#40;</span><span style="color: #ff4500;">0</span>, dtype=np.<span style="color: black;">complex64</span><span style="color: black;">&#41;</span><br />
gemm, = scipy.<span style="color: black;">linalg</span>.<span style="color: black;">get_blas_funcs</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'gemm'</span>,<span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span>X,<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></div></div>
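<p>Once fetched, the routine is called directly. A minimal sketch, assuming random complex64 inputs for illustration: the returned function's <code>typecode</code> attribute tells which precision was picked ('c' for single-precision complex), and BLAS gemm computes alpha * a @ b, with the optional beta * c term left out here.</p>

```python
import numpy as np
from scipy.linalg import get_blas_funcs

rng = np.random.RandomState(0)
a = (rng.rand(3, 4) + 1j * rng.rand(3, 4)).astype(np.complex64)
b = (rng.rand(4, 2) + 1j * rng.rand(4, 2)).astype(np.complex64)

# new-style call: one routine name plus a dtype returns a single function
gemm = get_blas_funcs('gemm', dtype=np.complex64)
print(gemm.typecode)  # 'c', i.e. the single-precision complex cgemm

# gemm(alpha, a, b) computes alpha * a @ b
c = gemm(1.0, a, b)
```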
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/new-get_blas_funcs-in-scipy-linalg/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Locally linear embedding and sparse eigensolvers</title>
		<link>http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/</link>
		<comments>http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/#comments</comments>
		<pubDate>Thu, 21 Apr 2011 12:28:17 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=597</guid>
		<description><![CDATA[I&#8217;ve been working for some time on implementing a locally linear embedding algorithm for the upcoming manifold module in scikit-learn. While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the algorithm, falling back to dense [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been working for some time on implementing a <a href="http://www.cs.nyu.edu/~roweis/lle/algorithm.html">locally linear embedding</a> algorithm for the upcoming manifold module in scikit-learn. </p>
<p>While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the algorithm; they fall back to dense routines, which causes a huge overhead in this step. </p>
<p>To overcome this, my first implementation used <code>scipy.sparse.linalg.eigsh</code>, which is a sparse eigensolver shipped by scipy and based on ARPACK. However, this approach converged extremely slowly, with timings far worse than those of dense solvers.</p>
<p>Recently I found a way that seems to work reasonably well, with timings that beat existing routines by a factor of 5 on the swiss roll. The code below solves the eigenproblem with LOBPCG, using a preconditioner computed by <a href="http://code.google.com/p/pyamg/">PyAMG</a>.</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<span style="color: #ff7700;font-weight:bold;">from</span> scipy.<span style="color: black;">sparse</span> <span style="color: #ff7700;font-weight:bold;">import</span> linalg, eye<br />
<span style="color: #ff7700;font-weight:bold;">from</span> pyamg <span style="color: #ff7700;font-weight:bold;">import</span> smoothed_aggregation_solver<br />
<span style="color: #ff7700;font-weight:bold;">from</span> scikits.<span style="color: black;">learn</span> <span style="color: #ff7700;font-weight:bold;">import</span> neighbors<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> locally_linear_embedding<span style="color: black;">&#40;</span>X, n_neighbors, out_dim, tol=1e-6, max_iter=<span style="color: #ff4500;">200</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; W = neighbors.<span style="color: black;">kneighbors_graph</span><span style="color: black;">&#40;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; X, n_neighbors=n_neighbors, mode=<span style="color: #483d8b;">'barycenter'</span><span style="color: black;">&#41;</span><br />
<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># M = (I-W)' (I-W)</span><br />
&nbsp; &nbsp; A = eye<span style="color: black;">&#40;</span><span style="color: #66cc66;">*</span>W.<span style="color: black;">shape</span>, format=W.<span style="color: black;">format</span><span style="color: black;">&#41;</span> - W<br />
&nbsp; &nbsp; A = <span style="color: black;">&#40;</span>A.<span style="color: black;">T</span><span style="color: black;">&#41;</span>.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>A<span style="color: black;">&#41;</span>.<span style="color: black;">tocsr</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># initial approximation to the eigenvectors</span><br />
&nbsp; &nbsp; X = np.<span style="color: #dc143c;">random</span>.<span style="color: black;">rand</span><span style="color: black;">&#40;</span>W.<span style="color: black;">shape</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>, out_dim<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; ml = smoothed_aggregation_solver<span style="color: black;">&#40;</span>A, symmetry=<span style="color: #483d8b;">'symmetric'</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; prec = ml.<span style="color: black;">aspreconditioner</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
<br />
&nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># compute eigenvalues and eigenvectors with LOBPCG</span><br />
&nbsp; &nbsp; eigen_values, eigen_vectors = linalg.<span style="color: black;">lobpcg</span><span style="color: black;">&#40;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; A, X, M=prec, largest=<span style="color: #008000;">False</span>, tol=tol, maxiter=max_iter<span style="color: black;">&#41;</span><br />
<br />
&nbsp; &nbsp; index = np.<span style="color: black;">argsort</span><span style="color: black;">&#40;</span>eigen_values<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> eigen_vectors<span style="color: black;">&#91;</span>:, index<span style="color: black;">&#93;</span>, np.<span style="color: #008000;">sum</span><span style="color: black;">&#40;</span>eigen_values<span style="color: black;">&#41;</span></div></div>
<p>Full code for this algorithm applied to the swiss roll can be found <a href="https://gist.github.com/934363">here</a>, and I hope it will soon be part of <a href="http://scikit-learn.sourceforge.net/">scikit-learn</a>.</p>
<p><a href="http://fseoane.net/blog/wp-content/uploads/2011/04/lle1.png"><img src="http://fseoane.net/blog/wp-content/uploads/2011/04/lle1-690x1024.png" alt="" title="Locally linear embedding on the swiss roll" width="500" class="aligncenter size-large wp-image-731" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/locally-linear-embedding-and-sparse-eigensolvers/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>scikits.learn is now part of pythonxy</title>
		<link>http://fseoane.net/blog/2011/scikits-learn-is-now-part-of-pythonxy/</link>
		<comments>http://fseoane.net/blog/2011/scikits-learn-is-now-part-of-pythonxy/#comments</comments>
		<pubDate>Wed, 20 Apr 2011 11:48:45 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=694</guid>
		<description><![CDATA[The guys behind pythonxy have been kind enough to add the latest scikit-learn as an additional plugin for their distribution. Having scikit-learn in both pythonxy and EPD will hopefully make it easier to use for Windows users. For now I will continue to make Windows precompiled binaries, but pythonxy users finally have a package [...]]]></description>
			<content:encoded><![CDATA[<p>The guys behind <a href="http://www.pythonxy.com/">pythonxy</a> have been kind enough to add the latest scikit-learn as an <a href="http://code.google.com/p/pythonxy/wiki/AdditionalPlugins">additional plugin</a> for their distribution. Having scikit-learn in both <a href="http://www.pythonxy.com/">pythonxy</a> and <a href="http://www.enthought.com/products/epd.php">EPD</a> will hopefully make it easier to use for Windows users.</p>
<p><img src="http://fseoane.net/blog/wp-content/uploads/2011/04/pythonxy-logo.png" alt="pythonxy-logo" title="pythonxy-logo" width="161" height="70" class="alignnone size-full wp-image-695" /></p>
<p>For now I will continue to make Windows precompiled binaries, but pythonxy users finally have a package that is guaranteed to work with their installation.</p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/scikits-learn-is-now-part-of-pythonxy/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Least squares with equality constraint</title>
		<link>http://fseoane.net/blog/2011/least-squares-with-equality-constrain/</link>
		<comments>http://fseoane.net/blog/2011/least-squares-with-equality-constrain/#comments</comments>
		<pubDate>Thu, 14 Apr 2011 08:02:10 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[Python]]></category>
		<category><![CDATA[Tecnología]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=621</guid>
		<description><![CDATA[The following algorithm computes the x that minimizes &#124;&#124;Ax &#8211; b&#124;&#124; subject to the equality constraint Bx = d. It&#8217;s a classic algorithm that can be implemented using only a QR decomposition and a least squares solver. This implementation uses numpy and scipy. It makes use of the new linalg.solve_triangular function in scipy 0.9, [...]]]></description>
			<content:encoded><![CDATA[<p>The following algorithm computes the x that minimizes ||Ax &#8211; b|| subject to the equality constraint Bx = d. It&#8217;s a classic algorithm that can be implemented using only a QR decomposition and a least squares solver. </p>
<p>This implementation uses numpy and scipy. It makes use of the new linalg.solve_triangular function in scipy 0.9, but falls back to linalg.solve on older versions.</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> numpy <span style="color: #ff7700;font-weight:bold;">as</span> np<br />
<br />
<span style="color: #ff7700;font-weight:bold;">def</span> lse<span style="color: black;">&#40;</span>A, b, B, d, cond=<span style="color: #008000;">None</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; <span style="color: #483d8b;">&quot;&quot;&quot;<br />
&nbsp; &nbsp; Equality-constrained least squares.<br />
<br />
&nbsp; &nbsp; The following algorithm minimizes ||Ax - b|| subject to the<br />
&nbsp; &nbsp; constraint Bx = d.<br />
<br />
&nbsp; &nbsp; Parameters<br />
&nbsp; &nbsp; ----------<br />
&nbsp; &nbsp; A : array-like, shape=[m, n]<br />
<br />
&nbsp; &nbsp; b : array-like, shape=[m]<br />
<br />
&nbsp; &nbsp; B : array-like, shape=[p, n]<br />
<br />
&nbsp; &nbsp; d : array-like, shape=[p]<br />
<br />
&nbsp; &nbsp; cond : float, optional<br />
&nbsp; &nbsp; &nbsp; &nbsp; Cutoff for 'small' singular values; used to determine effective<br />
&nbsp; &nbsp; &nbsp; &nbsp; rank of A. Singular values smaller than<br />
&nbsp; &nbsp; &nbsp; &nbsp; ``cond * largest_singular_value`` are considered zero.<br />
<br />
&nbsp; &nbsp; Reference<br />
&nbsp; &nbsp; ---------<br />
&nbsp; &nbsp; Matrix Computations, Golub &amp; van Loan, algorithm 12.1.2<br />
<br />
&nbsp; &nbsp; Examples<br />
&nbsp; &nbsp; --------<br />
&nbsp; &nbsp; &gt;&gt;&gt; A, b = [[0, 2, 3], [1, 3, 4.5]], [1, 1]<br />
&nbsp; &nbsp; &gt;&gt;&gt; B, d = [[1, 1, 0]], [1]<br />
&nbsp; &nbsp; &gt;&gt;&gt; lse(A, b, B, d)<br />
&nbsp; &nbsp; array([-0.5 &nbsp; &nbsp; &nbsp; , &nbsp;1.5 &nbsp; &nbsp; &nbsp; , -0.66666667]) &nbsp; &nbsp;<br />
&nbsp; &nbsp; &quot;&quot;&quot;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">from</span> scipy <span style="color: #ff7700;font-weight:bold;">import</span> linalg<br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">if</span> <span style="color: #ff7700;font-weight:bold;">not</span> <span style="color: #008000;">hasattr</span><span style="color: black;">&#40;</span>linalg, <span style="color: #483d8b;">'solve_triangular'</span><span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #808080; font-style: italic;"># compatibility for old scipy</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">def</span> solve_triangular<span style="color: black;">&#40;</span>X, y, <span style="color: #66cc66;">**</span>kwargs<span style="color: black;">&#41;</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> linalg.<span style="color: black;">solve</span><span style="color: black;">&#40;</span>X, y<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">else</span>:<br />
&nbsp; &nbsp; &nbsp; &nbsp; solve_triangular = linalg.<span style="color: black;">solve_triangular</span><br />
&nbsp; &nbsp; A, b, B, d = <span style="color: #008000;">map</span><span style="color: black;">&#40;</span>np.<span style="color: black;">asanyarray</span>, <span style="color: black;">&#40;</span>A, b, B, d<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; p = B.<span style="color: black;">shape</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span><br />
&nbsp; &nbsp; Q, R = linalg.<span style="color: black;">qr</span><span style="color: black;">&#40;</span>B.<span style="color: black;">T</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; y = solve_triangular<span style="color: black;">&#40;</span>R<span style="color: black;">&#91;</span>:p, :p<span style="color: black;">&#93;</span>, d, trans=<span style="color: #483d8b;">'T'</span>, lower=<span style="color: #008000;">False</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; A = np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>A, Q<span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; z = linalg.<span style="color: black;">lstsq</span><span style="color: black;">&#40;</span>A<span style="color: black;">&#91;</span>:, p:<span style="color: black;">&#93;</span>, b - np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>A<span style="color: black;">&#91;</span>:, :p<span style="color: black;">&#93;</span>, y<span style="color: black;">&#41;</span>, <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;cond=cond<span style="color: black;">&#41;</span><span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>.<span style="color: black;">ravel</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span><br />
&nbsp; &nbsp; <span style="color: #ff7700;font-weight:bold;">return</span> np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>Q<span style="color: black;">&#91;</span>:, :p<span style="color: black;">&#93;</span>, y<span style="color: black;">&#41;</span> + np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>Q<span style="color: black;">&#91;</span>:, p:<span style="color: black;">&#93;</span>, z<span style="color: black;">&#41;</span></div></div>
<p><strong>Update: scipy now has a function, qr_multiply, that would considerably speed up this code.</strong></p>
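<p>As an independent cross-check, here is a sketch using a different technique from the QR method above: the KKT (Lagrange multiplier) system. It assumes B has full row rank and that the stacked matrix [A; B] has full column rank, so the KKT matrix is nonsingular. Forming A^T A squares the condition number, so the QR route remains preferable for ill-conditioned problems; this version is mainly useful for testing.</p>

```python
import numpy as np


def lse_kkt(A, b, B, d):
    """Equality-constrained least squares via the KKT system.

    Assumes B has full row rank and [A; B] has full column rank.
    """
    A, b, B, d = (np.asarray(v, dtype=float) for v in (A, b, B, d))
    n, p = A.shape[1], B.shape[0]
    # stationarity: A^T A x + B^T lam = A^T b ; feasibility: B x = d
    K = np.block([[A.T @ A, B.T], [B, np.zeros((p, p))]])
    rhs = np.concatenate([A.T @ b, d])
    sol = np.linalg.solve(K, rhs)
    return sol[:n]  # drop the Lagrange multipliers


# same data as the docstring example of lse()
A, b = [[0, 2, 3], [1, 3, 4.5]], [1, 1]
B, d = [[1, 1, 0]], [1]
x = lse_kkt(A, b, B, d)
```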
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/least-squares-with-equality-constrain/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>A profiler for Python extensions</title>
		<link>http://fseoane.net/blog/2011/a-profiler-for-python-extensions/</link>
		<comments>http://fseoane.net/blog/2011/a-profiler-for-python-extensions/#comments</comments>
		<pubDate>Wed, 06 Apr 2011 12:02:44 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=531</guid>
		<description><![CDATA[Profiling Python extensions has not been a pleasant experience for me, so I made my own package to do the job. Existing alternatives were either hard to use, forcing you to recompile with custom flags (as with gprof), or desperately slow, like valgrind/callgrind. The package I&#8217;ll talk about is called YEP and is designed to be: [...]]]></description>
			<content:encoded><![CDATA[<p>Profiling Python extensions has not been a pleasant experience for me, so I made my own package to do the job. Existing alternatives were either hard to use, forcing you to recompile with custom flags (as with gprof), or desperately slow, like valgrind/callgrind. The package I&#8217;ll talk about is called <a href='http://pypi.python.org/pypi/yep'>YEP</a> and is designed to be:</p>
<ol>
<li>Unobtrusive: no recompiling, no custom linking. Just launch &#038; profile.</li>
<li>Fast: waiting sucks.</li>
<li>Easy to use.</li>
</ol>
<h2>Basic usage</h2>
<p>YEP is distributed as a Python module and can be <a href='http://pypi.python.org/pypi/yep'>downloaded from PyPI</a>. After installation, it is run by passing the <b>-m yep</b> flags to the interpreter. Without any arguments, it just prints a help message:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; $ python -m yep<br />
Usage: python -m yep [options] scriptfile [arg] ...<br />
&nbsp;...</div></div>
<p>Say you want to profile a script called my_script.py; the quickest way to get a profiler report is to execute:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp; &nbsp; $ python -m yep -v my_script.py</div></div>
<p>For example, running YEP on <a href='http://scikit-learn.sourceforge.net/auto_examples/grid_search_digits.html'>this example</a>, which makes use of <a href="http://www.csie.ntu.edu.tw/~cjlin/libsvm/">libsvm</a> (a C++ library for Support Vector Machines), produces the following output:</p>
<table style="width:auto;">
<tr>
<td><a href="https://picasaweb.google.com/lh/photo/ltfRg59k-z9Zrk7LBDTUxA?feat=embedwebsite"><img src="https://lh5.googleusercontent.com/_IOBIGAGXP4o/TZruzeuFJjI/AAAAAAAAAGI/JSmxqbOd0o4/s400/Screenshot-fabian%40localhost%3A%20-home-fabian.png" height="238" width="400" /></a></td>
</tr>
</table>
<p>The last column shows the function names, so looking only at those that start with svm:: gives an overview of where libsvm spends its time.</p>
<h2>Other usages</h2>
<p>Calling YEP without the -v flag creates a my_script.py.prof file that can be analyzed with pprof (google-pprof on some systems). pprof has a huge range of options that let you filter on specific functions, display the call graph in ghostview (gv) or print a line-by-line profile, to mention a few. For example, you can generate and view a call graph with the command:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">&nbsp;$ pprof --gv /usr/bin/python my_script.py.prof</div></div>
<h2>More control</h2>
<p>If you would like to start and stop the profiler manually rather than profile the whole script, use the functions yep.start() and yep.stop() inside your Python code. yep.start() takes the name of the file the profile will be written to, so make sure its directory is writable:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"><span style="color: #ff7700;font-weight:bold;">import</span> yep<br />
yep.<span style="color: black;">start</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'out.prof'</span><span style="color: black;">&#41;</span> <span style="color: #808080; font-style: italic;"># will create an out.prof file</span><br />
<span style="color: #808080; font-style: italic;"># do something ...</span><br />
yep.<span style="color: black;">stop</span><span style="color: black;">&#40;</span><span style="color: black;">&#41;</span></div></div>
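<p>To make sure yep.stop() is called even when the profiled code raises an exception, the start/stop pair can be wrapped in a context manager. The sketch below is a hypothetical helper named <code>profiled</code> (it is not part of YEP); it only relies on the yep.start(filename) and yep.stop() calls shown above. The optional <code>profiler</code> argument is there so the helper can be exercised without YEP installed.</p>

```python
import contextlib


@contextlib.contextmanager
def profiled(filename, profiler=None):
    """Profile the body of a with-block into `filename` (hypothetical helper).

    `profiler` defaults to the yep module; any object with start(filename)
    and stop() methods works, which makes the helper easy to test.
    """
    if profiler is None:
        import yep as profiler  # assumes YEP is installed
    profiler.start(filename)
    try:
        yield
    finally:
        # runs both on normal exit and when the body raises
        profiler.stop()
```

<p>Usage would then be <code>with profiled('out.prof'):</code> followed by the indented code to profile.</p>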
<h2>Future work</h2>
<p>The -v option shown at the beginning is just a dirty hack that launches pprof and pipes its output into less. A more robust approach would be to read the resulting profile from Python and work with it from there, either printing it to standard output or converting it to <a href='http://docs.python.org/library/profile.html#pstats.Stats'>pstats</a> format. This should not be too difficult, since the pprof file format is described <a href='http://google-perftools.googlecode.com/svn/trunk/doc/cpuprofile-fileformat.html'>here</a>.</p>
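<p>As a first step in that direction, here is a sketch that reads the binary part of a CPU profile with the struct module. It follows the layout in the linked format description: a header of 0, 3, 0, the sampling period in microseconds and 0; then one record per stack (sample count, number of PCs, the PCs); then a 0, 1, 0 trailer followed by a text listing of the mapped objects. The words are native-size and native-endian, so the sketch assumes a profile written on a 64-bit little-endian machine. Mapping PCs back to function names, which pprof does using the mapped-objects listing, is not attempted, and the function names are hypothetical.</p>

```python
import struct


def read_words(data, offset, n):
    """Unpack n 64-bit little-endian words from data, starting at offset."""
    words = struct.unpack_from("<%dQ" % n, data, offset)
    return list(words), offset + 8 * n


def parse_cpuprofile(data):
    """Parse the binary part of a google-perftools CPU profile (sketch).

    Assumes 64-bit little-endian machine words. Returns a tuple
    (sampling_period_usec, samples, maps_text), where samples is a list
    of (count, [pc1, pc2, ...]) pairs and maps_text is the trailing
    listing of mapped objects.
    """
    header, offset = read_words(data, 0, 5)
    hdr_count, hdr_words, version, period, _padding = header
    if (hdr_count, hdr_words, version) != (0, 3, 0):
        raise ValueError("unexpected CPU profile header: %r" % (header,))
    samples = []
    while True:
        (count, num_pcs), offset = read_words(data, offset, 2)
        pcs, offset = read_words(data, offset, num_pcs)
        if (count, num_pcs, pcs) == (0, 1, [0]):
            break  # binary trailer reached
        samples.append((count, pcs))
    maps_text = data[offset:].decode("ascii", errors="replace")
    return period, samples, maps_text
```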
<h2>Acknowledgment</h2>
<p>The original idea of using google-perftools to profile Python extensions came from this <a href='http://stackoverflow.com/questions/2615153/profiling-python-c-extensions'>Stack Overflow question</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/a-profiler-for-python-extensions/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>scikit-learn coding sprint in Paris</title>
		<link>http://fseoane.net/blog/2011/scikit-learn-coding-sprint-in-paris/</link>
		<comments>http://fseoane.net/blog/2011/scikit-learn-coding-sprint-in-paris/#comments</comments>
		<pubDate>Sat, 02 Apr 2011 10:07:11 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[scikit-learn]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=547</guid>
		<description><![CDATA[Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones: some of whom I was already familiar with from the mailing list while others came just to say hi and get familiar with the code. It was really great to have people from such [...]]]></description>
			<content:encoded><![CDATA[<p>Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones: some of whom I was already familiar with from the mailing list while others came just to say hi and get familiar with the code. It was really great to have people from such different backgrounds discuss on concrete problems and getting things done.</p>
<p>A lot of work was done, most of it not yet merged, but if I had to highlight the three most important contributions for me, they would be the <a href='https://github.com/scikit-learn/scikit-learn/pull/86'>merge of the hcluster2 branch</a>, the awesome work of <a href='https://github.com/thouis'>thouis</a> in replacing the <a href='https://github.com/scikit-learn/scikit-learn/pull/120'>C++ interface to the ball_tree with a Cython one</a>, and support for Python 3 (not bug-free, but it imports OK).</p>
<p>As for me, I&#8217;ve mostly been working on efficient cross-validation for Support Vector Machines. The current status: the low-level API (scikits.learn.svm.libsvm.cross_validation) seems to work fine, but the high-level API <a href='https://github.com/scikit-learn/scikit-learn/pull/117'>still needs some work</a>.</p>
<p>This is a picture of (most of) the people who were at the sprint, taken around 16h at <a href='http://www.logilab.fr/'>Logilab&#8217;s</a> headquarters.</p>
<p><a href="http://www.flickr.com/photos/fseoane/5578952957/" title="IMG_0012 por Fabian Pedregosa, en Flickr"><img src="http://farm6.static.flickr.com/5092/5578952957_27b653d0a4.jpg" width="500" height="375" alt="IMG_0012"></a></p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/scikit-learn-coding-sprint-in-paris/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>py3k in scikit-learn</title>
		<link>http://fseoane.net/blog/2011/py3k-in-scikit-learn/</link>
		<comments>http://fseoane.net/blog/2011/py3k-in-scikit-learn/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 13:23:46 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=536</guid>
		<description><![CDATA[One thing I&#8217;d really like to see done in this Friday&#8217;s scikit-learn sprint is to have full support for Python 3. There&#8217;s a branch where the hard work has been done (porting C extensions, automatic 2to3 conversion, etc.), although joblib still has some bugs and no one has attempted to do anything serious with this [...]]]></description>
			<content:encoded><![CDATA[<p>One thing I&#8217;d really like to see done in <a href='http://gael-varoquaux.info/blog/?p=149'>this Friday&#8217;s scikit-learn sprint</a> is to have full support for Python 3.</p>
<p>There&#8217;s <a href='http://github.com/fabianp/scikit-learn/compare/master...py3k'>a branch where the hard work has been done</a> (porting C extensions, automatic 2to3 conversion, etc.), although joblib still has some bugs and no one has attempted to do anything serious with this branch yet &#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/py3k-in-scikit-learn/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Computing the vector norm</title>
		<link>http://fseoane.net/blog/2011/computing-the-vector-norm/</link>
		<comments>http://fseoane.net/blog/2011/computing-the-vector-norm/#comments</comments>
		<pubDate>Tue, 15 Feb 2011 08:31:21 +0000</pubDate>
		<dc:creator>fabian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[scipy]]></category>

		<guid isPermaLink="false">http://fseoane.net/blog/?p=440</guid>
		<description><![CDATA[Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10 Last week I discussed with Gael how we should compute the euclidean norm of a vector a using SciPy. Two approaches suggest themselves, either calling scipy.linalg.norm(a) or computing sqrt(a.T a), but as I learned later, [...]]]></description>
			<content:encoded><![CDATA[<p><b>Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10</b></p>
<p>Last week I discussed with <a href="http://gael-varoquaux.info/blog/">Gael</a> how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves, either calling scipy.linalg.norm(a) or computing sqrt(a.T a), but as I learned later, both suck.</p>
<p><b>Note:</b> I use single-precision arithmetic for simplicity, but similar results hold for double-precision.</p>
<h3>Overflow and underflow</h3>
<p>Both approaches behave terribly in the presence of very large or very small numbers. Take for example an array with a single entry:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">In <span style="color: black;">&#91;</span><span style="color: #ff4500;">0</span><span style="color: black;">&#93;</span>: a = np.<span style="color: #dc143c;">array</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>1e20<span style="color: black;">&#93;</span>, dtype=np.<span style="color: black;">float32</span><span style="color: black;">&#41;</span><br />
In <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>: a<br />
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span>: <span style="color: #dc143c;">array</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span> &nbsp;1.00000002e+20<span style="color: black;">&#93;</span>, dtype=float32<span style="color: black;">&#41;</span><br />
In <span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>: scipy.<span style="color: black;">linalg</span>.<span style="color: black;">norm</span><span style="color: black;">&#40;</span>a<span style="color: black;">&#41;</span><br />
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">2</span><span style="color: black;">&#93;</span>: inf<br />
In <span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>: np.<span style="color: black;">sqrt</span><span style="color: black;">&#40;</span>np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>a.<span style="color: black;">T</span>, a<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span><br />
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>: inf</div></div>
<p>That is, both methods return infinity, although the correct answer, 10^20, comfortably fits in a <a href="http://en.wikipedia.org/wiki/Single_precision_floating-point_format">single-precision</a> float. Similar examples can be found where numbers underflow.</p>
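<p>The trouble comes from the intermediate squares rather than from the result. The small NumPy sketch below squares float32 values directly: once for the overflow case above (10^40 is larger than the biggest float32, about 3.4e38) and once for an underflow case (10^-60 is below the smallest float32 subnormal, about 1.4e-45). The value 1e-30 is an illustrative choice, not one taken from the original experiments.</p>

```python
import numpy as np

info = np.finfo(np.float32)
big = np.array([1e20], dtype=np.float32)
small = np.array([1e-30], dtype=np.float32)

# Squaring is where things go wrong: 1e40 is above info.max
# and 1e-60 is below the smallest positive float32.
with np.errstate(over="ignore", under="ignore"):
    big_sq = big * big        # overflows to inf
    small_sq = small * small  # underflows to 0.0

print(info.max)                 # largest finite float32
print(np.sqrt(big_sq.sum()))    # inf, although the norm is 1e20
print(np.sqrt(small_sq.sum()))  # 0.0, although the norm is 1e-30
```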
<h3>Stability</h3>
<p>scipy.linalg.norm also behaves poorly when it comes to numerical stability: when values of very different magnitudes are summed, the small ones can be lost entirely to rounding. Take for example an array whose first entry is 10,000, followed by 10,000 ones:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">a = np.<span style="color: #dc143c;">array</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>1e4<span style="color: black;">&#93;</span> + <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">*</span><span style="color: #ff4500;">10000</span>, dtype=np.<span style="color: black;">float32</span><span style="color: black;">&#41;</span></div></div>
<p>In this case, scipy.linalg.norm will discard all the ones, producing</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">In <span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>: scipy.<span style="color: black;">linalg</span>.<span style="color: black;">norm</span><span style="color: black;">&#40;</span>a<span style="color: black;">&#41;</span> - 1e4<br />
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">3</span><span style="color: black;">&#93;</span>: <span style="color: #ff4500;">0.0</span></div></div>
<p>when the correct answer is about 0.49999, which rounds to 0.5 in single precision. Here <img src='http://s.wordpress.com/latex.php?latex=%5Csqrt%7Ba%5ET%20a%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\sqrt{a^T a}' title='\sqrt{a^T a}' class='latex' /> behaves much better, since in this setup the dot product of single-precision arrays is accumulated in double precision (but if double precision is used, results are not accumulated in quadruple precision):</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">In <span style="color: black;">&#91;</span><span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span>: np.<span style="color: black;">sqrt</span><span style="color: black;">&#40;</span>np.<span style="color: black;">dot</span><span style="color: black;">&#40;</span>a.<span style="color: black;">T</span>, a<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span> - 1e4<br />
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">4</span><span style="color: black;">&#93;</span>: <span style="color: #ff4500;">0.5</span></div></div>
<h3>BLAS BLAS BLAS &#8230;</h3>
<p>The BLAS function <a href='http://www.netlib.org/blas/snrm2.f'>nrm2</a> rescales the entries as it accumulates them, which makes it more stable and tolerant to overflow and underflow. Luckily, SciPy provides a mechanism to call some BLAS functions:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">In <span style="color: black;">&#91;</span><span style="color: #ff4500;">5</span><span style="color: black;">&#93;</span>: nrm2, = scipy.<span style="color: black;">linalg</span>.<span style="color: black;">get_blas_funcs</span><span style="color: black;">&#40;</span><span style="color: black;">&#40;</span><span style="color: #483d8b;">'nrm2'</span>,<span style="color: black;">&#41;</span>, <span style="color: black;">&#40;</span>a,<span style="color: black;">&#41;</span><span style="color: black;">&#41;</span></div></div>
<p>Using this function, no overflow occurs (hurray!)</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">In <span style="color: black;">&#91;</span><span style="color: #ff4500;">95</span><span style="color: black;">&#93;</span>: a = np.<span style="color: #dc143c;">array</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>1e20<span style="color: black;">&#93;</span>, dtype=np.<span style="color: black;">float32</span><span style="color: black;">&#41;</span><br />
In <span style="color: black;">&#91;</span><span style="color: #ff4500;">96</span><span style="color: black;">&#93;</span>: nrm2<span style="color: black;">&#40;</span>a<span style="color: black;">&#41;</span><br />
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">96</span><span style="color: black;">&#93;</span>: 1.0000000200408773e+20</div></div>
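<p>and, back on the array from the stability example, precision is greatly improved:</p>
<div class="codecolorer-container python default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:435px;"><div class="python codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">In <span style="color: black;">&#91;</span><span style="color: #ff4500;">98</span><span style="color: black;">&#93;</span>: a = np.<span style="color: #dc143c;">array</span><span style="color: black;">&#40;</span><span style="color: black;">&#91;</span>1e4<span style="color: black;">&#93;</span> + <span style="color: black;">&#91;</span><span style="color: #ff4500;">1</span><span style="color: black;">&#93;</span><span style="color: #66cc66;">*</span><span style="color: #ff4500;">10000</span>, dtype=np.<span style="color: black;">float32</span><span style="color: black;">&#41;</span><br />
In <span style="color: black;">&#91;</span><span style="color: #ff4500;">99</span><span style="color: black;">&#93;</span>: nrm2<span style="color: black;">&#40;</span>a<span style="color: black;">&#41;</span> - 1e4<br />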
Out<span style="color: black;">&#91;</span><span style="color: #ff4500;">99</span><span style="color: black;">&#93;</span>: <span style="color: #ff4500;">0.49998750062513864</span></div></div>
<h3>Timing</h3>
<p>Computing the 2-norm of an array is a very cheap operation, so its running time is usually dominated by external factors, such as memory access latency or overhead in the Python/C layer. Benchmarks on an array of 10^7 elements show that nrm2 is marginally slower than <img src='http://s.wordpress.com/latex.php?latex=%5Csqrt%7Ba%5ET%20a%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\sqrt{a^T a}' title='\sqrt{a^T a}' class='latex' />, because scaling has a cost, but it is also more stable and less prone to overflow and underflow. They also show that scipy.linalg.norm is the slowest (and numerically worst!) of all.</p>
<table border="1">
<tr>
<td><img src='http://s.wordpress.com/latex.php?latex=%5Csqrt%7Ba%5ET%20a%7D&#038;bg=ffffff&#038;fg=000000&#038;s=0' alt='\sqrt{a^T a}' title='\sqrt{a^T a}' class='latex' /></td>
<td>BLAS nrm2(a)</td>
<td>scipy.linalg.norm(a)</td>
</tr>
<tr>
<td>0.02</td>
<td>0.02</td>
<td>0.16</td>
</tr>
</table>
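<p>The figures in the table were measured on the author&#8217;s machine at the time. The sketch below shows one way to run a comparison of this kind with timeit on 10^7 float32 values; the random data and the repeat counts are arbitrary choices. Timings depend heavily on the NumPy/SciPy versions and the BLAS in use, and, as the update at the top of this post notes, scipy.linalg.norm has since been reworked, so the last column is unlikely to reproduce.</p>

```python
import timeit

import numpy as np
import scipy.linalg

rng = np.random.RandomState(0)
a = rng.rand(10 ** 7).astype(np.float32)

# BLAS nrm2 matching the dtype of `a`, as shown earlier in the post
nrm2, = scipy.linalg.get_blas_funcs(('nrm2',), (a,))

candidates = {
    "sqrt(a^T a)": lambda: np.sqrt(np.dot(a, a)),
    "BLAS nrm2(a)": lambda: nrm2(a),
    "scipy.linalg.norm(a)": lambda: scipy.linalg.norm(a),
}

for name, func in candidates.items():
    # best of 3 repeats of 10 calls each, reported per call
    best = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print("%-22s %.4f s per call" % (name, best))
```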
]]></content:encoded>
			<wfw:commentRss>http://fseoane.net/blog/2011/computing-the-vector-norm/feed/</wfw:commentRss>
		<slash:comments>13</slash:comments>
		</item>
	</channel>
</rss>

