Archive for the 'General' Category

Low rank approximation

Sunday, November 6th, 2011

A little experiment to see what low rank approximation looks like. These are the best rank-k approximations (in the Frobenius norm) to a natural image for increasing values of k, where the original image has rank 512. Python code can be found here. GIF animation made using ImageMagick’s convert command.
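A minimal sketch of the idea, not the code linked above: by the Eckart-Young theorem, the best rank-k approximation in the Frobenius norm is obtained by keeping the k largest singular values of the SVD. The function name `low_rank_approx` and the random stand-in matrix are assumptions made for illustration; the original experiment used a real image.

```python
import numpy as np


def low_rank_approx(X, k):
    """Best rank-k approximation of X in the Frobenius norm (truncated SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k]


# Stand-in for a grayscale image (hypothetical data, not the original picture).
rng = np.random.default_rng(0)
image = rng.random((64, 48))

for k in (1, 5, 20, 48):
    err = np.linalg.norm(image - low_rank_approx(image, k))
    print(f"rank {k:2d}: Frobenius error {err:.4f}")
```

The error shrinks as k grows and vanishes (up to rounding) once k reaches the rank of the matrix.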

scikit-learn 0.9

Sunday, October 2nd, 2011

Last week we released a new version of scikit-learn. The Changelog is particularly impressive, yet personally this release is important for other reasons. This will probably be my last release as a paid engineer. I’m starting a PhD next month, and although I plan to continue contributing to the project and make a few more [...]

scikit-learn’s EuroScipy 2011 coding sprint — day two

Thursday, August 25th, 2011

Today’s coding sprint was a bit more crowded, with some notable scipy hackers such as Ralf Gommers, Stéfan van der Walt, David Cournapeau and Fernando Perez from IPython joining in. Here is what got done: – We merged Jake’s new BallTree code. This is a pure Cython implementation of a nearest-neighbor search similar to the KDTree [...]

scikit-learn’s EuroScipy 2011 coding sprint — day one

Tuesday, August 23rd, 2011

As a warm-up for the upcoming EuroScipy conference, some of the scikit-learn developers decided to gather and work together for a couple of days. Today was the first day and there was only a handful of us, as the real kickoff is expected tomorrow. Some interesting coding happened, although most of us were still preparing [...]

Ridge regression path

Tuesday, July 12th, 2011

Ridge coefficients for multiple values of the regularization parameter can be elegantly computed by updating the thin SVD of the design matrix:
    import numpy as np
    from scipy import linalg
    def ridge(A, b, alphas):
        """Return coefficients for regularized least squares
               min ||A x - b|| [...]
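Since the excerpt is cut off, here is a minimal sketch of the general technique rather than the post's own function. It assumes the objective is min ||A x - b||^2 + alpha ||x||^2; with the thin SVD A = U S V^T, the solution is x(alpha) = V diag(s / (s^2 + alpha)) U^T b, so the SVD is computed once and reused for every alpha. The name `ridge_path` and the random test data are hypothetical.

```python
import numpy as np


def ridge_path(A, b, alphas):
    """Ridge coefficients for each alpha, reusing a single thin SVD of A.

    Assumed objective: min ||A x - b||^2 + alpha * ||x||^2.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Utb = U.T @ b  # computed once, shared by all alphas
    coefs = []
    for alpha in alphas:
        d = s / (s ** 2 + alpha)  # shrunken inverse singular values
        coefs.append(Vt.T @ (d * Utb))
    return np.array(coefs)  # shape (n_alphas, n_features)


rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
b = rng.standard_normal(30)
alphas = [0.01, 0.1, 1.0, 10.0]
path = ridge_path(A, b, alphas)
print(path.shape)
```

Each extra alpha costs only a couple of matrix-vector products, which is what makes the path cheap compared with solving the regularized normal equations from scratch every time.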

LLE comes in different flavours

Thursday, June 30th, 2011

I haven’t worked on the manifold module since last time, yet thanks to Jake VanderPlas there are some cool features I can talk about. First off, the ARPACK backend is finally working and gives factor one speedup over the lobpcg + PyAMG approach. The key is to use ARPACK’s shift-invert mode instead of the regular [...]

Handwritten digits and Locally Linear Embedding

Wednesday, May 4th, 2011

I decided to test my new Locally Linear Embedding (LLE) implementation against a real dataset. At first I didn’t think this would turn out very well, since LLE seems to be somewhat fragile, yielding largely different results for small differences in parameters such as number of neighbors or tolerance, but as it turns out, results [...]

Low-level routines for Support Vector Machines

Wednesday, April 27th, 2011

I’ve been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages an efficient use of these libraries for expert users. These are methods that have lower overhead than the object-oriented interface as they are closer to the C implementation, but do not [...]

new get_blas_funcs in scipy.linalg

Saturday, April 23rd, 2011

Some changes I made to the function scipy.linalg.get_blas_funcs() got merged today. The main enhancement is that get_blas_funcs() now also accepts a single string as input parameter and a dtype, so that fetching the BLAS function for a specific type becomes more natural. For example, fetching the gemm routine for a single-precision complex number now looks like [...]
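The example in the excerpt is truncated, so the following is only a sketch of what such a call can look like with the current SciPy API, not necessarily the snippet from the post. It requests gemm by name for the complex64 dtype, which should resolve to the single-precision complex routine (cgemm). The small matrices are made up for illustration.

```python
import numpy as np
from scipy import linalg

# Ask for gemm by name and dtype instead of passing sample arrays.
gemm = linalg.get_blas_funcs('gemm', dtype=np.complex64)
print(gemm.typecode)  # BLAS type prefix of the selected routine

# Hypothetical inputs: multiply a complex matrix by the identity.
a = np.array([[1 + 1j, 2], [3, 4 - 1j]], dtype=np.complex64)
b = np.eye(2, dtype=np.complex64)
c = gemm(1.0, a, b)  # computes 1.0 * a @ b
```

Passing the dtype explicitly avoids having to build a dummy array just to tell SciPy which precision is wanted.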

Locally linear embedding and sparse eigensolvers

Thursday, April 21st, 2011

I’ve been working for some time on implementing a locally linear embedding algorithm for the upcoming manifold module in scikit-learn. While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the algorithm, falling back to dense [...]

scikits.learn is now part of pythonxy

Wednesday, April 20th, 2011

The guys behind pythonxy have been kind enough to add the latest scikit-learn as an additional plugin for their distribution. Having scikit-learn in both pythonxy and EPD will hopefully make it easier to use for Windows users. For now I will continue to make Windows precompiled binaries, but pythonxy users finally have a package [...]

A profiler for Python extensions

Wednesday, April 6th, 2011

Profiling Python extensions has not been a pleasant experience for me, so I made my own package to do the job. Existing alternatives were either hard to use, forcing you to recompile with custom flags (like gprof), or desperately slow (like valgrind/callgrind). The package I’ll talk about is called YEP and is designed to be: [...]

scikit-learn coding sprint in Paris

Saturday, April 2nd, 2011

Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones: some of whom I was already familiar with from the mailing list while others came just to say hi and get familiar with the code. It was really great to have people from such [...]

py3k in scikit-learn

Monday, March 28th, 2011

One thing I’d really like to see done in this Friday’s scikit-learn sprint is to have full support for Python 3. There’s a branch where the hard work has been done (porting C extensions, automatic 2to3 conversion, etc.), although joblib still has some bugs and no one has attempted to do anything serious with this [...]

Computing the vector norm

Tuesday, February 15th, 2011

Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10. Last week I discussed with Gael how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves: either calling scipy.linalg.norm(a) or computing sqrt(a.T a), but as I learned later, [...]
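A small sketch of the two approaches side by side, written against current NumPy and SciPy. The excerpt is truncated before its conclusion, so the overflow case at the end is a general property of the dot-product form that I am adding as an illustration, not necessarily the point the post goes on to make.

```python
import numpy as np
from scipy import linalg

a = np.array([3.0, 4.0])

norm_lib = linalg.norm(a)         # library routine
norm_dot = np.sqrt(np.dot(a, a))  # sqrt(a.T a) written out by hand
print(norm_lib, norm_dot)

# The hand-written form squares the entries first, so the intermediate
# result can overflow even when the norm itself is representable.
big = np.array([1e200, 1e200])
overflowed = np.sqrt(np.dot(big, big))
print(overflowed)
```

For well-scaled vectors both give the same answer, so the choice mostly comes down to speed and how much care is needed for extreme values.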

Smells like hacker spirit

Friday, February 11th, 2011

Last weekend I was at FOSDEM presenting scikits.learn (here are the slides I used at the Data Analytics Devroom). Kudos to Olivier Grisel and all the people who organized such a fun and authentic meeting!

New examples in scikits.learn 0.6

Friday, December 31st, 2010

The latest release of scikits.learn comes with an awesome collection of examples. These are some of my favorites: Faces recognition. This example by Olivier Grisel downloads a 58MB faces dataset from Labeled Faces in the Wild and performs PCA for feature extraction and SVC for classification, yielding a very acceptable 0.85 f1-score. Species [...]

memory efficient bindings for libsvm

Friday, November 19th, 2010

scikits.learn.svm now uses LibSVM-dense instead of LibSVM for some support vector machine related algorithms when the input is a dense matrix. As a result most of the copies associated with argument passing are avoided, giving a 50% smaller memory footprint, and one several times smaller than that of the Python bindings that ship with libsvm, which stores data in the [...]

LARS algorithm

Thursday, September 30th, 2010

I’ve been working lately with Alexandre Gramfort coding the LARS algorithm in scikits.learn. This algorithm computes the solution to several general linear models used in machine learning: LAR, Lasso, Elasticnet and Forward Stagewise. Unlike the implementation by coordinate descent, the LARS algorithm gives the full coefficient path along the regularization parameter, and thus it is [...]

Support for sparse matrices in scikits.learn

Monday, August 23rd, 2010

I recently added support for sparse matrices (as defined in scipy.sparse) in some classifiers of scikits.learn. In those classes, the fit method will perform the algorithm without converting to a dense representation and will also store parameters in an efficient format. Right now, the only classes that implement this are SVC and LinearSVC in scikits.learn.svm.sparse, [...]