Archive for the 'General' Category
Sunday, November 6th, 2011
A little experiment to see what low-rank approximation looks like. These are the best rank-k approximations (in the Frobenius norm) to a natural image for increasing values of k and an original image of rank 512. Python code can be found here. GIF animation made using ImageMagick’s convert script.
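As a rough illustration of the idea (not the code linked above), the best rank-k approximation in the Frobenius norm can be obtained by truncating the SVD. The helper name `best_rank_k` and the random array standing in for an image are assumptions made for this sketch.

```python
import numpy as np


def best_rank_k(X, k):
    """Return the best rank-k approximation of X in the Frobenius norm.

    Keeps only the k largest singular values and their singular vectors.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Scale the first k left singular vectors by the singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k]


# A random array stands in for a grayscale image (assumption for the demo).
rng = np.random.default_rng(0)
image = rng.random((64, 48))

k = 10
approx = best_rank_k(image, k)

# The Frobenius error equals the norm of the discarded singular values.
singular_values = np.linalg.svd(image, compute_uv=False)
error = np.linalg.norm(image - approx)
expected_error = np.sqrt(np.sum(singular_values[k:] ** 2))
```

Looping over increasing values of k and saving each `approx` as a frame would give an animation along the lines of the one described.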
General | Comments (1)
Sunday, October 2nd, 2011
Last week we released a new version of scikit-learn. The Changelog is particularly impressive, yet personally this release is important for other reasons. This will probably be my last release as a paid engineer. I’m starting a PhD next month, and although I plan to continue contributing to the project and make a few more [...]
General, scikit-learn | Comments Off
Thursday, August 25th, 2011
Today’s coding sprint was a bit more crowded, with some notable SciPy hackers such as Ralf Gommers, Stéfan van der Walt, David Cournapeau and Fernando Perez from IPython joining in. On what got done: – We merged Jake’s new BallTree code. This is a pure Cython implementation of a nearest-neighbor search similar to the KDTree [...]
General, Python, scikit-learn | Comments (1)
Tuesday, August 23rd, 2011
As a warm-up for the upcoming EuroScipy conference, some of the scikit-learn developers decided to gather and work together for a couple of days. Today was the first day and there was only a handful of us, as the real kickoff is expected tomorrow. Some interesting coding happened, although most of us were still preparing [...]
General, scikit-learn | Comments Off
Tuesday, July 12th, 2011
Ridge coefficients for multiple values of the regularization parameter can be elegantly computed by updating the thin SVD of the design matrix: import numpy as np from scipy import linalg def ridge(A, b, alphas): """Return coefficients for regularized least squares min ||A x – b|| [...]
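The excerpt above is cut off, so what follows is only a sketch of the general technique, not the post's actual function. It assumes the objective is min ||A x - b||^2 + alpha ||x||^2 and uses a hypothetical name, `ridge_svd`. The thin SVD is computed once; each value of alpha then only rescales the singular values.

```python
import numpy as np
from scipy import linalg


def ridge_svd(A, b, alphas):
    """Ridge coefficients for several penalties, reusing one thin SVD.

    Assumed objective: min ||A x - b||^2 + alpha * ||x||^2.
    Returns an array of shape (len(alphas), n_features).
    """
    U, s, Vt = linalg.svd(A, full_matrices=False)
    Utb = U.T @ b  # computed once, shared by every alpha
    coefs = []
    for alpha in alphas:
        # x = V diag(s / (s^2 + alpha)) U^T b
        d = s / (s ** 2 + alpha)
        coefs.append(Vt.T @ (d * Utb))
    return np.array(coefs)


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5))
b = rng.standard_normal(50)
alphas = [0.1, 1.0, 10.0]
coefs = ridge_svd(A, b, alphas)

# Reference solution from the normal equations, for comparison only.
reference = np.array(
    [linalg.solve(A.T @ A + alpha * np.eye(5), A.T @ b) for alpha in alphas]
)
```

The cost of the SVD is paid once, so adding more values to `alphas` is cheap compared with solving a new linear system for each one.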
General, scikit-learn, scipy | Comments (5)
Thursday, June 30th, 2011
I haven’t worked on the manifold module since last time, yet thanks to Jake VanderPlas there are some cool features I can talk about. First off, the ARPACK backend is finally working and gives factor one speedup over the lobcpg + PyAMG approach. The key is to use ARPACK’s shift-invert mode instead of the regular [...]
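For readers unfamiliar with shift-invert mode, here is a minimal sketch using SciPy's ARPACK wrapper, `scipy.sparse.linalg.eigsh`. It is not the manifold module's code: the tridiagonal test matrix is an assumption chosen because its eigenvalues are known in closed form. Passing `sigma=0` asks ARPACK for the eigenvalues closest to zero, which is how the smallest eigenvalues of a sparse symmetric matrix can be found efficiently.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# Sparse symmetric positive definite test matrix (1-D Laplacian),
# used here only because its spectrum is known analytically.
n = 200
A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# Shift-invert mode: with sigma=0 and which="LM", ARPACK iterates on
# (A - sigma*I)^-1 and returns the eigenvalues of A nearest to sigma.
vals, vecs = eigsh(A, k=3, sigma=0, which="LM")
vals = np.sort(vals)

# Closed-form eigenvalues of this matrix: 2 - 2*cos(j*pi/(n+1)).
exact = 2 - 2 * np.cos(np.arange(1, 4) * np.pi / (n + 1))
```

Without `sigma`, requesting the smallest eigenvalues directly (for example with `which="SM"`) typically converges much more slowly on matrices like this one.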
General, manifold learning, scikit-learn | Comments (7)
Wednesday, May 4th, 2011
I decided to test my new Locally Linear Embedding (LLE) implementation against a real dataset. At first I didn’t think this would turn out very well, since LLE seems to be somewhat fragile, yielding largely different results for small differences in parameters such as number of neighbors or tolerance, but as it turns out, results [...]
General, Python, scikit-learn | Comments Off
Wednesday, April 27th, 2011
I’ve been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages efficient use of these libraries for expert users. These are methods that have lower overhead than the object-oriented interface as they are closer to the C implementation, but do not [...]
General, Python, scikit-learn | Comments (3)
Saturday, April 23rd, 2011
Some changes I made to the function scipy.linalg.get_blas_funcs() got merged today. The main enhancement is that get_blas_funcs() now also accepts a single string as input parameter and a dtype, so that fetching the BLAS function for a specific type becomes more natural. For example, fetching the gemm routine for a single-precision complex number now looks like [...]
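The snippet from the post is cut off above. The following is a separate sketch, based on the `get_blas_funcs` signature in current SciPy, of what fetching a typed routine from a single string and a dtype can look like. The small matrices are made up for the demonstration.

```python
import numpy as np
from scipy import linalg

# Ask for "gemm" by name; the dtype selects the single-precision
# complex variant of the routine.
gemm = linalg.get_blas_funcs("gemm", dtype=np.complex64)

a = np.array([[1 + 1j, 2], [3, 4 - 1j]], dtype=np.complex64)
identity = np.eye(2, dtype=np.complex64)

# gemm computes alpha * a @ b; multiplying by the identity returns a.
product = gemm(1.0, a, identity)
```

The returned object carries a `typecode` attribute ("c" for single-precision complex), which is a quick way to confirm which variant was selected.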
General, Python, scipy | Comments (1)
Thursday, April 21st, 2011
I’ve been working for some time on implementing a locally linear embedding algorithm for the upcoming manifold module in scikit-learn. While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the algorithm, falling back to dense [...]
General, Python, scikit-learn | Comments (7)
Wednesday, April 20th, 2011
The guys behind pythonxy have been kind enough to add the latest scikit-learn as an additional plugin for their distribution. Having scikit-learn in both pythonxy and EPD will hopefully make it easier to use for Windows users. For now I will continue to make precompiled Windows binaries, but pythonxy users finally have a package [...]
General, Python, scikit-learn | Comments Off
Wednesday, April 6th, 2011
Profiling Python extensions has not been a pleasant experience for me, so I made my own package to do the job. Existing alternatives were either hard to use, forcing you to recompile with custom flags (like gprofile), or desperately slow (like valgrind/callgrind). The package I’ll talk about is called YEP and is designed to be: [...]
General, Python | Comments (8)
Saturday, April 2nd, 2011
Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones: some of whom I was already familiar with from the mailing list while others came just to say hi and get familiar with the code. It was really great to have people from such [...]
General, scikit-learn | Comments Off
Monday, March 28th, 2011
One thing I’d really like to see done in this Friday’s scikit-learn sprint is to have full support for Python 3. There’s a branch where the hard work has been done (porting C extensions, automatic 2to3 conversion, etc.), although joblib still has some bugs and no one has attempted to do anything serious with this [...]
General | Comments Off
Tuesday, February 15th, 2011
Update: a fast and stable norm was added to scipy.linalg in August 2011 and will be available in scipy 0.10. Last week I discussed with Gael how we should compute the Euclidean norm of a vector a using SciPy. Two approaches suggest themselves, either calling scipy.linalg.norm(a) or computing sqrt(a.T a), but as I learned later, [...]
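The two approaches mentioned can be sketched as follows. The rest of the post is elided, so this is not a reconstruction of its conclusion; the overflow case and the manual rescaling at the end are my own additions, included only to illustrate why stability matters for a norm routine.

```python
import numpy as np
from scipy import linalg

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)

# Approach 1: the library routine.
norm_lib = linalg.norm(a)

# Approach 2: square root of the inner product, sqrt(a.T a).
norm_dot = np.sqrt(a @ a)

# Illustration (assumption, not from the post): squaring very large
# entries overflows in double precision, so the naive formula gives inf.
big = np.full(10, 1e200)
naive_big = np.sqrt(big @ big)

# Dividing by the largest magnitude before squaring avoids the overflow.
scale = np.abs(big).max()
scaled_big = scale * np.sqrt(np.sum((big / scale) ** 2))
```

For well-scaled data the two approaches agree to rounding error, so the choice between them mostly comes down to speed and robustness at extreme magnitudes.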
General, scipy | Comments (13)
Friday, February 11th, 2011
Last weekend I was at FOSDEM presenting scikits.learn (here are the slides I used at the Data Analytics Devroom). Kudos to Olivier Grisel and all the people who organized such a fun and authentic meeting!
General, Tecnología | Comments Off
Friday, December 31st, 2010
The latest release of scikits.learn comes with an awesome collection of examples. These are some of my favorites: Faces recognition: this example by Olivier Grisel downloads a 58MB faces dataset from Labeled Faces in the Wild, and is able to perform PCA for feature extraction and SVC for classification, yielding a very acceptable 0.85 f1-score. Species [...]
General, scikit-learn, Tecnología | Comments Off
Friday, November 19th, 2010
scikits.learn.svm now uses LibSVM-dense instead of LibSVM for some support vector machine related algorithms when the input is a dense matrix. As a result, most of the copies associated with argument passing are avoided, giving a 50% smaller memory footprint, several times smaller than that of the Python bindings that ship with libsvm, which store data in the [...]
General, scikit-learn | Comments Off
Thursday, September 30th, 2010
I’ve been working lately with Alexandre Gramfort coding the LARS algorithm in scikits.learn. This algorithm computes the solution to several general linear models used in machine learning: LAR, Lasso, Elasticnet and Forward Stagewise. Unlike the implementation by coordinate descent, the LARS algorithm gives the full coefficient path along the regularization parameter, and thus it is [...]
General, scikit-learn, Tecnología | Comments (4)
Monday, August 23rd, 2010
I recently added support for sparse matrices (as defined in scipy.sparse) in some classifiers of scikits.learn. In those classes, the fit method will perform the algorithm without converting to a dense representation and will also store parameters in an efficient format. Right now, the only classes that implement this are SVC and LinearSVC in scikits.learn.svm.sparse, [...]
General | Comments Off