Archive for the 'scikit-learn' Category
Sunday, October 2nd, 2011
Last week we released a new version of scikit-learn. The Changelog is particularly impressive, yet personally this release is important for other reasons. This will probably be my last release as a paid engineer. I’m starting a PhD next month, and although I plan to continue contributing to the project and make a few more [...]
General, scikit-learn | Comments Off
Sunday, September 4th, 2011
I’ve been working lately on improving the scikit-learn example gallery so that it also shows a small thumbnail of the plotted result. Here is what the gallery looks like now. The real thing should already be displayed in the development documentation. The next thing is to add a static image to those that don’t generate any [...]
scikit-learn | Comments (1)
Thursday, August 25th, 2011
Today’s coding sprint was a bit more crowded, with some notable scipy hackers such as Ralf Gommers, Stefan van der Walt, David Cournapeau and Fernando Perez from IPython joining in. On what got done: - We merged Jake’s new BallTree code. This is a pure Cython implementation of a nearest-neighbor search similar to the KDTree [...]
General, Python, scikit-learn | Comments (1)
Tuesday, August 23rd, 2011
As a warm-up for the upcoming EuroScipy conference, some of the scikit-learn developers decided to gather and work together for a couple of days. Today was the first day and there was only a handful of us, as the real kickoff is expected tomorrow. Some interesting coding happened, although most of us were still preparing [...]
General, scikit-learn | Comments Off
Tuesday, July 12th, 2011
Ridge coefficients for multiple values of the regularization parameter can be elegantly computed by updating the thin SVD of the design matrix: import numpy as np from scipy import linalg def ridge(A, b, alphas): """Return coefficients for regularized least squares min ||A x - b|| [...]
General, scikit-learn, scipy | Comments (5)
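The excerpt above is cut off before the function body, so what follows is only a minimal sketch of the general idea, not the post's actual code. It assumes the objective is min ||A x - b||^2 + alpha ||x||^2, and the name ridge_svd is hypothetical. The thin SVD is computed once and reused for every value of alpha.

```python
import numpy as np
from scipy import linalg


def ridge_svd(A, b, alphas):
    """Ridge coefficients for each alpha, reusing a single thin SVD of A.

    Assumed objective (not taken from the truncated excerpt):
        min_x ||A x - b||^2 + alpha * ||x||^2
    """
    # Thin SVD: A = U @ diag(s) @ Vt, computed only once.
    U, s, Vt = linalg.svd(A, full_matrices=False)
    Utb = U.T @ b
    coefs = np.empty((len(alphas), A.shape[1]))
    for i, alpha in enumerate(alphas):
        # Closed form: x = V @ diag(s / (s**2 + alpha)) @ U.T @ b
        d = s / (s ** 2 + alpha)
        coefs[i] = Vt.T @ (d * Utb)
    return coefs
```

Each row of the returned array holds the coefficients for one value of alpha, so the only per-alpha cost is a rescaling and one matrix-vector product.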
Thursday, June 30th, 2011
I haven’t worked on the manifold module since last time, yet thanks to Jake VanderPlas there are some cool features I can talk about. First off, the ARPACK backend is finally working and gives factor one speedup over the lobpcg + PyAMG approach. The key is to use ARPACK’s shift-invert mode instead of the regular [...]
General, manifold learning, scikit-learn | Comments (7)
Tuesday, June 7th, 2011
The manifold module in scikit-learn is slowly progressing: the locally linear embedding implementation was finally merged along with some documentation. At about the same time but in a different timezone, Jake VanderPlas began coding other manifold learning methods and back in Paris Olivier Grisel made my digits example a lot nicer by adding the embedding [...]
scikit-learn | Comments Off
Wednesday, May 4th, 2011
I decided to test my new Locally Linear Embedding (LLE) implementation against a real dataset. At first I didn’t think this would turn out very well, since LLE seems to be somewhat fragile, yielding largely different results for small differences in parameters such as number of neighbors or tolerance, but as it turns out, results [...]
General, Python, scikit-learn | Comments Off
Wednesday, April 27th, 2011
I’ve been working lately on improving the low-level API of the libsvm bindings in scikit-learn. The goal is to provide an API that encourages efficient use of these libraries by expert users. These are methods that have lower overhead than the object-oriented interface as they are closer to the C implementation, but do not [...]
General, Python, scikit-learn | Comments (3)
Thursday, April 21st, 2011
I’ve been working for some time on implementing a locally linear embedding algorithm for the upcoming manifold module in scikit-learn. While several implementations of this algorithm exist in Python, as far as I know none of them is able to use a sparse eigensolver in the last step of the algorithm, falling back to dense [...]
General, Python, scikit-learn | Comments (7)
Wednesday, April 20th, 2011
The guys behind pythonxy have been kind enough to add the latest scikit-learn as an additional plugin for their distribution. Having scikit-learn in both pythonxy and EPD will hopefully make it easier for Windows users to use. For now I will continue to make precompiled Windows binaries, but pythonxy users finally have a package [...]
General, Python, scikit-learn | Comments Off
Saturday, April 2nd, 2011
Yesterday was the scikit-learn coding sprint in Paris. It was great to meet with old developers (Vincent Michel) and new ones: some of whom I was already familiar with from the mailing list while others came just to say hi and get familiar with the code. It was really great to have people from such [...]
General, scikit-learn | Comments Off
Friday, December 31st, 2010
The latest release of scikits.learn comes with an awesome collection of examples. These are some of my favorites: Faces recognition: This example by Olivier Grisel downloads a 58MB faces dataset from Labeled Faces in the Wild, and is able to perform PCA for feature extraction and SVC for classification, yielding a very acceptable 0.85 f1-score. Species [...]
General, scikit-learn, Tecnología | Comments Off
Monday, November 29th, 2010
Based on the work of libsvm-dense by Ming-Wei Chang, Hsuan-Tien Lin, Ming-Hen Tsai, Chia-Hua Ho and Hsiang-Fu Yu, I patched the libsvm distribution shipped with scikits.learn to allow setting weights for individual instances. The motivation behind this is to be able to force a classifier to focus its attention on some samples rather than others. This [...]
scikit-learn, Tecnología | Comments (1)
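As a rough illustration of per-instance weights, here is a minimal sketch. It assumes the current sklearn.svm.SVC interface and its sample_weight argument to fit, which may differ from the patched scikits.learn API the post describes; the toy data and weights are made up for the example.

```python
import numpy as np
from sklearn.svm import SVC

# Toy one-dimensional data with two well-separated classes (illustrative only).
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0, 0, 1, 1])

# Hypothetical weights: the last sample counts ten times more than the others.
# In SVC, each sample's weight rescales the penalty C for that sample.
weights = np.array([1.0, 1.0, 1.0, 10.0])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=weights)
predictions = clf.predict(X)
```

A larger weight makes misclassifying that sample more costly, so on noisy data the decision boundary tends to shift to accommodate the heavily weighted points.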
Wednesday, November 24th, 2010
Highlights for this release: * New stochastic gradient descent module by Peter Prettenhofer * Improved svm module: memory efficiency, automatic class weights. * Wrap for liblinear’s Multi-class SVC (option multi_class in LinearSVC) * New features and performance improvements of text feature extraction. * Improved sparse matrix support, both in main classes (GridSearch) and in sparse [...]
scikit-learn, Tecnología | Comments (1)
Friday, November 19th, 2010
scikits.learn.svm now uses LibSVM-dense instead of LibSVM for some support vector machine related algorithms when the input is a dense matrix. As a result, most of the copies associated with argument passing are avoided, giving a 50% smaller memory footprint, and one several times smaller than that of the Python bindings that ship with libsvm, which stores data in the [...]
General, scikit-learn | Comments Off
Thursday, September 30th, 2010
I’ve been working lately with Alexandre Gramfort on coding the LARS algorithm in scikits.learn. This algorithm computes the solution to several general linear models used in machine learning: LAR, Lasso, Elastic Net and Forward Stagewise. Unlike the implementation by coordinate descent, the LARS algorithm gives the full coefficient path along the regularization parameter, and thus it is [...]
General, scikit-learn, Tecnología | Comments (4)
Sunday, September 12th, 2010
Last week the second scikits.learn sprint took place in Paris. It was two days of insane activity (115 commits, 6 branches, 33 coffees) in which we did a lot of work, both implementing new algorithms and fixing or improving old ones. This includes: * sparse version of Lasso by coordinate descent. Not (yet) merged into [...]
scikit-learn | Comments Off
Thursday, May 27th, 2010
It is now possible (using the development version as of May 2010) to use Support Vector Machines with custom kernels in scikits.learn. Using it couldn’t be simpler: you just pass a callable (the kernel) to the class constructor. For example, a linear kernel would be implemented as follows: import numpy as np [...]
General, scikit-learn, Tecnología | Comments (1)
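The code in the excerpt above is cut off, so the following is only a minimal sketch of passing a callable kernel. It assumes the current sklearn.svm.SVC import path rather than the 2010 scikits.learn one, and the function name linear_kernel and the toy data are made up for the example.

```python
import numpy as np
from sklearn.svm import SVC


def linear_kernel(X, Y):
    """Gram matrix of a linear kernel: entry (i, j) is the dot product of X[i] and Y[j]."""
    return np.dot(X, Y.T)


# Toy, linearly separable data (illustrative only).
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([0, 0, 1, 1])

# The callable is passed directly as the kernel argument.
clf = SVC(kernel=linear_kernel)
clf.fit(X, y)
predictions = clf.predict(X)
```

The callable receives two arrays of samples and must return a matrix with one row per sample of the first and one column per sample of the second.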
Wednesday, March 17th, 2010
Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (2-dimensional in this example), and we want to know whether we can [...]
General, scikit-learn, Tecnología | Comments (4)