Sunday, December 23, 2007

The end of the SciPy sandbox

On October 2, 2005, Travis created the newscipy branch to port SciPy to what is now called NumPy. Three days later scipy.sandbox was created. The sandbox was originally intended to be a staging ground for packages that were undergoing rapid development and whose APIs were in flux. It was also a place where broken code could live.

The sandbox is currently creating more problems than it solves:
  1. Sandbox code limits group development, since it is often viewed as a place where a specific developer (or maybe a small group of developers) is experimenting. In fact, several of the packages are simply named after the developer. Branching would be a more appropriate way to handle experimental work done by a small group of developers.
  2. The ambiguous nature of the sandbox (i.e., in the SciPy trunk, but not in the release) plus a greater tolerance for broken code allows loose coding and documentation standards, which creates a barrier to inclusion in the core.
  3. Having packages included in the trunk implies that the code will eventually move into official releases; but several of the packages (e.g., old graphics packages) will not be included in future releases.
  4. Finally, and most importantly, the sandbox leads to confusion and installation headaches. Users expect to have access to sandbox packages when they install SciPy binaries. But if they want to use a sandbox package, they are encouraged to download the source code, edit configuration files, and build SciPy themselves.

At the recent meeting in Berkeley, it was unanimously agreed that we should get rid of the sandbox for the SciPy 0.7.0 release, which is planned for late March. By the 0.7.0 release, each existing sandbox package will need to be officially moved into scipy, made into a scikit, moved into a branch, or simply deleted.

Eventually, we would like to see all of the following code/packages/functionality moved into scipy: arpack, buildgrid, constants, delaunay, ga, image, lobpcg, montecarlo, netcdf, newoptimize, rbf, rkern, and spline. Most of this code is unlikely to be ready by the 0.7.0 release, so it will probably just be moved into a branch for now.

We would like to see all of the following code/packages/functionality moved into a scikit: ann, exmplpackage, fdfpack, multigrid, pyem, pyloess, svm, and timeseries. My next blog entry will be focused on Scikits.

The packages belonging to specific developers should probably be moved into a branch: cdavid, duard, oliphant, and rkern. Some of the developers have also suggested that numexpr be moved into a separate, stand-alone package. Finally, several packages can just be deleted: arraysetops, cow, gplt, maskedarray, plt, stats, wavelets, and xplt.
