
Statistical Computing

Paraphrasing Carlo Lauro [1], statistical computing is the application of computer science to statistics, while computational statistics is the design of algorithms for implementing statistical methods. For the purposes of this page the two terms are treated as carrying the same intuitive meaning; any distinction between them will be stated explicitly.

The environment most widely used in academia for statistical computing is R. Many statisticians have already amassed excellent collections of R material, and Karl Broman's Introduction to R will probably keep anyone busy for quite a while. If you are fortunate enough to read Greek, the (self-proclaimed "πρόχειρες", i.e. "rough") notes by Konstantinos Fokianos, Εισαγωγή στην R (Introduction to R), are the best Greek-language resource I have encountered. If you nevertheless feel that you need a book before getting your hands dirty with R, Peter Dalgaard's Introductory Statistics with R offers a concise and handy introduction to R (and to statistics, for that matter). And if you feel motivated to see what is happening under the hood, Simon Wood's lecture notes from APTS 2011 take a statistician's look at the numerical analysis methods that make up the nuts and bolts of modern statistical computing environments.

For people with a need for speed, unfortunately no widely used free and open-source software environment exists, which means you will probably have to write your own code in your language of preference. Assuming that this is C/C++, good starting points are GSL and Eigen. The NAG libraries offer a splendid, though closed-source and quite expensive, alternative. While GSL provides a number of solver routines, it does not have a good built-in non-linear, derivative-free solver. For that I recommend NLopt, which has less-than-optimal documentation but is quite easy to use syntax-wise and offers a variety of algorithms to experiment with. Where GSL and other libraries fail to provide a solution, the book Numerical Recipes provides disputed and non-free, yet commonly used, implementations of hundreds of useful algorithms.
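To make "derivative-free solver" concrete, here is a minimal sketch of the idea. It is written in plain Python purely for brevity and is not NLopt's or GSL's API. The method shown is a basic compass (coordinate pattern) search, chosen only because it is short, and the data values are made up for illustration. It fits a normal distribution by maximum likelihood using nothing but function evaluations, then compares the result with the closed-form answer.

```python
import math
import statistics


def compass_search(f, x0, step=1.0, tol=1e-8, max_iter=100_000):
    """Minimise f with a compass search: probe +/- step along each
    coordinate, accept any improvement, halve the step when none is found.
    Only function values are used, never derivatives."""
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    improved = True
                    break
        if not improved:
            step *= 0.5
    return x, fx


def normal_nll(params, data):
    """Negative log-likelihood of a normal sample (constant term dropped),
    parameterised by (mu, log sigma) so that sigma stays positive."""
    mu, log_sigma = params
    sigma2 = math.exp(2.0 * log_sigma)
    return len(data) * log_sigma + sum((x - mu) ** 2 for x in data) / (2.0 * sigma2)


# Hypothetical sample, for illustration only.
data = [2.1, 1.9, 3.2, 2.8, 2.5, 3.0]

(mu_hat, log_sigma_hat), _ = compass_search(lambda p: normal_nll(p, data), [0.0, 0.0])
sigma_hat = math.exp(log_sigma_hat)

# The closed-form MLE is the sample mean and the population standard deviation.
print("numerical  :", mu_hat, sigma_hat)
print("closed form:", statistics.fmean(data), statistics.pstdev(data))
```

A production solver would be far more careful about scaling, stopping rules, and constraints; the point here is only that the objective is treated as a black box.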

Turning to one less widely used program, Gretl is an excellent econometrics package offering a nice GTK+ GUI and a variety of readily available, not-so-common statistical tests (especially for time-series analysis). As an intermediate between C and R (both in terms of speed and code readability), the Python packages NumPy and SciPy offer a variety of computational routines that can be used through basic Python scripts to "torture" your data.
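As a rough illustration of what such a "basic Python script" might look like, the sketch below simulates some data, fits an ordinary least-squares line with NumPy, and runs a Welch two-sample t-test with SciPy. The data, seed, and coefficient values are all made up for the example; it assumes NumPy and SciPy are installed.

```python
import numpy as np
from scipy import stats

# Fixed seed so repeated runs give the same simulated data.
rng = np.random.default_rng(seed=42)

# Hypothetical data: y depends linearly on x, plus Gaussian noise.
n = 200
x = rng.uniform(0.0, 10.0, size=n)
y = 1.5 + 0.8 * x + rng.normal(0.0, 1.0, size=n)

# Ordinary least squares: design matrix with an intercept column,
# solved with NumPy's least-squares routine.
X = np.column_stack([np.ones(n), x])
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

# Welch two-sample t-test (unequal variances) on two simulated groups.
group_a = rng.normal(0.0, 1.0, size=50)
group_b = rng.normal(0.5, 1.0, size=50)
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

print("intercept, slope:", beta_hat)
print("t statistic:", t_stat, "p-value:", p_value)
```

The same few lines in C would need explicit memory management and a linear algebra library, which is roughly the readability trade-off mentioned above.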

1. Jaromír Antoch, Environment for statistical computing, Computer Science Review 2 (2008) 113–122