RSS News Reports 2005/6
AGM and Designing a Design of Experiments Website, by Rodney Edmondson and Richard Reader
In the seminar, Rodney presented a web-based software project for Design of Experiments (DOE). The project has been developed by Rodney Edmondson and Richard Reader, with support from Steven Gilmour of Queen Mary, University of London. The aim of the project is to provide simple but practical software to a wide community of practitioners.
Rodney started by giving some background on DOE. He then emphasised the importance of good, uncomplicated design. He made his point clear through a series of examples, mostly taken from agriculture. One of his examples showed the effect of temperature on curd formation in a crop. Another example exhibited yield as a response to crop spacing across seasons of the year. A third example used a two-fifths fraction of a 5^2 factorial experiment to study climate change. Throughout the examples Rodney mentioned that factorial experiments and response surface designs often work best. He also emphasised the usefulness of block designs as an efficient way of controlling natural variability.
He then discussed the motivation behind the web-based software. Although such a project might seem difficult to set up, maintain, protect and develop, its advantages outweigh the disadvantages. Being on the web, its visibility and accessibility are obvious, and so it can reach a wider forum. Rodney showed some live examples of the software running on a computer.
The project currently runs in two parts, both of which use R as the inner processing language for the design algorithms. The first, and currently the main, part runs as an internet site using ASP, with R executed via batch files generated by the site software. The second version, not yet 'live', runs JSP within a Struts framework on a Linux platform, which allows R to run directly via an R web server called Rserve. The web software helps the user to create designs through a series of drop-down menus, and the program currently uses powerful non-trivial R algorithms to generate the block designs. As an additional tool for the user, the software will perform a dummy analysis of variance on the output, using GenStat. Finally, design details can be downloaded for use by any standard statistical analysis package. Rodney finished the seminar by pointing to the future of the project. It is intended that the Linux development will eventually replace the current version, giving the widest availability and best speed. There are also plans to develop a downloadable version as a complement to the internet version, and the availability of response surface designs will be extended by using R-based algorithms to generate a range of flexible and efficient response surface designs.
The software is still under development but the current work can be accessed at the URL: http://biometrics.hri.ac.uk/experimentaldesigns/website/hri.htm
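The site's block designs are generated by non-trivial R algorithms. As a much simpler illustration of the general idea (a hypothetical sketch in Python, not the project's actual R code), a randomized complete block design places every treatment once in each block, in an independently randomized order within blocks:

```python
import random

def randomized_block_design(treatments, n_blocks, seed=0):
    """Randomized complete block design: each block contains every
    treatment exactly once, in an independently randomized order.
    Returns a list of (block, plot, treatment) rows."""
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    design = []
    for block in range(1, n_blocks + 1):
        order = treatments[:]
        rng.shuffle(order)
        design.extend((block, plot, trt) for plot, trt in enumerate(order, start=1))
    return design

plan = randomized_block_design(["A", "B", "C", "D"], n_blocks=3)
for row in plan:
    print(row)  # (block, plot, treatment)
```

Blocking in this way controls natural variability: treatment comparisons are made within blocks, so block-to-block differences drop out of the comparisons.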
10th November: 60th Anniversary
Two Talks and a Party: Tim Holt and Peter Green
The Birmingham and District group celebrated its 60th anniversary in November with a double bill of talks by the current president of the Society, Tim Holt, and a former president, Peter Green. After introducing the speakers, the chair of the group, Tony Lawrance, welcomed former chairs David Spurrell, Geoff Freeman, John Copas, Tim Marshall, David Goda, Jane Reeves and John Wilkin, and thanked them for their service to the group.
Tim Holt’s talk was entitled Estimating regional and global indicators of development and was based on work for the United Nations. Tim started by giving some political background on indicators of development as set out by institutions such as the United Nations (for example, the first millennium goal: to eradicate extreme poverty and hunger). He discussed their uses and the availability of data for their measurement. One such indicator was the proportion of economically active women; among its uses were monitoring development and serving as a major driver of economic and social progress.
Tim emphasised the imperfections of available data across countries. Most time series data measuring regional and global indicators are sparse, in the sense that they contain a lot of missing values; in war-affected areas they may all be missing. Imputation methods have to be employed and these have to be simple and plausible if they are to be politically acceptable.
In almost all cases there are relatively complete data for some countries in each region, and so a regional multi-level model can be fitted to the data from that region, borrowing strength from the intercepts and slopes of the other countries in the region. A weighted regional aggregate can then be obtained. The rationale is that countries in the same region tend to behave similarly. There may be bias due to imputation and model mis-specification; the use of sensitivity analysis to assess this was mentioned. However, important methodological issues remain concerning estimates of regional and global measures of development, and Tim noted that they need more attention from a statistical research point of view. Audience discussion drew out some of the possibilities and acknowledged the challenging difficulties of analysing such incomplete data on regional and global scales.
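As a toy illustration of the impute-then-aggregate idea (a deliberately simplified, single-country trend fit, not the multilevel models used in the actual UN work, and with entirely hypothetical figures), one can fill a country's missing years from a line fitted to the years it did report and then form a population-weighted regional aggregate:

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def impute(series):
    """Fill None entries of a {year: value} series from an OLS trend line."""
    known = [(y, v) for y, v in sorted(series.items()) if v is not None]
    a, b = fit_line([y for y, _ in known], [v for _, v in known])
    return {y: (v if v is not None else a + b * y) for y, v in series.items()}

# Hypothetical indicator series and population weights for two countries.
region = {
    "Country X": ({2000: 40.0, 2001: None, 2002: 44.0}, 10),
    "Country Y": ({2000: 60.0, 2001: 62.0, 2002: 64.0}, 30),
}
completed = {name: impute(series) for name, (series, _) in region.items()}
weights = {name: w for name, (_, w) in region.items()}

# Population-weighted regional aggregate for 2001.
aggregate = sum(completed[c][2001] * weights[c] for c in region) / sum(weights.values())
print(aggregate)  # 57.0: Country X's missing 2001 value is imputed as 42.0
```

The simplicity of such a scheme is part of the point made in the talk: imputation methods have to be simple and plausible to be politically acceptable.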
Peter Green’s talk had the enigmatic title Matching and alignment but this soon became clear with his motivating question ‘Can you find subsets of two or more given configurations of points that match, apart from measurement error, when the configurations have been subject to different unknown geometrical transformations?’ Peter explained that an important problem in shape analysis is to match configurations of points in space by inferring the geometrical transformation mapping one to the other; his own interest in the problem arose from a presentation by Kanti Mardia. This prompted joint work on a model which addressed configurations of points that are unlabelled, or have at most a partial labelling constraining the matching, and in which some points may only appear in one of the configurations.
The model includes transformation and is based on a Poisson process for hidden true point locations; this leads to mathematical simplification and efficiency of implementation. Peter presented a procedure using a Bayesian approach which enabled simultaneous inference about the matching and the transformation. He then discussed how the model could be used in an application which consisted of aligning active sites of proteins from raw data, citing his website http://www.stats.bris.ac.uk/~peter/Align/index.html for more information on the application. The subsequent discussion concerned some of the omitted technical background and the elegance of the model.
After these statistical celebrations, the audience adjourned to the Warwick statistics department for more conventional anniversary celebrations. In a historical vein, Tim Holt talked about the early days of the Government Statistical Service: it was formed during the Second World War, when Prime Minister Churchill received confusingly different accounts of strategic statistical information from different departments. To counter this he ordered that a statistical service be set up to give him understandable assessments.
Tony Lawrance then noted that, in a way, this led to the formation of the Birmingham RSS group as the continuation of a Quality and Control Panel in Birmingham of the Ministries of Production and Supply. For several years the group’s programme was dominated by industrial applications, influenced partly by work in the USA; a glimpse of that era is evoked by the talk Impressions of statistics in America, given on 27 February 1946 by W A Bennett of the English Needle Co and M Milbourn of ICI.
The celebrations concluded with Peter Green proposing a toast to the group and jointly with Tim Holt cutting an appropriately decorated cake, as depicted in the pictures.
Thanks are due to Erick Lekone and Claudia Lazada-Can for notes taken during the talks.
Report by Tony Lawrance
Use and abuse of statistics in parliament, by Richard Cracknell
The last meeting of 2005 of the RSS Birmingham & District group was held at the University of Warwick on Thursday 8th December. Before a large audience, chairman Tony Lawrance presented a slideshow with photos of the 60th anniversary meeting, and launched the group website. Tony then introduced Richard J. Cracknell for the talk “Statistics in Parliament: Use and Abuse”.
Richard heads the Social Statistics Section of the House of Commons Library. The function of the Library is to research, collect and report information for members of parliament (MPs). MPs use statistics as part of their job; however, this use is always tied to the MP’s agenda, hence the general public’s widespread suspicion of it.
Richard proceeded with a recent example. Early in the year there was a heated parliamentary debate centred on the performance of the National Health Service (NHS) under the Labour government. Richard showed how easy it is to be misled by statistics. If we consider the difference in last-minute cancellations (for patients on the waiting list for operations) between 04/05 and 95/96, the result is positive, indicating a degraded service; however, between 04/05 and 02/03 the result is negative, indicating an improvement of the service from the same data! The story behind this apparent contradiction is that comparing endpoints alone can hide a lot of information. Other cases made his point clear.
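The sign flip Richard described depends only on which baseline year is chosen. With purely hypothetical figures (the talk's real NHS numbers are not reproduced here), the effect looks like this:

```python
# Hypothetical last-minute cancellation counts by financial year
# (illustrative only; not the figures from the talk).
cancellations = {"95/96": 50000, "02/03": 84000, "04/05": 66000}

def change_since(baseline, latest="04/05", data=cancellations):
    """Difference in cancellations between the latest year and a baseline."""
    return data[latest] - data[baseline]

print(change_since("95/96"))  # 16000: positive, so the service looks worse
print(change_since("02/03"))  # -18000: negative, so the service looks better
```

The same endpoint, compared against two different baselines drawn from the same series, supports two opposite conclusions; only the full time series reveals the rise and subsequent fall in between.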
The different actors in the political arena need reliable statistics, and part of a statistician’s role is to improve understanding of and confidence in the subject. Richard finished by suggesting some measures to improve public confidence in statistics.
Report by Hugo Marur
Clustering in Statistical and Other Perspectives by Boris Mirkin
Before a large audience at Warwick University, Prof. Boris Mirkin presented four approaches to clustering: data mining, probabilistic statistical, machine learning and knowledge discovery. As a computer scientist, Prof. Mirkin has always seen data analysis as data mining. He has been working on clustering for more than 35 years and recently published a book entitled “Clustering for data mining: a data recovery approach” (2005, Chapman and Hall).
At the beginning of the talk, clustering was defined as the process of finding homogeneous fragments in data for further analysis. According to the speaker, there was a strong belief in the 1960s that computer processors could always find patterns, and therefore that every data set could be clustered to some meaningful end. An example of good clustering was shown by updating the analysis of Jevons (1857), in which planets were clustered according to their distance from the Sun. Including the since-discovered Pluto suggested that it is not a planet, because it did not belong to any cluster – nowadays a popular view among astronomers. Another example, containing mixed-scale data, demonstrated the importance of standardization in obtaining more accurate clusters. It was also shown that the starting point of a clustering algorithm greatly affects the final result.
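The sensitivity to the starting point can be seen even in a tiny one-dimensional k-means run (a hypothetical example, not one from the talk): with the same data, two different initial centroid choices converge to different final partitions.

```python
def kmeans_1d(points, centroids, iters=100):
    """Lloyd's k-means in one dimension with fixed starting centroids."""
    clusters = []
    for _ in range(iters):
        # Assign each point to its nearest current centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as its cluster mean (unchanged if empty).
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    return sorted(tuple(c) for c in clusters)

data = [0, 1, 10, 11, 20, 21]         # hypothetical 1-D data
run_a = kmeans_1d(data, [0.0, 21.0])  # start from the two extremes
run_b = kmeans_1d(data, [0.0, 1.0])   # start from two nearby points

print(run_a)  # [(0, 1, 10), (11, 20, 21)]
print(run_b)  # [(0, 1), (10, 11, 20, 21)]
```

Both runs converge, but to different local optima of the within-cluster sum of squares, which is why practical clustering software typically restarts from many initial configurations.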
Since clusters can always be obtained, it is extremely important to define the reason for doing so. Prof. Mirkin mentioned two types of goals that apply to clustering: engineering goals and data analysis goals. He focused on the second type, explaining the particular objectives of the four perspectives identified during the talk: the probabilistic statistical perspective aims to recover the distribution function; machine learning aims for prediction; data mining is engaged in revealing patterns in data; and knowledge discovery is interested in enhancing knowledge with additional concepts and regularities.
After reviewing the above perspectives according to the importance each gives to aspects such as data collection, data pre-processing, finding clusters, interpretation and conclusions, the speaker concluded that data mining and knowledge discovery are areas where mathematicians and computational scientists can find interesting problems.
Report by Maria Vazquez