Hello all. Resources will be added here over the course of the term.
Roberts & Rosenthal, "General state space Markov chains and MCMC algorithms" (pdf)
- 20newsgroups.rdata From Sam Roweis: 'A tiny version of the 20newsgroups data, with binary occurance data for 100 words across 16242 postings. I've also tagged the postings by the highest level domain in the array "newsgroups".'
- mnist.RData MNIST (Yann LeCun, Corinna Cortes, Christopher J.C. Burges): 28x28 pictures of handwritten digits, with 60,000 training samples and 10,000 test samples.
- mnist.small.RData A subset of the above: 10,000 training samples and 1,000 test samples, all taken from the training set of the real MNIST dataset.
image(matrix(train.X[11,],28,28)[,28:1],col=gray(0:255/255))
train.labels
train.Y[11,] #One-hot encoding
- CIFAR-10 subsets: Frog-Horse-500 (frog-horse.RData); 100 from each class (cifar10-small.RData)
- Wisconsin Breast Cancer Diagnostic dataset. archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic)
- pictures.Rdata Pictures of famous people. If you want to use your own JPEG, you can read it in with the jpeg package:
library(jpeg) # you may need to install.packages("jpeg")
img <- readJPEG("<your filename>.jpg")
- viruses.Rdata A dataset of 61 viruses used in Pattern Recognition and Neural Networks by B.D. Ripley (1996), Cambridge University Press. Available at http://www.stats.ox.ac.uk/pub/PRNN/ .
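The .RData files above are all opened the same way, so here is a minimal sketch of loading one and checking what it contains. Since the course files may not be in your working directory, the example builds a tiny stand-in file first; the names toy.Y, toy.file and toy.labels are hypothetical and do not appear in any of the datasets. The last step assumes that column k of a one-hot matrix such as train.Y encodes digit k - 1, which you should check against train.labels before relying on it.

```r
# Hypothetical stand-in for a course file: a tiny one-hot label matrix
# saved to a temporary .RData file (all names here are illustrative only).
toy.Y <- diag(10)[c(4, 1, 10), ]   # one-hot rows with the 1 in columns 4, 1, 10
toy.file <- tempfile(fileext = ".RData")
save(toy.Y, file = toy.file)

# Load into a fresh environment so nothing in the workspace is overwritten.
# load() returns the names of the objects it restored.
env <- new.env()
loaded <- load(toy.file, envir = env)
print(loaded)

# Recover integer labels from the one-hot rows.
# ASSUMPTION: column k encodes digit k - 1.
toy.labels <- max.col(env$toy.Y, ties.method = "first") - 1
print(toy.labels)
```

With a real file you would replace toy.file by its path, e.g. load("mnist.small.RData"), and use the returned names to see which objects (such as train.X and train.Y) are now available.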