Homework 5: Image Classification, Part 1

As part of this homework, you will write a client program that classifies images. The goal of this homework is to familiarize you with programming in Java. You will practice:

Instructions

Task 1: Reading

Background

Classification is one of the central problems in machine learning (ML), a discipline that is transforming 21st century computing. As a familiar example, consider the problem of classifying handwritten digits using this web application:

Recognizing handwritten 4

Machine learning algorithms like this are widely used to classify handwritten digits (e.g., to recognize postal ZIP codes, process bank checks, and parse income tax forms).

The full power of machine learning derives from its amazing versatility. Machine learning algorithms rely upon data to learn to make predictions, without being explicitly programmed for the task. For example, the same code you will write to classify handwritten digits extends to classifying other types of images, simply by training the algorithm with different data.

Moreover, machine learning techniques apply not only to images but also to numerical, text, audio, and video data. Modern applications of ML span science, engineering, and commerce: from autonomous vehicles, medical diagnostics, and video surveillance to product recommendations, voice recognition, and language translation.

Approach

To classify images, we will use a supervised learning algorithm. Supervised learning is divided into two phases – training and testing.

Training

In the training phase, the algorithm learns a function that maps an input to an output (or a label) using training data consisting of known input-output pairs. For the handwritten digit application, the training data comprise 60,000 grayscale images (inputs) and associated digits (labels). Here is a small subset:

Training data

In a multiclass classification problem, we seek to classify images into one of \(m\) classes, labeled \(0, 1, \ldots, m - 1\). For our handwritten digit application, there are \(m = 10\) classes, with class \(i\) corresponding to digit \(i\).

Testing

In the testing phase, the algorithm uses the learned function to predict labels for unseen inputs.

Testing data

Typically, the algorithm makes some prediction errors (e.g., predicts 9 when the handwritten digit is 6). An important quality metric is the test error rate – the fraction of testing inputs that the algorithm misclassifies. It measures how well the learning algorithm generalizes from the training data to new data.

Task 2: Download Project Files

This is an individual assignment.

Start by downloading classify.zip, which contains the IntelliJ starter project. It is ~300MB, so it will take some time to download.

Task 3: Implement ImageClassifier.java

Overview

MultiPerceptron Data Type

This ML library is widely applicable: you can use it not only to classify images but also to classify numerical, text, and audio data. The main idea is to extract features from the data, representing each training and testing input as a vector \(x = x_0, x_1, \ldots, x_{n-1}\) of \(n\) real numbers. For our handwritten digit application, each input is a 28-by-28 grayscale image and the features are the \(n = 28 \times 28 = 784\) grayscale values that constitute the image.

Here is the MultiPerceptron API that you will use:

public class MultiPerceptron

    // Creates a multi-perceptron object with m classes
    // and size of the feature vector n.
    public MultiPerceptron(int m, int n)

    // Returns the number of classes m.
    public int numberOfClasses()
    
    // Returns the number of inputs n, the size of the feature vector
    public int numberOfInputs()
    
    // Returns the predicted label for the given input.
    public int predictMulti(double[] x)

    // Trains this multi-perceptron on the labeled input.
    public void trainMulti(double[] x, int label)

ImageClassifier.java client

Your task is to write a client program ImageClassifier.java that classifies images using the MultiPerceptron data type described in the previous section by:

Organize your client according to the following API:

public class ImageClassifier

    // Creates a feature vector (1D array) from the given picture.
    // See below for more details.
    public static double[] extractFeatures(Picture picture)

    // See below.
    public static void main(String[] args)
extractFeatures()

The extractFeatures() method converts a grayscale image into a one-dimensional array suitable for use with the MultiPerceptron data type. This method must perform two conversions:

  1. Convert the width-by-height image into a width-by-height matrix of grayscale values. Recall that a shade of gray has its red, green, and blue components all equal.
  2. Convert the width-by-height matrix of grayscale values into a one-dimensional array of length width \(\times\) height by iterating over the matrix elements in row-major order.
RGB matrix to grayscale matrix to row-major vector
main()

The main() method takes two command-line arguments:

  1. The name of a file that contains the training data.
  2. The name of a file that contains the testing data.

main() should:

  1. Create a MultiPerceptron object with m classes and n = width * height inputs.
  2. Train the MultiPerceptron object using the images and labels from the training data file.
  3. Use the trained MultiPerceptron object to classify the images from the testing data file, producing as output the following information:
    • A list of misclassified images, one per line.
    • The test error rate (the fraction of test images that the algorithm misclassified).

Here are some sample executions:

> java-introcs ImageClassifier digits-training20.txt digits-testing10.txt 
digits/testing/1/46.png
digits/testing/7/36.png
digits/testing/7/80.png
digits/testing/1/40.png
digits/testing/1/39.png
digits/testing/7/79.png
digits/testing/9/20.png
digits/testing/9/58.png
test error rate = 0.8

> java-introcs ImageClassifier digits-training60K.txt digits-testing10K.txt 
jar:file:digits.jar!/testing/5/9428.png
jar:file:digits.jar!/testing/6/4814.png
jar:file:digits.jar!/testing/5/4915.png
...
jar:file:digits.jar!/testing/5/7870.png
jar:file:digits.jar!/testing/4/1751.png
jar:file:digits.jar!/testing/5/6043.png
test error rate = 0.136

> java-introcs ImageClassifier fashion-training60K.txt fashion-testing10K.txt 
...

> java-introcs ImageClassifier Kuzushiji-training60K.txt Kuzushiji-testing10K.txt
...

> java-introcs ImageClassifier fruit-training30K.txt fruit-testing6K.txt
...

> java-introcs ImageClassifier animals-training60K.txt animals-testing12K.txt
...

> java-introcs ImageClassifier music-training50K.txt music-testing10K.txt
...

Note: Dont worry about the odd looking filenames. It’s just a verbose way to specify the location to a specific image file in a JAR (Java ARchive) file. Modern operating systems are not so adept at manipulating hundreds of thousands of individual image files, so this makes training more efficient. For example, consider jar:file:digits.jar!/testing/5/9428.png. In this case, jar:file:digits.jar identifies the JAR file digits.jar and /training/5/9428.png identifies a file named 9428.png, which is located in the subdirectory /training/5/ of the JAR file.

Input Files

A training data file consists of a sequence of lines:

The file format for testing data files is the same, but you will use the integer labels in the testing data files only to check the accuracy of your predictions.

Input file example

We provide a variety of sample input files in the specified format, including handwritten digits, fashion articles from Zalando, Hirigana characters, and doodles of fruit, animals, and musical instruments.

Data files Description Examples Source
digits.jar
digits-training60K.txt
digits-testing10K.txt
Handwritten digits Digits MNIST
fashion.jar
fashion-training60K.txt
fashion-testing10K.txt
Fashion articles Fashion articles Fashion MNIST
Kuzushiji.jar
Kuzushiji-training60K.txt
Kuzushiji-testing10K.txt
Hirigana characters Hirigana characters Kuzushiji MNIST
fruit.jar
fruit-training30K.txt
fruit-testing6K.txt
Fruit doodles Fruit doodles Quick, Draw!
animals.jar
animals-training60K.txt
animals-testing12K.txt
Animal doodles Animal doodles Quick, Draw!
music.jar
music-training50K.txt
music-testing10K.txt
Musical instrument doodles Musical instrument doodles Quick, Draw!

Hints

These are purely suggestions for how you might make progress. You do not have to follow these steps. If you get stumped or frustrated on some portion of the assignment, please contact the instructor or TAs for help.

Parsing Input Files
Feature Extraction
3 pixel by 3 pixel image to matrix to vector

Note: If you are using IntelliJ, do not type theimport java.awt.Color; statement that is normally needed to access Javas color data type. IntelliJ is pre-configured to automatically add import statements when needed (and remove them when not needed).

Classifying Images

Task 4: Analysis

Some people (especially in Europe and Latin America) write a 7 with a line through the middle, while others (especially in Japan and Korea) make the top line crooked:

Plain 7
7 with line
7 with crooked top

Suppose that the training data consists solely of samples that do not use any of these conventions. How well do you think the algorithm will perform when you test it on different populations? What are the possible consequences?

Now suppose that you are using a supervised learning algorithm to diagnose cancer. Suppose the training data consists of examples solely on individuals from population X but you use it on individuals from population Y. What are the possible consequences?

Provide your answers in your readme.txt file.

Submit

Upload the following files to Gradescope:

Acknowledgements

Thanks to Princeton’s COS126 materials, which served as a basis for this assignment.