In this assignment you will do C development using OpenMP, based on starter code provided to you in the `external_filters` directory of the `img432app` repo, tracking your work with GitHub issues as described below.

This directory contains the code for three C programs that can be invoked from the command line to apply image filters to jpeg files. The objective is to have the Java app call these programs via Docker, and then to modify these programs to make them data-parallel.
In this question we make it possible for our Java app to invoke the three filter programs (which, for now, are sequential) installed in the Docker container. Create and address the following GitHub issue:
Issue Title: External C filters
Issue Label: enhancement
Issue Description: Add three filters to the app, called `DPEdge`, `DPFunk1`, and `DPFunk2`. The `DPEdge` filter invokes the `jpegedge` program in the Docker container, the `DPFunk1` filter invokes the `jpegfunk1` program in the Docker container, and the `DPFunk2` filter invokes the `jpegfunk2` program in the Docker container.

Follow the instructions in the README.md file (which is in `ics432imgapp/external_filters/README.md` but better viewed in the browser) to get started with Docker.
Notes:
`jpegfunk1` and `jpegfunk2` right now do the exact same thing and are in fact the same code. In the questions below we will parallelize them in different ways. Also, they take a while!
We will not test your code for any kind of error-handling. Just make sure it works in the cases in which everything is ok (i.e., all files are there, output directory is writable, etc).
Hints:
Since the external filters are C programs that directly take input files and produce output files, it will not be possible to overlap I/O and computation. So it's probably a very good idea to extend your `WorkUnit` class into, say, a `WorkUnitExternal` class in which the "read input" and "write output" methods do nothing, and the "process" method does it all, where "all" means "invoke Docker with the right command-line arguments".
Invoking an external process in Java is pretty easy. Say you want your Java program to invoke the command `ls -la /tmp`; then you would do it this way:
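The snippet itself does not appear here, so here is a minimal sketch using the standard `java.lang.ProcessBuilder` API (class name `RunCommand` is illustrative):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RunCommand {
    public static void main(String[] args) throws Exception {
        // Each command-line token is a separate list element
        ProcessBuilder pb = new ProcessBuilder("ls", "-la", "/tmp");
        pb.redirectErrorStream(true);  // merge stderr into stdout

        Process p = pb.start();

        // Print the command's output line by line
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }

        int exitCode = p.waitFor();  // wait for the process to finish
        System.out.println("exit code: " + exitCode);
    }
}
```

For the filters you would replace `"ls", "-la", "/tmp"` with the appropriate `docker` invocation and its arguments.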
Create and address the following GitHub issue:
Issue Title: Data-parallel edge detection filter
Issue Label: enhancement
Issue Description: Make the `jpegedge` program data-parallel (and still callable from the Java app)

The `jpegedge` program (installed in the Docker container and with source code in `c_filters/src/jpegedge.c`) implements the Sobel filter, which is used for edge detection (so as to identify individual objects in a scene). It takes two command-line arguments: `jpegedge ./image.jpg /tmp/edge_image.jpg` will apply the edge detection filter to the image in file `./image.jpg` and save the filtered image to file `/tmp/edge_image.jpg`.
For testing in this question we will use the large (gray scale) image on the left below (the filtered image is also shown on the right). Click on the image or the link below it to get the full size test image, which you then should download.
Test temple.jpg image | Filtered image |
Note: You need to rebuild the Docker container each time you modify the C code, both for testing using the container and for use in the Java app. You can of course test the filters without the Java app by "logging in" to the Docker container and doing everything on the command line.
Todo: Have the `jpegedge` program take a required 3rd command-line argument that specifies how many threads should be used to run the filter in data-parallel fashion. The number of threads passed to the program by the Java app is based on the value of the corresponding slider in the main window. The program should exit gracefully if the number or values of the command-line arguments are incorrect.
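The argument check could be sketched as follows (a sketch only; the names and the usage message are illustrative, not taken from jpegedge.c):

```c
#include <stdio.h>
#include <stdlib.h>

/* Validate argc/argv; returns the thread count, or exits with a message
   if the number or values of the arguments are incorrect. */
int parse_num_threads(int argc, char *argv[]) {
    if (argc != 4) {
        fprintf(stderr, "Usage: %s <input.jpg> <output.jpg> <num_threads>\n",
                argv[0]);
        exit(1);
    }
    int n = atoi(argv[3]);
    if (n < 1) {
        fprintf(stderr, "Error: number of threads must be >= 1\n");
        exit(1);
    }
    return n;
}
```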
Todo: Using OpenMP, enhance the program to make the filter data-parallel, using the number of threads specified in the 3rd command-line argument. Do this by just adding a simple `#pragma omp parallel` and a `#pragma omp for` before the first of the three nested loops that go through the pixels to process (i.e., the outer loop). Since all pixel computations are identical, there is no need to experiment with any fancy loop scheduling options, etc.
Todo: In your README file report on filter execution times when using 1, 2, and 4 threads on your machine. For 2 and 4 threads, give the speedup and the parallel efficiency. If your machine has more than 4 cores, then feel free to report on numbers for more threads.
Create and address the following GitHub issue:
Issue Title: Data-parallel funky filter (naive)
Issue Label: enhancement
Issue Description: Make the `jpegfunk1` program data-parallel using a naive parallelization strategy

Your developer friend has been working with an artist who's exploring strange image filters to automatically produce images (insert here philosophical discussion of the definition of what art really is and of the place of automation in art). Your friend has implemented one of the artist's filters, which is called "Funk". This filter is quite computationally expensive, and some pixels are much more expensive to compute than others. Your friend doesn't know OpenMP and has asked you to make the filter data-parallel.

The `jpegfunk1` program (installed in the container and with source code in `c_filters/src/jpegfunk1`) is used exactly like the `jpegedge` program in the previous question.
For testing in this question we will use the image on the left below (the filtered image is also shown on the right). Click on the image or the link below it to get the full-size test image, which you should then download. (Applying the filter to that image can take several minutes.)
Test humu.jpg image | Filtered image |
Todo: Have the program take a required 3rd command-line argument that specifies how many threads should be used to run the filter in a data-parallel fashion.
Todo: Using OpenMP, enhance the program so that the filter is executed in a data-parallel fashion, using the number of threads specified in the 3rd command-line argument. Do this by just adding a simple `#pragma omp parallel` and a `#pragma omp for` before the first of the three nested loops that go through the pixels to process (i.e., the outer loop), exactly as you did in Question #1.
Todo: Have each thread print to the terminal how much time it spent in the for loop.
Todo: In your README file:
Report filter execution times when using 1, 2, and 4 threads. For 2 and 4 threads, give the speedup and the parallel efficiency.
Say whether the results are better or worse than in Question #2 for the Edge filter.
When using 4 threads, report on the time each thread spends in the loop. Say whether you would consider the execution well load-balanced.
Hints:
To have each thread print the time it spent in the for loop, using the `nowait` clause is key (see lecture notes).
A portable way to measure time in a C program, not very accurate but sufficient for our purposes, is to use `gettimeofday`. Here is a fragment of code that showcases its use:
#include <stdio.h>
#include <sys/time.h>
...
struct timeval start, end;
gettimeofday(&start, NULL);
...
gettimeofday(&end, NULL);
printf("elapsed: %.2lf\n",
((1000000.0 * (end.tv_sec - start.tv_sec) +
(1.0 * (end.tv_usec - start.tv_usec)))/1000000.0));
...
Create and address the following GitHub issue:
Issue Title: Data-parallel funky filter (clever)
Issue Label: enhancement
Issue Description: Make the `jpegfunk2` program data-parallel using a "clever" parallelization strategy
The load-imbalance problem in the previous question is severe and kills parallel efficiency. Come up with a better use of OpenMP than in Question #3. Specifically, when using 4 threads, your implementation should, as much as possible, have all threads compute for the same amount of time.
Hints:
Todo: In your README file, include the following information:
Report filter execution times when using 1, 2, and 4 threads. For 2 and 4 threads, give the speedup and the parallel efficiency.
When using 4 threads, report on the time each thread spends in the loop.
Compare these results to those in Question #3.
Explain how you used OpenMP to improve the loop parallelization.