Homework Assignment #6 [0 pts]


How to turn in?

Assignments need to be turned in via Lamaku

What to turn in?

You must turn in a zip archive named ics312_hw6_xxx.zip, where “xxx” is your UH user name (e.g., for me, it would be ics312_hw6_henric.zip). The archive must contain a single top-level directory called ics312_hw6_xxx, where “xxx” is your UH user name (e.g., for me, it would be ics312_hw6_henric). In that directory you must have all the files named exactly as specified in the questions below.

Expected contents of the ics312_hw6_xxx directory:


Context and Motivation

As you know, massive amounts of data are increasingly being processed for many diverse goals. As a result, it’s crucial to not waste bits, so as to make sure data is small. The goal is not only to save on disk/RAM space, but also to save on data processing time. This is because loading data from RAM to the CPU is time consuming. For instance, say you have a boolean value to store. You can of course use a byte (in Java) or char (in C) type, but then you are wasting 7 bits. So if you have to store an array of 1024 such booleans, your data will be 1KiB. That’s 1KiB of data that will have to be brought from RAM to the CPU for processing. However, a boolean is by definition encodable on only 1 bit! So, really, storing 1,024 booleans should require only 1,024 bits, which is 128 bytes, which is 8 times less memory! So, in the real world, one often packs data into bits to get away from “the smallest data item is 1 byte” constraint.

Does it all really matter?

It depends what you do. But if you do big data analytics you care about speed because the data is huge. Just to make this plain, I’ve implemented Question #3 in our assignment in C using the “array of booleans” approach (wasting bits) and the “array of bits” approach (not wasting bits but having to use bitwise operations). We’ll look at my code after the assignment’s due date. On my laptop, comparing the two approaches for processing large amounts of data gives the following timing results:

Number of array elements array of booleansarray of bits
227 2.04 sec1.04 sec
230 16.19 sec7.79 sec
231 32.38 sec15.99 sec

There is about a factor 2 speedup. In terms of asymptotic complexity analysis (ICS 311) this is only a constant factor, and so it “doesn’t matter”. But in practice, this constant factor is the difference between waiting 1 hour for results or 2 hours, which in many businesses can translate to a lot of money!


Exercise #1: Bitwise operations for Big Data [0 pts]

Overview

Let’s pretend you are working for a company that does big data analysis based on large human subject datasets (e.g., US Census data, surveys). From such datasets, boolean features of each person have been extracted that are relevant for a particular data analysis task. One then wants to ask question about the data. In this assignment, you’ll write some assembly to answer such questions.

Each data item in the dataset is a record with 4 booleans:

You will implement a program, adding features incrementally, that processes data about 8 people, i.e., 8*4 = 32 bits. These 32 bits are provided as keyboard input as an integer. In a real-world scenario, one would of course get the input from a (huge) binary data file.

Implementation constraints

segment .data
    msg1    db      "Enter an integer: ", 0
    msg2    db      "Binary representation: ", 0
    msg3    db      "Semantic: ", 0
    msg4    db      "YNGR|IPHN|HULU|SWRS count: ", 0
    msg5    db      "*|ANDR|*|STRK count: ", 0
    codes   db      "YNGR", 0, "OLDR", 0, "ANDR", 0, "IPHN", 0, "NFLX", 0, "HULU", 0, "SWRS", 0, "STRK", 0
    four_spaces db  "    ",0
segment .bss
		input	resd  1     ; only 4 bytes! 

(but you are allowed to go up to 8 bytes if you want)


Question #1: Printing binary representation of input integer [0 pts]

Write an assembly program called hw6_ex1, with source stored in file hw6_ex1.asm, which prompts the user for an integer, and prints out its 32-bit binary representation with a white space every 4 digits. You can assume that the user always enters a valid 32-bit integer.

Here are example interactions with the program, which you should match:

% ./hw6_ex1 
Enter an integer: 15
Binary representation: 0000 0000 0000 0000 0000 0000 0000 1111 
% ./hw6_ex1 
Enter an integer: -3124
Binary representation: 1111 1111 1111 1111 1111 0011 1100 1100 
% ./hw6_ex1 
Enter an integer: -1
Binary representation: 1111 1111 1111 1111 1111 1111 1111 1111 
% ./hw6_ex1
Enter an integer: 512331
Binary representation: 0000 0000 0000 0111 1101 0001 0100 1011 

Question #2: Printing semantics of input integer [0 pts]

Augment your program so that it also prints the semantics of the input integer, i.e., its meaning in terms of describing 8 persons in our dataset. The program should print each of the four boolean values for each person as a 4-character string using the following mapping:

boolean 0 1
b1 YNGR OLDR
b2 ANDR IPHN
b3 NFLX HULU
b4 SWRS STRK


The program should print the “decoding” of each person on a line, starting with 4 white spaces. Here are example interactions with the program, which you should match:

% ./hw6_ex1
Enter an integer: 512331
Binary representation: 0000 0000 0000 0111 1101 0001 0100 1011 
Semantic: 
    YNGR ANDR NFLX SWRS 
    YNGR ANDR NFLX SWRS 
    YNGR ANDR NFLX SWRS 
    YNGR IPHN HULU STRK 
    OLDR IPHN NFLX STRK 
    YNGR ANDR NFLX STRK 
    YNGR IPHN NFLX SWRS 
    OLDR ANDR HULU STRK 
% ./hw6_ex1 
Enter an integer: 257571913
Binary representation: 0000 1111 0101 1010 0011 1100 0100 1001 
Semantic: 
    YNGR ANDR NFLX SWRS 
    OLDR IPHN HULU STRK 
    YNGR IPHN NFLX STRK 
    OLDR ANDR HULU SWRS 
    YNGR ANDR HULU STRK 
    OLDR IPHN NFLX SWRS 
    YNGR IPHN NFLX SWRS 
    OLDR ANDR NFLX STRK 
% ./hw6_ex1 
Enter an integer: -1
Binary representation: 1111 1111 1111 1111 1111 1111 1111 1111 
Semantic: 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 
    OLDR IPHN HULU STRK 

Question #3: Counting particular persons [0 pts]

Augment your program so that it prints how many persons, as encoded in the input integer, are younger, have an iPhone, prefer Hulu, and prefer Star Wars. As a reference, my own solution to this question is a total of 15 lines of assembly, including all printing.

Here are example interactions with the program, which you should match:

% ./hw6_ex1 
Enter an integer: 456288
Binary representation: 0000 0000 0000 0110 1111 0110 0110 0000 
Semantic: 
    YNGR ANDR NFLX SWRS 
    YNGR ANDR NFLX SWRS 
    YNGR ANDR NFLX SWRS 
    YNGR IPHN HULU SWRS 
    OLDR IPHN HULU STRK 
    YNGR IPHN HULU SWRS 
    YNGR IPHN HULU SWRS 
    YNGR ANDR NFLX SWRS 
YNGR|IPHN|HULU|SWRS count: 3
% ./hw6_ex1 
Enter an integer: 258241231
Binary representation: 0000 1111 0110 0100 0111 0010 1100 1111 
Semantic: 
    YNGR ANDR NFLX SWRS 
    OLDR IPHN HULU STRK 
    YNGR IPHN HULU SWRS 
    YNGR IPHN NFLX SWRS 
    YNGR IPHN HULU STRK 
    YNGR ANDR HULU SWRS 
    OLDR IPHN NFLX SWRS 
    OLDR IPHN HULU STRK 
YNGR|IPHN|HULU|SWRS count: 1

Question #4: Counting less specific persons [0 pts]

Augment your program so that it also prints the count of persons that have an Android Phone and prefer Star Trek (we don’t care about their age or their streaming service). Here again, my own solution to this question is a total of 15 lines of assembly, including all printing.

Here are example interactions with the program, which you should match:

% ./hw6_ex1
Enter an integer: -413551222
Binary representation: 1110 0111 0101 1001 1011 0101 1000 1010 
Semantic: 
    OLDR IPHN HULU SWRS 
    YNGR IPHN HULU STRK 
    YNGR IPHN NFLX STRK 
    OLDR ANDR NFLX STRK 
    OLDR ANDR HULU STRK 
    YNGR IPHN NFLX STRK 
    OLDR ANDR NFLX SWRS 
    OLDR ANDR HULU SWRS 
YNGR|IPHN|HULU|SWRS count: 0
*|ANDR|*|STRK count: 2
% ./hw6_ex1
Enter an integer: 6403693
Binary representation: 0000 0000 0110 0001 1011 0110 0110 1101 
Semantic: 
    YNGR ANDR NFLX SWRS 
    YNGR ANDR NFLX SWRS 
    YNGR IPHN HULU SWRS 
    YNGR ANDR NFLX STRK 
    OLDR ANDR HULU STRK 
    YNGR IPHN HULU SWRS 
    YNGR IPHN HULU SWRS 
    OLDR IPHN NFLX STRK 
YNGR|IPHN|HULU|SWRS count: 3
*|ANDR|*|STRK count: 2