Assignments need to be turned in via Lamaku
You must turn in a zip archive named ics312_hw6_xxx.zip, where “xxx” is your UH user name (e.g., for me, it would be ics312_hw6_henric.zip). The archive must contain a single top-level directory called ics312_hw6_xxx, where “xxx” is your UH user name (e.g., for me, it would be ics312_hw6_henric). In that directory you must have all the files named exactly as specified in the questions below.
Expected contents of the ics312_hw6_xxx directory:
As you know, massive amounts of data are increasingly being processed for many diverse goals. As a result, it’s crucial to not waste bits, so as to make sure data is small. The goal is not only to save on disk/RAM space, but also to save on data processing time. This is because loading data from RAM to the CPU is time consuming. For instance, say you have a boolean value to store. You can of course use a byte (in Java) or char (in C) type, but then you are wasting 7 bits. So if you have to store an array of 1024 such booleans, your data will be 1KiB. That’s 1KiB of data that will have to be brought from RAM to the CPU for processing. However, a boolean is by definition encodable on only 1 bit! So, really, storing 1,024 booleans should require only 1,024 bits, which is 128 bytes, which is 8 times less memory! So, in the real world, one often packs data into bits to get away from “the smallest data item is 1 byte” constraint.
It depends what you do. But if you do big data analytics you care about speed because the data is huge. Just to make this plain, I’ve implemented Question #3 in our assignment in C using the “array of booleans” approach (wasting bits) and the “array of bits” approach (not wasting bits but having to use bitwise operations). We’ll look at my code after the assignment’s due date. On my laptop, comparing the two approaches for processing large amounts of data gives the following timing results:
| Number of array elements | array of booleans | array of bits |
|---|---|---|
| 227 | 2.04 sec | 1.04 sec |
| 230 | 16.19 sec | 7.79 sec |
| 231 | 32.38 sec | 15.99 sec |
There is about a factor 2 speedup. In terms of asymptotic complexity analysis (ICS 311) this is only a constant factor, and so it “doesn’t matter”. But in practice, this constant factor is the difference between waiting 1 hour for results or 2 hours, which in many businesses can translate to a lot of money!
Let’s pretend you are working for a company that does big data analysis based on large human subject datasets (e.g., US Census data, surveys). From such datasets, boolean features of each person have been extracted that are relevant for a particular data analysis task. One then wants to ask question about the data. In this assignment, you’ll write some assembly to answer such questions.
Each data item in the dataset is a record with 4 booleans:
You will implement a program, adding features incrementally, that processes data about 8 people, i.e., 8*4 = 32 bits. These 32 bits are provided as keyboard input as an integer. In a real-world scenario, one would of course get the input from a (huge) binary data file.
segment .data
msg1 db "Enter an integer: ", 0
msg2 db "Binary representation: ", 0
msg3 db "Semantic: ", 0
msg4 db "YNGR|IPHN|HULU|SWRS count: ", 0
msg5 db "*|ANDR|*|STRK count: ", 0
codes db "YNGR", 0, "OLDR", 0, "ANDR", 0, "IPHN", 0, "NFLX", 0, "HULU", 0, "SWRS", 0, "STRK", 0
four_spaces db " ",0segment .bss
input resd 1 ; only 4 bytes! (but you are allowed to go up to 8 bytes if you want)
Write an assembly program called hw6_ex1, with source stored in file hw6_ex1.asm, which prompts the user for an integer, and prints out its 32-bit binary representation with a white space every 4 digits. You can assume that the user always enters a valid 32-bit integer.
Here are example interactions with the program, which you should match:
% ./hw6_ex1
Enter an integer: 15
Binary representation: 0000 0000 0000 0000 0000 0000 0000 1111
% ./hw6_ex1
Enter an integer: -3124
Binary representation: 1111 1111 1111 1111 1111 0011 1100 1100
% ./hw6_ex1
Enter an integer: -1
Binary representation: 1111 1111 1111 1111 1111 1111 1111 1111
% ./hw6_ex1
Enter an integer: 512331
Binary representation: 0000 0000 0000 0111 1101 0001 0100 1011 Augment your program so that it also prints the semantics of the input integer, i.e., its meaning in terms of describing 8 persons in our dataset. The program should print each of the four boolean values for each person as a 4-character string using the following mapping:
| boolean | 0 | 1 |
|---|---|---|
| b1 | YNGR | OLDR |
| b2 | ANDR | IPHN |
| b3 | NFLX | HULU |
| b4 | SWRS | STRK |
The program should print the “decoding” of each person on a line, starting with 4 white spaces. Here are example interactions with the program, which you should match:
% ./hw6_ex1
Enter an integer: 512331
Binary representation: 0000 0000 0000 0111 1101 0001 0100 1011
Semantic:
YNGR ANDR NFLX SWRS
YNGR ANDR NFLX SWRS
YNGR ANDR NFLX SWRS
YNGR IPHN HULU STRK
OLDR IPHN NFLX STRK
YNGR ANDR NFLX STRK
YNGR IPHN NFLX SWRS
OLDR ANDR HULU STRK
% ./hw6_ex1
Enter an integer: 257571913
Binary representation: 0000 1111 0101 1010 0011 1100 0100 1001
Semantic:
YNGR ANDR NFLX SWRS
OLDR IPHN HULU STRK
YNGR IPHN NFLX STRK
OLDR ANDR HULU SWRS
YNGR ANDR HULU STRK
OLDR IPHN NFLX SWRS
YNGR IPHN NFLX SWRS
OLDR ANDR NFLX STRK
% ./hw6_ex1
Enter an integer: -1
Binary representation: 1111 1111 1111 1111 1111 1111 1111 1111
Semantic:
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK
OLDR IPHN HULU STRK Augment your program so that it prints how many persons, as encoded in the input integer, are younger, have an iPhone, prefer Hulu, and prefer Star Wars. As a reference, my own solution to this question is a total of 15 lines of assembly, including all printing.
Here are example interactions with the program, which you should match:
% ./hw6_ex1
Enter an integer: 456288
Binary representation: 0000 0000 0000 0110 1111 0110 0110 0000
Semantic:
YNGR ANDR NFLX SWRS
YNGR ANDR NFLX SWRS
YNGR ANDR NFLX SWRS
YNGR IPHN HULU SWRS
OLDR IPHN HULU STRK
YNGR IPHN HULU SWRS
YNGR IPHN HULU SWRS
YNGR ANDR NFLX SWRS
YNGR|IPHN|HULU|SWRS count: 3
% ./hw6_ex1
Enter an integer: 258241231
Binary representation: 0000 1111 0110 0100 0111 0010 1100 1111
Semantic:
YNGR ANDR NFLX SWRS
OLDR IPHN HULU STRK
YNGR IPHN HULU SWRS
YNGR IPHN NFLX SWRS
YNGR IPHN HULU STRK
YNGR ANDR HULU SWRS
OLDR IPHN NFLX SWRS
OLDR IPHN HULU STRK
YNGR|IPHN|HULU|SWRS count: 1Augment your program so that it also prints the count of persons that have an Android Phone and prefer Star Trek (we don’t care about their age or their streaming service). Here again, my own solution to this question is a total of 15 lines of assembly, including all printing.
Here are example interactions with the program, which you should match:
% ./hw6_ex1
Enter an integer: -413551222
Binary representation: 1110 0111 0101 1001 1011 0101 1000 1010
Semantic:
OLDR IPHN HULU SWRS
YNGR IPHN HULU STRK
YNGR IPHN NFLX STRK
OLDR ANDR NFLX STRK
OLDR ANDR HULU STRK
YNGR IPHN NFLX STRK
OLDR ANDR NFLX SWRS
OLDR ANDR HULU SWRS
YNGR|IPHN|HULU|SWRS count: 0
*|ANDR|*|STRK count: 2
% ./hw6_ex1
Enter an integer: 6403693
Binary representation: 0000 0000 0110 0001 1011 0110 0110 1101
Semantic:
YNGR ANDR NFLX SWRS
YNGR ANDR NFLX SWRS
YNGR IPHN HULU SWRS
YNGR ANDR NFLX STRK
OLDR ANDR HULU STRK
YNGR IPHN HULU SWRS
YNGR IPHN HULU SWRS
OLDR IPHN NFLX STRK
YNGR|IPHN|HULU|SWRS count: 3
*|ANDR|*|STRK count: 2