| ECE291 | Computer Engineering II | Lockwood, Fall 1997 |
Machine Problem 2: Data Compression
| Due Date | Friday 10/3/97 |
| Purpose | Math, User I/O, Subroutines |
| Points | 50 |
Introduction
Compression algorithms reduce the size of data by
eliminating redundancy. These algorithms often allow the
information content of a message, image, or picture to be
preserved using fewer bits.
Run Length Encoding (RLE) is one such compression algorithm that
preserves the exact content of the original information.
Many everyday devices use compression. Modems encode data redundancies
to increase throughput. Fax machines encode blank areas
to reduce facsimile transmission time.
Graphics programs save disk space by encoding the redundancy in photos.
A run length encoder looks for strings of identical symbols.
When found, the encoder transmits the data element once followed by
a special repeat (REP) symbol to indicate redundancy, and a count that
indicates how many times the symbol should be repeated.
In this MP, we will encode and decode English text messages.
The English alphabet contains 26 letters (A..Z).
Five bits of data can be used to uniquely represent 2^5=32 symbols.
This is sufficient information to encode all of the letters
and still provide a few extra symbols for characters such as
the space and asterisk (*).
Encoding Rules
- We will encode characters with the following symbols:
| Character (ASCII) | Symbol (Binary) |
| _ (space) | 00000 |
| A | 00001 |
| B | 00010 |
| C | 00011 |
| ... | ... |
| Z | 11010 |
| (undefined) | 11011 |
| * (asterisk) | 11100 |
| (undefined) | 11101 |
| (undefined) | 11110 |
| REP (repeat) | 11111 |
- Repeating characters should be encoded as:
| Symbol | REP | Count |
Where
- Count is a five-bit number that
refers to the number of characters that repeat in
addition to the character itself.
- Because a five-bit Count can be at most 31, runs are limited to
32 symbols (the character itself plus 31 repeats).
- To prevent encoding data from becoming larger than the
original data, only runs of three or more characters should be
run length encoded.
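The rules above can be sketched as a small reference model. This is written in Python for readability and is only an illustration, not the assembly the MP asks for. The function names `encode_char` and `rle_encode` are hypothetical, and the handling of runs longer than 32 (start a new group) is an assumption, since the rules do not say how such runs are split.

```python
# Symbol values come from the table above; names and structure are illustrative.
REP = 31        # 11111
MAX_RUN = 32    # Count is 5 bits (0..31) plus the character itself
MIN_RUN = 3     # shorter runs are sent as plain symbols


def encode_char(ch):
    """Map one character to its 5-bit symbol value."""
    if ch == " ":
        return 0
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1
    if ch == "*":
        return 28
    raise ValueError("no symbol defined for %r" % ch)


def rle_encode(text):
    """Return the list of 5-bit symbol values for text."""
    out = []
    i = 0
    while i < len(text):
        # Measure the run starting at i, capped at MAX_RUN.
        # Assumption: a longer run simply continues as a new group.
        run = 1
        while i + run < len(text) and text[i + run] == text[i] and run < MAX_RUN:
            run += 1
        sym = encode_char(text[i])
        if run >= MIN_RUN:
            out += [sym, REP, run - 1]   # Count excludes the first character
        else:
            out += [sym] * run
        i += run
    return out
```

With this model, `rle_encode("ABCD")` gives four plain symbols (20 bits) and `rle_encode("****")` gives the three symbols `*`, REP, 3 (15 bits), which agrees with the sizes reported in the sample run. A run of exactly three characters costs 15 bits either way, which is why three is the break-even point.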
User Interface
- You are given the framework of a program which provides
a menu-driven interface.
- By selecting an option from the menu, the user can:
- Enter a text message or binary data
- Print the contents of the message or buffer
- Encode a message or decode the buffer
- All binary data is entered and displayed from left-to-right with
the Least Significant Bit (LSB) first.
- Your input routines must be robust to all types
of user input including backspace, invalid input, buffer underflow,
and overflow.
Sample Input & Output
- An understanding of the program is best obtained by
running the program interactively.
- If you haven't already done so, go to the ECE291 lab and
run the program.
- Alternatively, download and decompress mp2.zip
- The results from a sample run are shown below:
------------- MP2 Menu --------------
Enter (T)ext / (B)inary
Print (M)essage / (R)buffeR
RLE (E)ncode / (D)ecode
------ [ESC] or (Q)uit to exit ------
t
Enter Text Message:
ABCD
e
Compressed Size = 20 bits ~= 2 bytes. (LSBit .. MSBit)
10000 01000 11000 00100
t
Enter Text Message:
****
e
Compressed Size = 15 bits ~= 1 bytes. (LSBit .. MSBit)
00111 11111 11000
b
Enter Binary Data (LSBit .. MSBit):
00010 10100 00110 11111 11000 11110 00000 11101 11110 01001 00110 00100
d
Text Message=
HELLLLO WORLD
- Three sets of sample input data are included with this MP:
test1.in, test2.in, and test3.in
- The corresponding outputs for the same data are:
test1.out, test2.out, and test3.out.
- Your program can read input from a file with the following command:
MP2 < testx.in
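The decode step in the sample run can be checked by hand or with a short reference model. The sketch below is in Python and is only an illustration. The helper names are hypothetical, and it assumes the bit groups are separated by spaces as they appear in the sample (the real input routine ignores spaces rather than relying on them).

```python
REP = 31  # 11111


def parse_lsb_first(bits):
    """Turn '10000 01000' style text into symbol values (first bit = bit 0)."""
    symbols = []
    for group in bits.split():
        value = 0
        for position, bit in enumerate(group):
            if bit == "1":
                value |= 1 << position
        symbols.append(value)
    return symbols


def decode_symbol(sym):
    """Map a 5-bit symbol value back to its character."""
    if sym == 0:
        return " "
    if 1 <= sym <= 26:
        return chr(ord("A") + sym - 1)
    if sym == 28:
        return "*"
    raise ValueError("undefined symbol %d" % sym)


def rle_decode(symbols):
    """Expand Symbol, REP, Count groups; copy every other symbol as-is."""
    text = ""
    i = 0
    while i < len(symbols):
        ch = decode_symbol(symbols[i])
        if i + 2 < len(symbols) and symbols[i + 1] == REP:
            text += ch * (1 + symbols[i + 2])  # the character plus Count repeats
            i += 3
        else:
            text += ch
            i += 1
    return text


sample = ("00010 10100 00110 11111 11000 11110 "
          "00000 11101 11110 01001 00110 00100")
print(rle_decode(parse_lsb_first(sample)))
```

Reading each group left to right as bit 0 through bit 4, the third group `00110` is 12 (L), followed by REP and a Count of 3, so the L appears four times in the decoded message.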
Data Structures
- A few variables have already been defined for you in the program
framework.
- TextMsg: String of bytes that holds an ASCII message,
terminated with the '$' end-of-string marker.
- Buffer: A packed array of bytes that holds the
encoded data. Groups of 5 bits are stored in adjacent bit
locations. You'll need to find a way to insert and extract
five bits to and from this array at a time.
- BufferLength: A word which stores the length (in bits) of
the variable buffer.
- A few constants have also been defined:
- BufferMaxLength == 35 bytes
- BufferMaxLengthBits == 8*BufferMaxLength bits
- TextMsgMaxLength == 56 bytes
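Note that the constants fit together: 56 characters at 5 bits each is 280 bits, which is exactly 35 bytes. Inserting and extracting five bits at a time comes down to computing a byte index and a bit index for every bit. The sketch below models that in Python. It is only an illustration: the function names are hypothetical, and the layout (stream bit k lives in byte k/8 at bit position k mod 8) is an assumption, since the handout does not state how bits are ordered inside each byte of Buffer.

```python
BUFFER_MAX_LENGTH = 35                         # bytes, as in the framework
BUFFER_MAX_LENGTH_BITS = 8 * BUFFER_MAX_LENGTH


def append_symbol(buffer, length_bits, symbol):
    """Write the 5 bits of symbol at bit offset length_bits; return new length."""
    if length_bits + 5 > BUFFER_MAX_LENGTH_BITS:
        raise OverflowError("buffer full")
    for k in range(5):
        bit = (symbol >> k) & 1
        # Assumed layout: stream bit n -> byte n // 8, bit n % 8.
        byte_index, bit_index = divmod(length_bits + k, 8)
        if bit:
            buffer[byte_index] |= 1 << bit_index
        else:
            buffer[byte_index] &= ~(1 << bit_index) & 0xFF
    return length_bits + 5


def extract_symbol(buffer, index):
    """Read the index-th 5-bit symbol back out of the packed buffer."""
    value = 0
    for k in range(5):
        byte_index, bit_index = divmod(5 * index + k, 8)
        value |= ((buffer[byte_index] >> bit_index) & 1) << k
    return value
```

Under this layout the second symbol already straddles a byte boundary (bits 5 through 9), which is the case a byte-at-a-time approach gets wrong.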
Procedures
- This assignment has eight procedures.
You will receive credit for this
assignment by replacing each of the eight
procedures listed below with your own code.
- You need to experiment with the working code
to gain a full understanding of how the program works,
what the procedures do, and how the procedures interact with each other.
- Your program should exactly match the functionality of the library
subroutines.
- All subroutines should be modular. They should use the stack to
preserve the value of any registers they may modify.
- PrintTextMsg
- Purpose: Prints the contents of the TextMsg
variable.
- Inputs: TextMsg variable
- Outputs: Writes to screen
- Points: 1
- ReadTextMsg
- Purpose: Read TextMsg from the keyboard.
- Converts lowercase letters to uppercase letters
- Rejects all invalid input (and beeps)
- Allows backspacing (BS = ASCII 8)
- Prevents Underflow and Overflow of variable
- Terminates with a carriage return (CR = ASCII 13)
- Rejects line feeds (LF = ASCII 10)
- Marks end of TextMsg with the '$'.
- Inputs: Keyboard
- Outputs: TextMsg variable
- Hint: To erase a character from the screen, print
a backspace, then whitespace, then backspace.
- Points: 7
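The input rules for ReadTextMsg can be modelled as a loop over key codes. The Python sketch below is only an illustration of that logic, not the required assembly. Several details are assumptions: the function name `read_text_msg` is hypothetical, "valid" is taken to mean the characters that have symbols in the encoding table (space, A..Z, and the asterisk), characters are echoed in uppercase, and underflow and overflow are answered with a beep. Run the library version to confirm the exact behavior.

```python
BEEP, BS, LF, CR = 7, 8, 10, 13
TEXT_MSG_MAX_LENGTH = 56


def read_text_msg(keys):
    """Process a sequence of key codes; return (message, echoed_codes)."""
    msg = []
    echo = []
    for key in keys:
        if key == CR:
            break                              # carriage return ends input
        if key == BS:
            if msg:
                msg.pop()
                echo += [BS, ord(" "), BS]     # erase the character on screen
            else:
                echo.append(BEEP)              # underflow (assumed to beep)
            continue
        ch = chr(key).upper() if ord("a") <= key <= ord("z") else chr(key)
        valid = ch == " " or ch == "*" or "A" <= ch <= "Z"
        if valid and len(msg) < TEXT_MSG_MAX_LENGTH:
            msg.append(ch)
            echo.append(ord(ch))
        else:
            echo.append(BEEP)                  # invalid key, LF, or overflow
    return "".join(msg) + "$", echo
```

For example, typing `a`, `b`, backspace, `*`, then Enter leaves `A*$` in the message under these assumptions.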
- PrintBuffer
- Purpose: Prints the size and contents of the binary
Buffer array
- Prints 5 bits at a time.
- Prints least significant bit of each symbol first.
- Inputs: Buffer & BufferLength variables
- Outputs: Writes to screen
- Note: This routine is deceptively tricky due to the
fact that symbols are packed into groups of five bits rather
than the more convenient byte-sized grouping.
- Hint: Review your shifting techniques and modulo arithmetic!
- Points: 7
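The shifting and modulo arithmetic that the PrintBuffer hint refers to can be sketched as follows. This Python model is only an illustration. The name `format_buffer` is hypothetical, the in-memory layout (stream bit k in byte k/8, bit k mod 8) is an assumption, and the byte count is computed with integer division because that matches the two sizes shown in the sample run (20 bits ~= 2 bytes, 15 bits ~= 1 bytes).

```python
def format_buffer(buffer, length_bits):
    """Return the size line and the bit line shown in the sample run."""
    header = "Compressed Size = %d bits ~= %d bytes. (LSBit .. MSBit)" % (
        length_bits, length_bits // 8)
    groups = []
    for start in range(0, length_bits, 5):
        bits = ""
        for k in range(start, min(start + 5, length_bits)):
            # Assumed layout: stream bit k -> byte k // 8, bit k % 8.
            byte_index, bit_index = divmod(k, 8)
            bits += str((buffer[byte_index] >> bit_index) & 1)
        groups.append(bits)
    return header, " ".join(groups)


# "ABCD" packed under the assumed layout occupies bytes 65, 12, 2.
print(format_buffer(bytes([65, 12, 2]), 20))
```

In assembly the divide by 8 and the remainder are a shift right by 3 and an AND with 7, so no DIV instruction is needed for the index arithmetic.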
- Encode
- Purpose: Encodes a single ASCII character into a 5-bit symbol
as defined above.
- Input: AL = ASCII character
- Output: DL = 5-bit Symbol
- Points: 3
- AppendBuffer
- Purpose: Appends a 5-bit symbol to the end of the encoded array.
- Input: AL = Symbol to append
- Output: Appends 5 bits to Buffer then
adds 5 to BufferLength
- Notes: Append the bits, not the byte!
- Points: 5
- ReadBuffer
- Purpose: Read binary Buffer from the keyboard.
- Accepts only 0's and 1's
- Ignores spaces and rejects all other invalid input (and beeps)
- Allows backspacing
- Prevents Underflow and Overflow of variable
- Terminates with a carriage return (CR = ASCII 13)
- Rejects line feeds (LF = ASCII 10)
- Sets BufferLength = number of bits read
- Inputs: Keyboard
- Outputs: Buffer and BufferLength variables
- Points: 7
- DecodeRLE
- Purpose: Decodes Buffer to TextMsg as described above
- Inputs: Buffer & BufferLength Variables
- Output: TextMsg
- Points: 10
- EncodeRLE
- Purpose: Encodes TextMsg to Buffer as described above
- Input: TextMsg
- Output: Buffer & BufferLength Variables
- Points: 10
Preliminary Procedure
- Copy the
empty MP2 program (MP2.ASM),
sample input files
(test1.in, test2.in, test3.in),
corresponding output files
(test1.out, test2.out, test3.out),
libraries (libmp2.lib, lib291.lib), and
Makefile from the network drive to your home directory
with the following command:
xcopy /s I:\ece291\mp2 F:\mp2
Alternatively, from home, you can download the same files as
mp2.zip.
- As with MP0 and MP1, run NMake to build an executable program
using the given ECE291 library functions.
- As with MP0 and MP1, use a text editor to modify the program.
As given, the program uses LIBMP2 routines
to implement all
functionality. To receive full credit for the assignment,
you will need to implement each of the subroutines described above with
your own code.
- As with MP0 and MP1, use CodeView (CV) to debug and test your
program.
Because you only receive credit for procedures that function completely
as specified, it is best to debug each routine individually.
- By modifying a few comments, you can mix and match usage of your
own code and Library routines. You may notice that the LIBMP2
routines contain extraneous and difficult-to-read code. They
are not meant to be unassembled!
Final Steps
- Demonstrate MP2.EXE to a TA or to the instructor.
You will then be asked to recompile and demonstrate
MP2 with different input files.
Your program must work with all given input.
Once approved, you are ready to turn in your program.
- Be prepared to answer questions about any aspect of the operation of your
program. The TAs will not accept an MP if you cannot fully
explain the operation of your code.
- Copy your programs to the handin floppy:
A:\Handin YourWindowsLogin
- Print MP2.ASM
- Take your printout and disk with MP2 to the same TA who approved your
demonstration. Be sure that your name is on the disk and on the printout.
MP2.ASM (Program framework)
PAGE 75, 132
TITLE ECE291:MP2:MP2-Compress - Your Name - Date
COMMENT *
Data Compression.
The world contains a great deal of data. Luckily, a great
deal of it is redundant (i.e., repeats itself or has repeating
patterns). Using compression algorithms, one can encode such
data using a smaller number of bits.
For this MP, you will write a program which uses Run-Length
Encoding (RLE) to compress textual data. As you will see, RLE
is most effective on data which has long runs of identical characters.
ECE291: Machine Problem 2
Prof. John W. Lockwood
Dept. of Electrical & Computer Engineering
University of Illinois
Fall 1997
Ver 1.0
*
;====== Constants =========================================================
BEEP EQU 7
BS EQU 8
CR EQU 13
LF EQU 10
ESCKEY EQU 27
SPACE EQU 32
BufferMaxLength EQU 35 ; Bytes
BufferMaxLengthBits EQU BufferMaxLength * 8 ; Bits
TextMsgMaxLength EQU 56 ; Bytes
;====== Externals =========================================================
; -- LIB291 Routines (Free) ---
extrn kbdine:near, kbdin:near, dspout:near ; LIB291 Routines
extrn dspmsg:near, binasc:near, ascbin:near ; (Always Free)
extrn mp2xit:near ; Exit program with a call to this procedure
; -- LIBMP2 Routines (Replace these with your own code) ---
extrn PrintBuffer:near ; Print contents of Buffer
extrn ReadBuffer:near ; Read Buffer from keyboard
extrn ReadTextMsg:near ; Read TextMsg from keyboard
extrn PrintTextMsg:near ; Print contents of TextMsg
extrn Encode:near ; Encode ASCII -> 5-bit
extrn AppendBuffer:near ; Add a character to Buffer
extrn EncodeRLE:near ; Run Length Encode TextMsg -> Buffer
extrn DecodeRLE:near ; Run Length Decode Buffer -> TextMsg
;====== SECTION 3: Define stack segment ===================================
stkseg segment stack ; *** STACK SEGMENT ***
db 64 dup ('STACK   ') ; 64*8 = 512 Bytes of Stack
stkseg ends
;====== SECTION 4: Define code segment ====================================
cseg segment public 'CODE' ; *** CODE SEGMENT ***
assume cs:cseg, ds:cseg, ss:stkseg, es:nothing
;====== SECTION 5: Variables ==============================================
Buffer db BufferMaxLength dup(0) ; Data Buffer for encoded Message
TextMsg db TextMsgMaxLength dup('$'), '$' ; Text Message
BufferLength dw 0 ; Number of bits in buffer
crlf db CR,LF,'$' ; DOS uses carriage return + Linefeed for new line
PUBLIC Buffer, TextMsg, BufferLength
;====== Procedures ========================================================
; Your Subroutines go here !
; ---- ----------- -- ----
;====== Main procedure ====================================================
MenuMessage db CR,LF, \
'------------- MP2 Menu --------------',CR,LF,\
' Enter (T)ext / (B)inary',CR,LF, \
' Print (M)essage / (R)buffeR',CR,LF, \
' RLE (E)ncode / (D)ecode',CR,LF, \
'------ [ESC] or (Q)uit to exit ------',CR,LF,'$'
main proc far
mov ax, cseg
mov ds, ax
MOV DX, Offset MenuMessage
CALL DSPMSG ; Display Menu
MainLoop: MOV DX, Offset CRLF
CALL DSPMSG
MainRead: CALL KBDIN ; Read Input
CMP AL,'a'
JB MainOpt
CMP AL,'z' ; Convert Lowercase to Uppercase
JA MainOpt
SUB AL,'a'-'A'
MainOpt: CMP AL,'T'
JNE MainNotT
Call ReadTextMsg ; Read in a text message
JMP MainLoop
MainNotT: CMP AL,'B'
JNE MainNotB
Call ReadBuffer ; Read in a binary message
JMP MainLoop
MainNotB: CMP AL,'M'
JNE MainNotM
Call PrintTextMsg ; Print TextMsg
JMP MainLoop
MainNotM: CMP AL,'R'
JNE MainNotR ; Print Buffer
Call PrintBuffer ; (show least significant bit first)
JMP MainLoop
MainNotR: CMP AL,'E'
JNE MainNotE
Call EncodeRLE ; Run Length Encode Message
Call PrintBuffer ; and print result
JMP MainLoop
MainNotE: CMP AL,'D'
JNE MainNotD
Call DecodeRLE ; Run Length Decode Message
Call PrintTextMsg ; and show result
JMP MainLoop
MainNotD: CMP AL,ESCKEY
JE MainDone ; Quit program
CMP AL,'Q'
JE MainDone
JMP MainRead ; Ignore any other character
MainDone: call MP2xit ; Exit to DOS
main endp
cseg ends
end main