February 27, 2008

EBCDIC and Packed Decimal on AS400

Last week I connected to an AS400 system. What we were doing was sending requests and reading the responses. Connecting, sending and receiving were not a problem because the AS400 system communicated over TCP/IP sockets, so a simple .NET component did all the work. What was difficult was figuring out the message formats without much official documentation.

Here's a primer on digital data. All digital data is stored as binary digits, or bits - that is, 0's and 1's. It gets tiresome to read 0's and 1's, so we group 8 binary digits into a byte. To view things on your computer screen, these bytes are translated to characters using an encoding. The most common encoding is ASCII, which maps a byte to a character. ASCII, or an encoding compatible with it, is used by most modern operating systems including Mac, Unix, Linux and Windows. Later people discovered that a byte, which can represent a number between 0 and 255, was not enough to encode all available characters including Arabic, Chinese, Hebrew and others. So they came up with Unicode, where a character can take more than one byte (2 bytes for most characters in the UTF-16 encoding that .NET uses).

Another encoding - from the same era as ASCII, and descended from IBM's older punch-card codes - is EBCDIC. It too encodes a character in 1 byte, but the mapping is all switched around. So writing a .NET program, which works with Unicode and ASCII text, that writes a message in EBCDIC is the challenge we will address in this post.
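To make the "switched around" mapping concrete, here is a small sketch. It is in Python rather than .NET, only because Python ships with EBCDIC codecs. It assumes the AS400 uses code page 037 (`cp037`, the US/Canada flavor), which is a guess on my part; your system may use a different one.

```python
# Compare the byte values of the same text in ASCII and in EBCDIC.
# Assumption: code page 037 (US/Canada EBCDIC); other flavors differ slightly.
text = "HELLO 123"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")

print(ascii_bytes.hex(" "))   # 48 45 4c 4c 4f 20 31 32 33
print(ebcdic_bytes.hex(" "))  # c8 c5 d3 d3 d6 40 f1 f2 f3

# Decoding the EBCDIC bytes with the matching code page gives the text back.
round_trip = ebcdic_bytes.decode("cp037")
print(round_trip)             # HELLO 123
```

Notice that not a single byte is the same, which is why an EBCDIC trace looks like gibberish in an ASCII viewer.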

From the AS400 system, we were able to get the trace log which shows the communication going between the client and the AS400 system. This is what we needed to reverse engineer the message format. However, the communication traveling between the two systems is encoded in EBCDIC, and viewing it in an ASCII viewer only led to gibberish. Certainly there are EBCDIC editors available; some are free and some you need to pay for. But what is curious about the message being transmitted is that some of it is EBCDIC text and some of it is binary data in the form of binary coded decimals (more on that later).

So the first problem is reading the data transmitted. For this I recommend a hex editor/viewer. There are hex viewers with a variety of features and functions which will make the job much easier. But this blog is not about doing it the easy way. This blog is about doing it without needing to pay any money. So download any hex viewer. I used Notepad++, which is a free portable app. There is also Hex Editor Neo from HHD Software.

Make sure you have the binary data alone in a file, and open it with the hex editor. Now you have a bunch of numbers, characters, dots, squares, funny characters, and umlauts. It's a table. On the far left is the address of the row. On the top is the address of the column. Add those two and you have the address of the byte. Each byte is one of the two-hex-digit entries you see separated by spaces. On the far right is the same row of bytes decoded as text, probably in ASCII. More expensive hex editors allow you to change that encoding to EBCDIC.
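If your free hex viewer cannot show EBCDIC in the right-hand column, a few lines of code can. This is only a sketch of the table layout described above (row address, hex bytes, decoded text). The function name `hex_dump` and the `cp037` default are my own choices, not something the AS400 dictates.

```python
def hex_dump(data: bytes, width: int = 16, encoding: str = "cp037") -> str:
    """Render bytes as rows of: offset, hex pairs, decoded text."""
    hex_width = width * 3 - 1  # two hex digits plus a space per byte
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Decode with the chosen code page; show unprintable characters as dots.
        text = chunk.decode(encoding, errors="replace")
        printable = "".join(ch if ch.isprintable() else "." for ch in text)
        lines.append(f"{offset:08X}  {hex_part:<{hex_width}}  {printable}")
    return "\n".join(lines)


# Example: dump a short EBCDIC string.
print(hex_dump("HELLO WORLD".encode("cp037")))
```

The packed decimal parts of a message will still show up as dots or odd symbols in the text column, because they are not text at all.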

Each byte is represented by two 4-bit hex digits. A hex digit is a number between 0 and 15, written 0-9 and then A-F, where A represents 10, B represents 11 and so forth up to F for 15. A hex digit is also known as a nibble (half a byte - get it?). Is that clear? It gets confusing at this point, so re-read the whole paragraph if you must.

So two nibbles = 1 byte. Two hex-digits make 1 byte. 00 = 0 and FF = 255. Got it? I'm sorry a 10-year IT professional needs to write this primer. High-level programming doesn't really delve into this topic much. I remember when I was a high school student taking extra classes in computers learning about punch cards. Computers didn't have keyboards or hard disks. Just punch card readers and punch cards. See Apu's doctoral thesis blackjack program on punch-cards.
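Here is the nibble idea as code, in case that helps. It's a minimal sketch in Python, and the helper names are mine:

```python
def split_nibbles(byte: int) -> tuple:
    """Split one byte (0-255) into its high and low 4-bit halves."""
    high = (byte >> 4) & 0x0F  # top four bits
    low = byte & 0x0F          # bottom four bits
    return high, low


def join_nibbles(high: int, low: int) -> int:
    """Combine two nibbles (0-15 each) back into one byte."""
    return (high << 4) | low


print(split_nibbles(0xFF))           # (15, 15)
print(split_nibbles(0x9F))           # (9, 15)
print(hex(join_nibbles(0xC, 0x1)))   # 0xc1
```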

So these bytes convert to EBCDIC. But how? Lacking a nice hex editor which can convert bytes to EBCDIC, you need a mapping table. Unfortunately there are a few flavors of EBCDIC so try to find one that matches.



Here is one I lifted off this website.
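If you would rather generate a mapping table than hunt for one, here is a hedged sketch in Python. It builds a byte-to-character lookup from a built-in codec and lists where two flavors disagree. I am assuming `cp037` (US/Canada) and `cp500` (International) as the two candidates; your AS400 may use neither.

```python
def build_table(codec: str) -> dict:
    """Map each byte value to its character, keeping printable ones only."""
    table = {}
    for value in range(256):
        ch = bytes([value]).decode(codec)
        if ch.isprintable():
            table[value] = ch
    return table


table_037 = build_table("cp037")
table_500 = build_table("cp500")

print(table_037[0xC1], table_037[0xF0])  # A 0

# Byte values where the two flavors map to different characters.
differing = sorted(v for v in range(256) if table_037.get(v) != table_500.get(v))
for v in differing:
    print(f"0x{v:02X}: cp037={table_037.get(v)!r} cp500={table_500.get(v)!r}")
```

Letters and digits agree between these two; the differences are in punctuation, which is a handy clue when working out which flavor a trace uses.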

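Now you know how to do it manually. Here is how you can do it in .NET. I did not find an EBCDIC encoder in the .NET System library (the framework's code page list does include IBM037, reachable through Encoding.GetEncoding, so check first whether that covers your flavor). There is, however, a base class for writing your own encodings, System.Text.Encoding. So just find an encoding class which is freely available. The one I used was written by Jon Skeet from Reading, England. Sample code is on the page.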
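The byte array work looks roughly like this. It is a sketch in Python rather than .NET, and the message layout (a 4-byte transaction code followed by a 10-byte account id) is entirely made up for illustration. The real layout is whatever you reverse engineer from the trace.

```python
CODEC = "cp037"  # assumption: US/Canada EBCDIC


def encode_field(text: str, width: int) -> bytes:
    """Encode text as EBCDIC, right-padded with EBCDIC spaces (0x40)."""
    raw = text.encode(CODEC)
    if len(raw) > width:
        raise ValueError(f"{text!r} does not fit in {width} bytes")
    return raw.ljust(width, b"\x40")


def decode_field(message: bytes, offset: int, width: int) -> str:
    """Slice a fixed-width field out of a message and decode it."""
    return message[offset:offset + width].decode(CODEC).rstrip(" ")


# Hypothetical layout: bytes 0-3 transaction code, bytes 4-13 account id.
message = encode_field("INQ", 4) + encode_field("12345", 10)

print(message.hex(" "))
print(decode_field(message, 0, 4))   # INQ
print(decode_field(message, 4, 10))  # 12345
```

Keeping the offsets and widths in one place, instead of scattering index arithmetic around, is what saves you from the off-by-one mistakes.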

So that's settled. All that's left is to write some byte array manipulation code and to watch your indexes. The next problem is that the message format also contains some packed decimals. Packed decimal is a form of binary coded decimal (BCD). Here's a primer on that.

To code a number in ASCII you need 1 byte per digit. This is wasteful because you only need 10 values to code a digit, while a byte gives you 256 values. Remember our nibbles? Recap: a nibble is 4 bits - or half a byte - represented by a hex digit.

A hex digit has 16 values. So if we code a decimal digit in a hex digit we only waste 6 values and not 246 values. Here's another way to look at it: we can represent 100 values (0-99) in a byte using the binary coded decimal representation, versus only 10 values in the ASCII representation. Here's what a binary coded decimal looks like: 0 = 0x00, 1 = 0x01, 10 = 0x10, 20 = 0x20, 99 = 0x99.
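The trick is that the hex digits of the byte read the same as the decimal digits of the number. A minimal Python sketch of one BCD byte (helper names are mine):

```python
def to_bcd_byte(n: int) -> int:
    """Pack a number 0-99 into one byte, one decimal digit per nibble."""
    if not 0 <= n <= 99:
        raise ValueError("one BCD byte holds 0-99 only")
    tens, ones = divmod(n, 10)
    return (tens << 4) | ones


def from_bcd_byte(b: int) -> int:
    """Unpack one BCD byte back into a number 0-99."""
    high, low = b >> 4, b & 0x0F
    if high > 9 or low > 9:
        raise ValueError(f"0x{b:02X} is not a valid BCD byte")
    return high * 10 + low


print(hex(to_bcd_byte(10)))   # 0x10
print(hex(to_bcd_byte(99)))   # 0x99
print(from_bcd_byte(0x20))    # 20
```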

Hang on... why not just code the number in plain binary using the entire byte, giving you 256 values per byte? Surely that is the most compact way? I don't know. Perhaps it is more expensive to convert the bytes to decimal digits if coded that way, because it involves multiplication and division operations. From what I've read of BCD (which isn't much), this is the encoding, with a few more notes:

1. The sign (negative or positive) is encoded as the last nibble: C for positive, D for negative, and F for unsigned, which is treated as positive.

2. If the number of nibbles, including the sign, is odd, pad with a zero in front. That is, if you have a 6-digit number (999999 in decimal), its digits plus the sign nibble are 9 99 99 9F. Notice the first byte is missing one hex digit, so just pad it with a zero: 0x0999999F.

3. I'm not sure how decimal places work, but I suspect the decimal point location is agreed upon by both sender and receiver. The integer value is then divided by the appropriate power of 10.
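Putting the three notes together, here is a hedged Python sketch of packing and unpacking. It assumes only the sign nibbles mentioned above (C and F positive, D negative), and it takes the decimal scale as a parameter because, as I said, I believe that is agreed outside the data itself. The function names are my own.

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xC, 0xF}  # F is "unsigned", treated as positive
NEGATIVE_SIGN = 0xD


def pack_decimal(value: int, positive_sign: int = 0xF) -> bytes:
    """Pack an integer into packed decimal: digit nibbles, then a sign nibble."""
    sign = NEGATIVE_SIGN if value < 0 else positive_sign
    digits = str(abs(value))
    if len(digits) % 2 == 0:
        # Digits plus the sign nibble would be an odd count: pad in front.
        digits = "0" + digits
    nibbles = [int(d) for d in digits] + [sign]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack_decimal(data: bytes, scale: int = 0) -> Decimal:
    """Unpack packed decimal bytes; scale is the agreed number of decimal places."""
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    *digits, sign = nibbles  # the last nibble is the sign
    if any(d > 9 for d in digits):
        raise ValueError("digit nibble out of range")
    if sign not in POSITIVE_SIGNS and sign != NEGATIVE_SIGN:
        raise ValueError(f"unexpected sign nibble 0x{sign:X}")
    value = int("".join(str(d) for d in digits))
    if sign == NEGATIVE_SIGN:
        value = -value
    return Decimal(value).scaleb(-scale)  # shift the decimal point left


print(pack_decimal(999999).hex())                            # 0999999f
print(unpack_decimal(bytes.fromhex("0999999F"), scale=2))    # 9999.99
print(unpack_decimal(bytes.fromhex("123D")))                 # -123
```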

There - now you know all there is to know.
