February 28, 2008

XBus Enterprise Application Integration

Here's my primer on XBus, an open source Enterprise Application Integration framework available on SourceForge.

What is it?

It's a framework (set of tools) which allows data (text, xml, byte streams) to be translated, processed, and routed from one system to another (database, AS400, Web Service) using a variety of protocols (HTTP, File, JDBC, FTP).

Can you give me an example?

Let's say you need to go to an FTP site, download a bunch of account numbers, and dump them into a database every day. You could hire a programmer to write a program to do it. He would choose his favorite language (probably a scripting language like Ruby or Perl), then create a cron job to run the script. It would take him 1 day to do it. His code would not be very maintainable except by himself.

With XBus, and armed with some XML parsing and transformation tools, you could create an XBus system by defining routing rules in several config files. It would take you half a day to do it and it would be very maintainable - involving no new code.

Are you kidding me?

Maybe.

Break it down for me

You have config files which are named standard_whatever.conf. The name has to start with standard and it has to end with .conf. Inside the config file you have your base definitions. This includes the setup for Trace (logging), Journaling (more logging), Error Handling (error logging) and other system parameters. You can have more config files, which are also read and appended to each other. Just make sure you don't define the same thing twice. Each entry in a config file consists of a key and a value. Each key has three parts - namely chapter, section and key.
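The "read, append, and don't define the same thing twice" rule can be sketched in a few lines of Python. This is only an illustration of the idea, not XBus's loader or its file syntax: the function merge_config and the chapter, section and key names used below are all made up for the example.

```python
def merge_config(*files):
    """Merge several config files into one lookup table.

    Each file is modelled as a dict whose keys are hypothetical
    (chapter, section, key) tuples. A key defined in more than one
    file is treated as an error instead of being silently overwritten.
    """
    merged = {}
    for entries in files:
        for full_key, value in entries.items():
            if full_key in merged:
                raise ValueError(f"defined twice: {full_key}")
            merged[full_key] = value
    return merged


# Hypothetical entries, standing in for two standard_*.conf files.
base = {("Base", "Trace", "Level"): "INFO"}
extra = {("System", "AccountsIn", "Filename"): "accounts.txt"}

config = merge_config(base, extra)
print(config[("Base", "Trace", "Level")])
```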

To create a process you need some systems. All systems get data from someplace (such as files, HTTP connections, FTP or other systems). If the system gets data from an external source, it's called a receiver. Some systems send data somewhere else. If the system sends data to an external sink, it's called a sender. So the most basic process is one receiver, one sender and one routing instruction.

To define a system, several configuration entries are needed, according to the type of system. For example, for a FileReceiver you need to define the filename, the type of message and what to do with the file if there is an error, if the file is empty, etc.

To define a routing instruction (which is sequential in nature) you need to define the routing strategy. This can be either Invoke, Distribute or both. With Invoke, the message is sent and both the response and the acknowledgment are required. With Distribute, only the acknowledgment is required.

How do you run a process?

There are two ways to run a process. The first is to run it manually. A simple command is issued from the console. It can be linked to a scheduled task/cron job. The second way, which is required for certain Receivers, is to add it to a servlet engine or a background service. This is required if you're polling a folder, or listening to a socket, or accepting HTTP requests. So the way to run your process depends on what your first system is.

Is that all?

No. There's lots more, but this is a primer, not a user's guide, and it should help you wrap your head around the concepts. There are other features such as XSL transformations, Java Message Queues for asynchronous processing, Tomcat integration, etc. Good luck.

February 27, 2008

EBCDIC and Packed Decimal on AS400

Last week I connected to an AS400 system. What we were doing was sending requests and reading the responses. Connecting, sending and receiving were not a problem because the AS400 system communicated using TCP/IP sockets. So a simple .NET component did all the work. What was difficult was figuring out the message formats without much official documentation.

Here's a primer on digital data. All digital data is stored as binary digits, or bits - that is, 0's and 1's. It gets tiresome to read 0's and 1's, so we group 8 binary digits into a byte. To view things on your computer screen, these bytes are translated to characters using an encoding. The most common encoding is ASCII, which converts a byte to a character. ASCII is used in a lot of modern operating systems including Mac, UNIX, Linux and Windows. Later people discovered that a byte, which can represent a number between 0 and 255, was not enough to encode all available characters including Arabic, Chinese, Hebrew and other characters. So they came up with Unicode, where each character was originally represented by 2 bytes (newer encodings of it use between 1 and 4 bytes).
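To make that concrete, here is a minimal Python sketch of the chain from bits to byte to character. The variable names are just for illustration.

```python
# Eight bits grouped into one byte.
bits = "01000001"
byte_value = int(bits, 2)

# ASCII maps that byte value to a character.
character = bytes([byte_value]).decode("ascii")

# A byte holds 2**8 = 256 distinct values, numbered 0 through 255.
largest_byte = 2 ** 8 - 1

# A Chinese character does not fit in one byte; UTF-16 uses two for this one.
chinese_bytes = "\u4e2d".encode("utf-16-be")

print(byte_value, character, largest_byte, len(chinese_bytes))
```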

An older style of encoding - descended from IBM punch card codes that predate ASCII - is EBCDIC. It too encodes a character in 1 byte, but the mapping is all switched around. So writing a .NET program, which normally deals in ASCII text, that writes messages in EBCDIC is the challenge we will address in this post.

From the AS400 system, we were able to get the trace log which shows the communication going between the client and the AS400 system. This is what we needed to reverse engineer the message format. However, the communication traveling between the two systems is encoded in EBCDIC, and viewing it in an ASCII browser only led to gibberish. Certainly there are EBCDIC editors available; some are free and some you need to pay for. But what is curious about the message being transmitted is that some of it is EBCDIC and some of it is still binary data in the form of a binary coded decimal (more on that later).
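Here is a small Python sketch of how the same text ends up as different bytes. It assumes the EBCDIC flavor is code page 37 (Python's "cp037" codec); the AS400 in question may well use a different one.

```python
text = "HELLO 123"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumption: code page 37 flavor

print(ascii_bytes.hex(" "))   # 48 45 4c 4c 4f 20 31 32 33
print(ebcdic_bytes.hex(" "))  # c8 c5 d3 d3 d6 40 f1 f2 f3

# Reading EBCDIC bytes with the wrong encoding gives gibberish.
gibberish = ebcdic_bytes.decode("latin-1")
print(gibberish)
```

Both encodings use one byte per character here, but none of the byte values line up.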

So the first problem is reading the data transmitted. For this I recommend a HEX editor/viewer. There are HEX viewers with a variety of features and functions which will allow you to do the job much easier. But this blog is not about doing it the easy way. This blog is about doing it without needing to pay any money. So download any hex viewer. I used Notepad++ which is a free portable app. There is also Hex Editor NEO from HDD software.

Make sure you have the binary data alone in a file, and open it with the hex editor. Now you have a bunch of numbers, characters, dots, squares, funny characters, and umlauts. It's a table. On the far left is the address of the row. On the top is the address of the column. So add those two and you have the address of the byte. The byte is the two hex-digit entry you see separated by spaces. On the far right is the decoding of the byte, probably in ASCII. More expensive hex editors allow you to change the encoding to EBCDIC.

Each byte is represented by two 4-bit hex-digits. A hex-digit is a number between 0 and 15, where A represents 10, B represents 11 and so forth up to F for 15. A hex digit is also known as a nibble (half a byte - get it?) Is that clear? It gets confusing at this point, so re-read the whole paragraph if you must.
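If your hex viewer cannot show an EBCDIC column, a few lines of Python can print the same kind of table. This is a rough sketch, not a replacement for a real hex editor: hex_dump is a made-up helper, and the default "cp037" codec is an assumption about which EBCDIC flavor applies.

```python
def hex_dump(data, encoding="cp037", width=16):
    """Return a hex-viewer style table: row address, hex bytes, decoded text."""
    lines = []
    for offset in range(0, len(data), width):
        row = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in row)
        decoded = row.decode(encoding, errors="replace")
        # Show a dot for anything that is not a printable character.
        text_part = "".join(ch if ch.isprintable() else "." for ch in decoded)
        lines.append(f"{offset:08X}  {hex_part.ljust(width * 3 - 1)}  {text_part}")
    return "\n".join(lines)


# Five EBCDIC letters followed by three bytes of binary data.
sample = bytes.fromhex("C8C5D3D3D600123C")
dump = hex_dump(sample)
print(dump)
```

The text column reads HELLO followed by dots, which is a handy clue that the trailing bytes are not character data.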
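In code, pulling a byte apart into its two nibbles is a shift and a mask. A minimal Python sketch, with helper names invented for this example:

```python
def split_nibbles(byte):
    """Return the (high, low) 4-bit halves of a byte."""
    return byte >> 4, byte & 0x0F


def join_nibbles(high, low):
    """Combine two 4-bit values back into one byte."""
    return (high << 4) | low


high, low = split_nibbles(0xC8)
print(high, low)               # 12 8, i.e. hex digits C and 8
print(join_nibbles(0xF, 0xF))  # 255
```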

So two nibbles = 1 byte. Two hex-digits make 1 byte. 00 = 0 and FF = 255. Got it? I'm sorry a 10-year IT professional needs to write this primer. High-level programming doesn't really delve into this topic much. I remember when I was a high school student taking extra classes in computers learning about punch cards. Computers didn't have keyboards or hard disks. Just punch card readers and punch cards. See Apu's doctoral thesis blackjack program on punch-cards.

So these bytes convert to EBCDIC. But how? Lacking a nice hex editor which can convert bytes to EBCDIC, you need a mapping table. Unfortunately there are a few flavors of EBCDIC so try to find one that matches.



Here is one I lifted off this website.

Now you know how to do it manually. Here is how you can do it in .NET. I did not find a dedicated EBCDIC encoder class in the .NET System library. There is, however, an API for encoder classes. So just find an encoder class which is freely available. The one I used was written by Jon Skeet from Reading, England. Sample code is on the page.

So that's settled. All that's left is to write some byte array manipulation code and to watch your indexes. The next problem is that the message format also contains some Packed Decimals. Packed decimal is a form of Binary Coded Decimal (BCD). Here's a primer on that.
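For comparison, here is the same idea as a Python sketch rather than the .NET encoder class used for this project. Python is used only because its standard library ships a couple of EBCDIC code pages. The candidate list and the helper name are assumptions made for the example, and the right flavor for a given AS400 still has to be confirmed against real data.

```python
# Two EBCDIC flavors available as Python codecs (there are others).
CANDIDATE_CODECS = ["cp037", "cp500"]


def decode_with_each(data):
    """Decode the same bytes with every candidate flavor for comparison."""
    return {name: data.decode(name) for name in CANDIDATE_CODECS}


# Letters and digits sit in the same place in both flavors...
received = bytes.fromhex("C1C3C3E3F0F0F0F1")
results = decode_with_each(received)
print(results)

# ...but some punctuation does not, which is how a mismatch shows up.
bracket_037 = "[".encode("cp037")
bracket_500 = "[".encode("cp500")
print(bracket_037.hex(), bracket_500.hex())
```

If letters and digits look right but brackets or exclamation marks come out wrong, the flavor is probably the thing to check.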

To code a number in ASCII you would need 1 byte per digit. This is wasteful because you only need 10 values to code a digit when a byte gives you 256 values. Remember our nibbles? Recap: a nibble is 4 bits - or half a byte - represented by a hex-digit.

A hex digit has 16 values. So if we code a decimal digit in a hex digit we only waste 6 values and not 246 values. Here's another way to look at it, we can represent 100 values (0-99) in a byte using the binary coded decimal representation versus only 10 values in the ASCII representation. Here's what a binary coded decimal looks like: 0 = 0x00, 1 = 0x01, 10 = 0x10, 20 = 0x20, 99 = 0x99.
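The two-digits-per-byte idea can be sketched in Python like this. The helper names are made up for the example.

```python
def to_bcd_byte(value):
    """Pack a number from 0 to 99 into one byte, one decimal digit per nibble."""
    if not 0 <= value <= 99:
        raise ValueError("one BCD byte holds 0 through 99")
    tens, ones = divmod(value, 10)
    return (tens << 4) | ones


def from_bcd_byte(byte):
    """Read a BCD byte back into a number from 0 to 99."""
    high, low = byte >> 4, byte & 0x0F
    if high > 9 or low > 9:
        raise ValueError("nibble is not a decimal digit")
    return high * 10 + low


for number in (0, 1, 10, 20, 99):
    print(number, "=", f"0x{to_bcd_byte(number):02X}")

# The same two-digit number takes two bytes as ASCII text.
print(len(str(99).encode("ascii")))
```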

Hang on... why not just code a number using the entire byte - leaving you with 256 values per byte? Certainly this is the most compact way? I don't know. Perhaps it is more operation-intensive to convert the bytes to a decimal number if coded that way because it involves multiplication operations. From what I've read of BCD (which isn't much), this is the encoding, with a few more notes:

1. The sign (negative or positive) is encoded as the last nibble: C for positive, D for negative, and F for unsigned (which is treated as positive).

2. If the number of nibbles is odd - including the sign - pad with a zero in front. That is, if you have a 6-digit BCD (999999 in decimal), its BCD representation is 9 99 99 9F. Notice the first byte is missing one hex-digit, so just pad it with a zero: 0x0999999F.

3. I'm not sure how decimal places work, but I suspect a decimal point location is agreed upon by both sender and receiver. So the BCD value is then divided by a power of 10.
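The three notes above can be sketched in Python as follows. Treat it as a sketch under assumptions: the function names are invented, only the C, D and F sign nibbles are handled, and the scale argument models the guess that sender and receiver agree on the decimal point location in advance.

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xC, 0xF}
NEGATIVE_SIGNS = {0xD}


def pack_decimal(value, positive_sign=0xC):
    """Encode an integer as packed decimal: digit nibbles, then a sign nibble."""
    sign = 0xD if value < 0 else positive_sign
    nibbles = [int(d) for d in str(abs(value))] + [sign]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad in front so the nibbles fill whole bytes
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))


def unpack_decimal(data, scale=0):
    """Decode packed decimal bytes; scale is the agreed number of decimal places."""
    if not data:
        raise ValueError("no data")
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("digit nibble out of range")
    if sign not in POSITIVE_SIGNS | NEGATIVE_SIGNS:
        raise ValueError("unexpected sign nibble")
    number = int("".join(str(d) for d in digits))
    if sign in NEGATIVE_SIGNS:
        number = -number
    return Decimal(number).scaleb(-scale)


print(pack_decimal(999999, positive_sign=0xF).hex().upper())  # 0999999F
print(unpack_decimal(bytes.fromhex("12345D"), scale=2))       # -123.45
```

Using Decimal for the result avoids the rounding surprises a float would bring once the value is divided by a power of 10.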

There - now you know all there is to know.