In our previous post we started to analyze the relationship between Apache and mod-jaxer by capturing the packets exchanged between these two processes. We found that some handshake bytes were exchanged first, followed by headers and the content of the file being handled by Apache. As this looks like the main functionality we will have to replicate, in this blog post we will take a closer look at the header data sent by Apache.
A full image of the data can be found above with notes. It took a few days to break down the structures being used, and while a few questions remain, the overall structure comes down to a JSON-like set of key-value pairs, encoded in a way that makes them easier to read with C.
To start out with, we have the bytes 0x01 and 0x40. I think this means that we start with block 0x01, and that the length of the block is 0x40 bytes. After doing some testing, I think the length is stored as a two-byte short in big-endian order.
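To make that guess concrete, here is a minimal Python sketch of reading a block header. It assumes the layout is a one-byte block number followed by a big-endian unsigned 16-bit length, which is only my working hypothesis from the capture. The function name `parse_block_header` and the sample bytes are made up for illustration.

```python
import struct

def parse_block_header(buf, offset=0):
    """Return (block_type, length, next_offset) for a header at `offset`."""
    # Assumed layout: 1-byte block number, then a big-endian unsigned short.
    block_type, length = struct.unpack_from(">BH", buf, offset)
    return block_type, length, offset + 3

# Hypothetical sample: block 0x01 whose length field reads 0x0040.
sample = bytes([0x01, 0x00, 0x40])
print(parse_block_header(sample))
```

If the length turned out to be little-endian instead, only the `>` in the format string would need to change to `<`.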
Following that we have the bytes 0x00, 0x03 and 0x00. I’m not sure if there is any significance to the zero padding, but the 0x03 seems to indicate that there are three key-value pairs in that block. After that we find something that is pretty friendly to anyone who has taken a basic computer science class: the value 0x0a, followed by “User-Agent”, followed by 0x00. Then we have 0x10, followed by “user-agent-name” (an argument I passed into curl for debugging), followed by 0x00.
So it seems that every string in the data has its length declared in a single byte before it, and is followed by a 0x00 to terminate it. I suppose that means the string length is capped at 255 characters, but for the purposes of HTTP headers that’s probably enough. We’ll see as we get into more testing whether that pattern continues to hold up or not.
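As a rough sketch of that string format in Python, assuming a single length byte, the raw characters, and then one 0x00 terminator (the function name `read_string` is my own, and the format itself is still a guess):

```python
def read_string(buf, offset):
    """Read one length-prefixed string; return (text, next_offset)."""
    length = buf[offset]          # single length byte, so at most 255
    start = offset + 1
    end = start + length
    # This strict version insists on the 0x00 terminator after the text.
    if end >= len(buf) or buf[end] != 0x00:
        raise ValueError("expected a 0x00 terminator after the string")
    return buf[start:end].decode("latin-1"), end + 1

sample = bytes([0x0A]) + b"User-Agent" + b"\x00"
print(read_string(sample, 0))  # ('User-Agent', 12)
```

The returned offset points at the length byte of the next string, so calls can be chained to walk through a block.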
The data seems to have three main blocks for headers, and a fourth block for the content. The first two blocks contain mixed-case headers: “User-Agent”, “Host” and “Accept” in the first block, followed by “Last-Modified”, “ETag”, “Content-Length”, and “Content-Length” in the second block. After some testing it seems that there can be more or fewer headers in these blocks depending on which headers are sent by the client. It’s a small detail, but worth noting that the last string value in a given block doesn’t seem to be zero terminated; instead it is cut off by the length defined in the block header.
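Putting those observations together, walking the pairs in one block might look like the sketch below. It assumes the block body (everything after the length field) is 0x00, the pair count, 0x00, and then the strings, with every string except the last followed by 0x00. Whether the length field really covers exactly these bytes is something I have not confirmed, and the name `parse_pairs` and the sample values are made up.

```python
def parse_pairs(body):
    """Parse one block body into a list of (key, value) tuples."""
    # Assumed layout: 0x00, pair count, 0x00, then 2 * count strings.
    count = body[1]
    offset = 3
    strings = []
    for _ in range(count * 2):
        length = body[offset]
        start = offset + 1
        end = start + length
        strings.append(body[start:end].decode("latin-1"))
        # Skip the 0x00 terminator, except after the final string,
        # which simply ends where the block ends.
        offset = end + 1 if end < len(body) else end
    return list(zip(strings[0::2], strings[1::2]))

# Hypothetical block body with two pairs; the last value has no terminator.
body = (bytes([0x00, 0x02, 0x00])
        + bytes([4]) + b"Host" + b"\x00"
        + bytes([9]) + b"localhost" + b"\x00"
        + bytes([6]) + b"Accept" + b"\x00"
        + bytes([3]) + b"*/*")
print(parse_pairs(body))
```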
The third block seems to hold values set by the Apache server, as most of them appear to come from the Apache configuration. The good news is that we can probably save the bytes for the third block as a fixed constant and simply re-use it. I think I will still have to change the SCRIPT_PATH, SCRIPT_FILENAME, and QUERY_STRING, though, so I guess I can’t get off that easy. One interesting thing we see in this block is the bytes 0x00 0x00 for a string value that doesn’t exist, which likely means zero for the length and then zero to terminate.
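Going the other direction, a small encoder shows why an empty value would come out as 0x00 0x00 under this reading of the format. This is a sketch under the same assumptions as before (one length byte, then the text, then a 0x00 terminator), and `encode_string` is a name I made up.

```python
def encode_string(text):
    """Encode text as: length byte, raw bytes, 0x00 terminator."""
    data = text.encode("latin-1")
    if len(data) > 255:
        # A single length byte cannot describe anything longer.
        raise ValueError("string too long for a one-byte length")
    return bytes([len(data)]) + data + b"\x00"

print(encode_string("QUERY_STRING").hex(" "))
print(encode_string("").hex(" "))  # 00 00
```

Something like this would let us rebuild only the values that change per request, such as QUERY_STRING, and leave the rest of the third block as a constant.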
Finally, in the fourth block we have the content of the file being requested from Apache. We can replicate this by reading the file before passing the request to mod-jaxer and appending it to the header buffer we prepare. Then, at the very end, we have something that is probably a set of termination bytes. Now that we have a general idea of the format mod-jaxer expects for requests, we can start trying to replicate the functionality of Apache by creating our own header data and passing it into mod-jaxer.