Tuesday, October 19, 2010

OSSEC decoders 101


In a previous post I mentioned that writing rules and decoders for OSSEC can be easy. Now I'm going to attempt to backup that claim bydetailing the process of writing a decoder. A similar post for writing rules will come later. To do this I'll use decoders for OpenSSH that are currently included in OSSEC, and an old log message from a Fedora 11 system.
The log message:
Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2
To help write the decoders I'll be using the ossec-logtest application. ossec-logtest takes input from stdin and traverses the decoders and rules looking for a match. ossec-logtest should be available on all server and local (not agent) installs.
Having a basic understanding of OSSEC's regular expression (regex) support may help with more complicated decoders. For the log message above we'll be using regex to pull out the username and IP address.
Before I discuss decoders, I should explain how OSSEC parses syslog messages. There are three main parts to OSSEC's log analysis engine: pre-decoding, decoding and signatures.

Phase 1, pre-decoding, extracts known fields like time, the hostname of the system where the message originated, the name of the program that created the message and the log message itself (without some of the metadata).
Here is an example of information gathered during the pre-decoding phase using ossec-logtest:
-----------------------------------------------------------
# /var/ossec/bin/ossec-logtest -D . -c etc/ossec.conf
2010/10/01 08:54:31 ossec-testrule: INFO: Reading local decoder file.
2010/10/01 08:54:31 ossec-testrule: INFO: Started (pid: 22955).
ossec-testrule: Type one log per line.
Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2
**Phase 1: Completed pre-decoding.
full event: 'Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2'
hostname: 'fedora11'
program_name: 'sshd'
log: 'Failed password for ddp from 172.16.51.1 port 56588 ssh2'
-----------------------------------------------------------
As you can see the timestamp has been removed. OSSEC uses an internal timestamp of when the message was received instead of the timestamp on the event. After the timestamp is the hostname, 'fedora11' (don't judge, this was an old log message). The program name, sshd, was pulled out of the message and the process ID (11773) was removed and discarded. Finally, the log message without the metadata is displayed.
Phase 2, decoding, comes next. What information goes in? What information comes out? How do I get more information? And how do I write decoders for my applications?
Phase 3 will attempt to match a rule to the log message, and will be detailed in another post.

The decoders I will be using for these examples revolve around OpenSSH. OpenSSH is a complicated program, and has a number of logs. Some are more interesting than others, and we try to create rules for as many as we can. In a number of the logs information like usernames and source IP addresses are available, and this can be useful information. OSSEC can be configured to block IPs attempting to brute force an OpenSSH password, or disable users that are being targeted.
Just a quick note: Some distributions or operating systems have modified the logs in the default OpenSSH source code, so maintaining the sshd decoders has not been the easiest task. It can be a bit disheartening to see the same log information displayed in 3 slightly different and incompatible log messages on different Linux distributions. If anyone knows why package/OS maintainers feel a need to modify log messages please let me know.

Anyways, on to the decoders!
OSSEC decoders are written in XML, and have a very basic format. They can be found in /var/ossec/etc/decoder.xml, and custom decoders can be added to /var/ossec/etc/local_decoder.xml. The local_decoder.xml file will not be overwritten during an upgrade, but the system default decoder.xml will. The easiest decoders are three lines long, and don't include anything more complicated than using the "^" character to indicate that the string that follows is at the very beginning of the section. Many of the fields in decoders can utilize the OS_Match or sregex

Here's the first example, the primary sshd decoder, followed by a line-by-line explanation:
<decoder name="sshd">
  <program_name>^sshd</program_name>
</decoder>
  • Each decoder starts the same way, with the "decoder," and "name" tags.
  • The name must be a unique string, alpha-numeric and should identify the purpose of the decoder. In this case the name of this decoder is just "sshd."
  • The next line indicates that this decoder will be applied to any log event produced by the "sshd" application. In this decoder the <program_name> field checks if the program_name begins with the string "sshd." If it does, OSSEC applies this decoder. Another option instead of <program_name> would be to include a <prematch> field, and this option will be detailed later in this post. All decoders must have either a <program_name> or <prematch> field, but <program_name> is often easier.
  • The "</decoder>" tag ends this decoder entry.


The following is an example of the output of ossec-logtest without the sshd decoder in place:
-------------------------------------------
Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2

**Phase 1: Completed pre-decoding.
full event: 'Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2'
hostname: 'fedora11'
program_name: 'sshd'
log: 'Failed password for ddp from 172.16.51.1 port 56588 ssh2'

**Phase 2: Completed decoding.
No decoder matched.
-------------------------------------------

Notice how no decoder is found in "Phase 2."


Now, for the output of the same log message with the basic sshd decoder in place:
-------------------------------------------
Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2

**Phase 1: Completed pre-decoding.
full event: 'Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2'
hostname: 'fedora11'
program_name: 'sshd'
log: 'Failed password for ddp from 172.16.51.1 port 56588 ssh2'

**Phase 2: Completed decoding.
decoder: 'sshd'
--------------------------------------------

Now the decoder is identified as "sshd".



With this information, OSSEC can focus on the rules that apply to the sshd decoder and not waste resources looking at rules that don't.
But this decoder does not give us much information. Many of sshd's logs include useful information like the source IP or username. This information may be useful to OSSEC admins. To get this information, child decoders can be used. Parent decoders can have any number of child decoders, and each child can have only 1 parent. This allows for a wide variety of log messages to be decoded properly without making it horribly difficult to understand and maintain decoders.

This next decoder pulls more information out of the log message (when available), and provides it in various fields that may be used in rules or active responses.

ssh-failed decoder:
<decoder name="ssh-failed">
   <parent>sshd</parent>
   <prematch>^Failed \S+ </prematch>
   <regex offset="after_prematch">^for (\S+) from (\S+) port \d+ \w+$<;/regex>
  <order>user, srcip</order>
</decoder>

  • This decoder is a bit more complicated than the "sshd" decoder. It starts the same way as the "sshd" decoder, with the "decoder" and "name" tags.
  • The second line (<parent>sshd</parent>) indicates that the "ssh-failed" decoder is a child of the "sshd" decoder. If "sshd" does not match, this decoder will not be checked, saving system resources. A log message that matches the "ssh-failed" decoder will be identified as belonging to the "sshd" decoder to make categorizations and administration easier as you will not need to know the name of the specific child decoder, only the parent.
  • The third line (<prematch>^Failed \S+ </prematch>) is used by OSSEC to determine if this child decoder is the appropriate one to apply to this log message. As I said before, each decoder can have any number of children, so there has to be a method to select one for each log message. This line is using a simple regex pattern. This pattern is compared to the log message, if there is a match OSSEC continues with this decoder.

  • The regex line is one of the more interesting lines in this decoder. I will break this line into two parts:
  1. <regex offset="after_prematch"> - This identifies the line as a regex entry, and indicates that OSSEC should try to match this regular expression to the log message after the prematch. That means, from the log message only the information after 'Failed password ' will be used.
  2. ^for (\S+) from (\S+) port \d+ \w+$</regex> - This second part of the regex line is a little more complicated. The beginning of the line should start with the characters 'for'. The "\S+" is a regular expression for "one or more non-whitespace characters." The parenthesis around this regex force the value to be kept in memory for use later. Next is "from (\S+)". Again, OSSEC looks for the string "from " and places the next one or more non-whitespace characters in memory. The last bit, " port \d+ \w+$" instructs OSSEC to look for the string "port", one or more numbers, and a word (a string of: A-Z, a-z, 0-9 characters) immediately followed by the end of the line.

  • The <order> line is where the values pulled out of the log message by the regex operation will be categorized. Between the order tags are the values "user" and "srcip" (seperated by a comma). The first value from the regex line will populate the user field, and the second the srcip field.

  • In the example log message this lines up with "ddp" as the user, and "172.16.51.1" as the srcip. There are a number of options available in the <order> field, and a list is at the end of this post.

In this example, using the ssh-failed decoder, you can now see the dstuser and srcip have been pulled out of the log message (note that the decoder is still identified as sshd, not as ssh-failed).
------------------------------------------
Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2

**Phase 1: Completed pre-decoding.
full event: 'Oct 8 14:15:33 fedora11 sshd[11773]: Failed password for ddp from 172.16.51.1 port 56588 ssh2'
hostname: 'fedora11'
program_name: 'sshd'
log: 'Failed password for ddp from 172.16.51.1 port 56588 ssh2'

**Phase 2: Completed decoding.
decoder: 'sshd'
dstuser: 'ddp'
srcip: '172.16.51.1'
---------------------------------------------


In the output from ossec-logtest, the "user" field is displayed as dstuser. I don't know why, but in this case it does happen to be the destination user.

I mentioned using offset="after_prematch" above, but there is another option. You can also use offset="after_parent". Using "after_parent" in the above example wouldn't help with the ssh-failed decoder, since the sshd decoder essentially stops after "sshd[11773]: ". If the parent decoder went further into the log message "after_parent" would be more useful.

So that's how you write a decoder in OSSEC. It can take a few tries to get a decoder right, especially when using regular expressions. The ossec-logtest program is indispensable in these situations. There have been plenty of times where I've had to break a regex line down to the basics, and slowly build it up until it worked the way I wanted.
And, as a final note, the order of decoders in decoder.xml is important. The first decoder OSSEC comes across that matches is applied to the log message, first match wins. I've spent time editing regex lines trying to get a decoder to work only to find out another decoder beat mine to the match.

 
I sometimes have trouble finding these, so here are the various options that can be used inside of <order>:
user
srcuser
dstuser
srcip
dstip
srcport
dstport
port
extra_data
status
protocol
url
id
action

No comments:

Post a Comment