Design and Implementation

sf Firewall Software--a TCP/IP packet filter for Linux

Version 0.1, last edited on November 7th, 1996

Note: This document has not been updated to reflect new version 0.2.x features!

Table of Contents

---

The Structure of the sf Firewall

Robert Muchsel

This text describes the necessary patches to the Linux kernel, the design of the kernel interface, the kernel filter module, the operation of the firewall daemon and the firewall device.

Overview

The Components

The sf firewall consists of the following components:

[Components of the sf firewall]

Note the separation between the kernel filter module and the firewall daemon.

The kernel filter module is a so-called "loadable module" which can be inserted at run-time into the Linux kernel. From then on, it is part of the kernel (essentially as if it were compiled right into the kernel). "Loadable modules" are a Linux feature--the software however does not require the feature and could in theory run on operating systems without "loadable modules".

The firewall daemon is run as an unprivileged user process. It has no special rights except to communicate via the firewall device and to write to two log files.

The sfc user control program can be used to display both active filter rules and variables.

The firewall pipe is used for communication between the sfc user control program and the firewall daemon (this communication is triggered by a signal).

Packet Handling

Suppose an IP packet arrives at network interface 1, which should be forwarded to network interface 2 (this is the default case if the firewall machine is to be used as a router):

[IP Packet on its way through the firewall]

On arrival, the packet is put into the queue of "network interface 1" and then forwarded to the kernel IP code. From there, the kernel filter module is called (via the pointer sf_fw_chk). The sf_fw_chk function returns some result on which the kernel IP code decides what to do with the packet--drop it, reject it or accept it. If the packet is accepted, it will subsequently be put into the queue of "network interface 2" and leave our host.

The kernel filter module checks its internal tables whenever it receives an IP packet and delivers the appropriate return code (drop, reject, accept). If the firewall daemon has to be notified of the event, a message is generated and put into the message buffer.

Patching the Linux Kernel

To link the components kernel filter module and firewall daemon into the operating system, a few patches have to be made. The sfc user control and the firewall device components do not require patches and can be loaded at run-time.

A design goal of the sf firewall was to keep the number and complexity of kernel patches low, to allow for easy updating of the kernel as well as portability. As much code as possible was therefore moved to the kernel filter module and to the firewall daemon.

These are the modified files and the new files:

Description of the Kernel Filter Module Stub

The stub file, sf_stub.c, contains the filter function pointer sf_fw_chk and two dummy routines (which either reject or pass all packets). The filter function is never called directly, only via the filter function pointer. In the current implementation, the filter function pointer either points to "the real thing", the kernel filter module's function, or to one of the dummy routines in the stub file.

To add calls to the filter function pointer, ip.c is modified. When starting the system, the filter function pointer is set to the "pass packet" dummy routine sf_fw_chk_pass, so the filter doesn't do anything.

If the filter were initially set to the "block all packets" function, the system wouldn't come up. The initialization of several daemons depends on sending packets through the loopback and network interfaces. To be able to monitor and intercept these packets, the sf firewall should be started as early as possible.
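The indirection through the filter function pointer can be sketched as follows. This is a simplified user-space illustration, not the contents of sf_stub.c: the one-argument signature, the name sf_fw_chk_block, the helper sf_set_filter and the numeric return codes are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* SF_RC_ACCEPT is named in this document; SF_RC_BLOCK and both values are assumed. */
enum { SF_RC_ACCEPT = 0, SF_RC_BLOCK = 1 };

/* Simplified signature: the real function also gets the device and a direction flag. */
typedef int (*sf_chk_fn)(const unsigned char *ip_header);

/* Dummy routine used while the system boots: every packet passes. */
int sf_fw_chk_pass(const unsigned char *ip_header)
{
    (void)ip_header;
    return SF_RC_ACCEPT;
}

/* Dummy routine that blocks every packet (hypothetical name). */
int sf_fw_chk_block(const unsigned char *ip_header)
{
    (void)ip_header;
    return SF_RC_BLOCK;
}

/* The IP code never calls a filter directly, only through this pointer. */
sf_chk_fn sf_fw_chk = sf_fw_chk_pass;

/* Hypothetical helper: switch between the dummies and the real filter. */
void sf_set_filter(sf_chk_fn fn)
{
    assert(fn != NULL);
    sf_fw_chk = fn;
}

/* Stand-in for the call site that the patch adds to ip.c. */
int ip_input(const unsigned char *ip_header)
{
    return sf_fw_chk(ip_header);
}
```

Because the call site only knows the pointer, loading or unloading the real filter never requires touching the IP code again.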

Interaction between the Components

Loading the Kernel Filter Module

The kernel filter module, sf.o, contains code to handle initialization, configuration and operation of the firewall device /dev/firewall. The source code for the firewall device can be found in the file sf_device.c.

In addition, both the message buffer (for messages to the firewall daemon) and the "real" packet filter function sf_check_packet reside within this module--hence its name. However, inserting the kernel filter module into the kernel (using insmod) has no effect on the packet filter function pointer and does not enable the message buffer.

Starting the Firewall Daemon--the Firewall Device

The firewall daemon uses the firewall device /dev/firewall to communicate with the kernel filter module; it does not directly access kernel data structures.

When started, the firewall daemon opens the firewall device in read-write mode--the device subsequently enables the message buffer within the kernel filter module. Then the packet filter function pointer sf_fw_chk (up to this time pointing to the dummy "pass packets" function) is set to the "real" filter function within the kernel filter module.

The filter function now starts scanning all IP packets. Whenever a packet matches one of its configuration entries and the entry tells it to notify the firewall daemon, it generates a message, stores the message in the message buffer and wakes up a sleeping message reader (if applicable). Please refer to Roland Schmid's description of the filter function for detailed information.

The firewall daemon reads the entire message buffer, if there are any messages in the buffer. The buffer structure (sf_proc) contains a counter (num_entries) indicating the number of valid buffer entries.

If there are no messages in the buffer, the firewall daemon is blocked when trying to read from the device, saving CPU time. As soon as there are new messages, the firewall daemon is awakened by the filter function.

If the message buffer is full (there is no reader or the reader is too slow), the filter function tells the kernel to drop IP packets and increments the lost_packets counter.
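A minimal sketch of this buffer discipline is shown below. Only the names sf_proc, num_entries and lost_packets come from the text; the entry layout, the capacity and the two helper functions are assumptions chosen for illustration, and the sleeping and waking of the reader is left out.

```c
#include <assert.h>
#include <string.h>

#define SF_PROC_ENTRIES 8    /* hypothetical capacity */
#define SF_MSG_BYTES    64   /* hypothetical message size */

struct sf_msg {
    unsigned char data[SF_MSG_BYTES];
};

struct sf_proc {
    int           num_entries;    /* number of valid entries */
    unsigned long lost_packets;   /* packets dropped while the buffer was full */
    struct sf_msg entries[SF_PROC_ENTRIES];
};

/* Filter side: returns 1 if the message was stored, 0 if the buffer
 * was full, in which case the caller has to drop the packet. */
int sf_proc_put(struct sf_proc *buf, const void *msg, size_t len)
{
    assert(buf != NULL && msg != NULL);
    if (buf->num_entries >= SF_PROC_ENTRIES) {
        buf->lost_packets++;
        return 0;
    }
    if (len > SF_MSG_BYTES)
        len = SF_MSG_BYTES;
    memset(&buf->entries[buf->num_entries], 0, sizeof(struct sf_msg));
    memcpy(buf->entries[buf->num_entries].data, msg, len);
    buf->num_entries++;
    return 1;
}

/* Daemon side: copy out all pending messages at once and empty the buffer. */
int sf_proc_read_all(struct sf_proc *buf, struct sf_msg *out)
{
    int n;

    assert(buf != NULL && out != NULL);
    n = buf->num_entries;
    memcpy(out, buf->entries, (size_t)n * sizeof(struct sf_msg));
    buf->num_entries = 0;
    return n;
}
```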

Closing the firewall device sets the filter function pointer to the "drop all packets" dummy routine. For a discussion of this behavior, please refer to the user manual, section "Running and Controlling the Firewall".

Configuring the Filter Function through the Firewall Device

The previous section dealt with reading the firewall device. However, the device is also capable of handling write accesses.

All write accesses are interpreted as configuration commands for the filter function.

To avoid corruption of the filter tables, write accesses are checked for a "magic" signature. If this were not the case, root could overwrite the configuration "by accident".
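Such a signature check might look like the sketch below. The value of SF_MAGIC, the layout of struct sf_command and the function sf_check_write are all assumptions; the text only states that a "magic" signature is checked.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SF_MAGIC 0x53464657u   /* hypothetical signature value */

/* Hypothetical header at the start of every write to the device. */
struct sf_command {
    uint32_t magic;     /* must equal SF_MAGIC */
    uint32_t command;   /* which configuration command follows */
};

/* Returns 0 and fills *out if the buffer starts with a valid header,
 * -1 if the write is too short or carries the wrong signature. */
int sf_check_write(const void *buf, size_t len, struct sf_command *out)
{
    struct sf_command cmd;

    assert(out != NULL);
    if (buf == NULL || len < sizeof cmd)
        return -1;
    memcpy(&cmd, buf, sizeof cmd);   /* avoids alignment assumptions */
    if (cmd.magic != SF_MAGIC)
        return -1;
    *out = cmd;
    return 0;
}
```

A stray redirection of arbitrary data into the device would fail this test and leave the filter tables untouched.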

The firewall daemon creates or deletes rules simply by writing to the firewall device.

Reconfiguring the Filter Function

To completely reconfigure the filter function (from outside, by the sfc user control program), the device is opened in write-only mode.

During the data transfer, packet processing is halted to avoid confusion of the filter function or the firewall daemon.

Reading the Active Rules from the Firewall Device

Knowing the currently active rules can be important. Therefore, the sfc user command must be able to fetch all active rules from the kernel filter module.

When the device senses a read-only access (in contrast to the read-write or write-only accesses described above), it returns the active rules.

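To summarize: Opening the firewall device in read-write mode (firewall daemon) enables the message buffer and activates the filter function, write-only mode (sfc user control program) reconfigures the filter function, and read-only mode returns the active rules.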

The device tracks all open and close requests and refuses to de-install if it is still open.
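The dispatch on the access mode and the open counter can be illustrated as follows. This is a user-space sketch that works on open(2) flags; the enum, the function names and the counter are assumptions, and the real device code in sf_device.c may be organized differently.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical names for the three roles described above. */
enum sf_open_role {
    SF_ROLE_SHOW_RULES,    /* read-only: return the active rules */
    SF_ROLE_RECONFIGURE,   /* write-only: replace the configuration */
    SF_ROLE_DAEMON,        /* read-write: messages and rule updates */
    SF_ROLE_INVALID
};

enum sf_open_role sf_role_for_flags(int open_flags)
{
    switch (open_flags & O_ACCMODE) {
    case O_RDONLY: return SF_ROLE_SHOW_RULES;
    case O_WRONLY: return SF_ROLE_RECONFIGURE;
    case O_RDWR:   return SF_ROLE_DAEMON;
    default:       return SF_ROLE_INVALID;
    }
}

/* Open/close bookkeeping: the module may only be removed when nobody
 * holds the device open. */
static int sf_open_count;

void sf_dev_open(void)
{
    sf_open_count++;
}

void sf_dev_release(void)
{
    assert(sf_open_count > 0);
    sf_open_count--;
}

int sf_dev_can_unload(void)
{
    return sf_open_count == 0;
}
```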

The Firewall Daemon

Detecting if the Firewall Daemon is already Running

The source code of the firewall daemon is split into the files sfc.c, sf_log.c and sf_spy.c. sfc.c sets up the data structures and the environment and forks; the main firewall daemon code resides within sf_log.c. sf_spy.c is used whenever counter intelligence is requested.

When the sfc user control program is invoked, it first ensures it is not run as a super user and parses the configuration file (if applicable). It then checks whether the firewall daemon is already running--a file lock on the file firewall.pid is used for this purpose.

This file contains the process ID of the firewall daemon, if the firewall daemon is already running (and the lock is therefore set). The process ID is needed for the stop, reconfig and show user commands.

The file lock does not stop the sfc user control program from reading the file. That's the UNIX way...

Signals and the Firewall Pipe

The firewall daemon catches all signals (except SIGKILL and SIGSTOP, which cannot be caught). It uses signals

The SIGUSR2 signal is sent by the sfc user control program when it reads the variables (sfc show). The firewall daemon forks itself to provide a "snapshot" of the current state (else the variables could change during the processing). This leaves the sfc user control program ample time to read the variables; a crash wouldn't be fatal, either, since the firewall daemon itself can continue.

The firewall pipe is a named pipe and is used to transfer the variable data from the firewall daemon's forked copy to the sfc user control program.

It is also very important to explicitly ignore the SIGCHLD signal, which signals the termination of "children". Since the firewall daemon itself never waits for the termination of its children (and thus never fetches the return values of its children), the children would end up as zombies--using up the process table slots. Explicitly ignoring the signal tells the operating system to clean up the process table on a death of child signal.
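Ignoring the signal explicitly could be done as in the sketch below, which uses the POSIX sigaction call. The function names are made up for the example, and start_log may well use a different call (for instance signal) to achieve the same effect.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Ask the system to reap terminated children automatically.
 * Returns 0 on success, -1 on error (errno is set by sigaction). */
int ignore_child_exits(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;   /* explicit ignore, not the default action */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}

/* Returns 1 if SIGCHLD is currently set to be ignored, else 0. */
int child_exits_ignored(void)
{
    struct sigaction cur;
    int rc = sigaction(SIGCHLD, NULL, &cur);

    assert(rc == 0);
    (void)rc;
    return cur.sa_handler == SIG_IGN;
}
```

Note that the default disposition of SIGCHLD also discards the signal, but only the explicit SIG_IGN setting keeps terminated children from becoming zombies.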

The signals are set up in the function start_log.

Starting External Commands

The firewall daemon uses UNIX programs to send E-mail, to execute counter intelligence software or any other commands specified by the user. Since the firewall daemon is an unprivileged process and input from remote hosts is never used as an argument to the programs, the use of these programs is assumed to be safe. However, the exec user command relies on the system() library call, which hands the command to a shell; the user is advised not to use the exec command unless he (or she) is entirely confident of the safety of that call.

Nevertheless, the data returned by finger or rusers or by the ident daemon could still contain "evil" control sequences (which might redefine the keyboard when displayed, e.g. to rm -rf *). Also, endless streams are to be expected (both time and amount of data). The command started could also hang (e.g. because someone tampered with the DNS).

Therefore, all output is filtered for control characters. The external commands are watched for endless streams and are killed if they last too long, since process table entries are a limited resource. The "slaughtering" is done by an "event", which I shall describe in the next subsection; the event procedure is called kill_spy.
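A filter of this kind could look like the following sketch. The function name, the replacement character and the decision to keep newlines and tabs are assumptions; the text does not say how the daemon treats each character.

```c
#include <assert.h>
#include <stddef.h>

/* Replace control characters in buf with '?', keeping newline and tab.
 * Bytes 0x80..0x9f are treated as control characters as well, because
 * some terminals interpret them as 8-bit control codes.
 * Returns the number of bytes that were replaced. */
size_t sanitize_output(char *buf, size_t len)
{
    size_t i, replaced = 0;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];

        if (c == '\n' || c == '\t')
            continue;
        if (c < 0x20 || (c >= 0x7f && c <= 0x9f)) {
            buf[i] = '?';
            replaced++;
        }
    }
    return replaced;
}
```

With the escape character removed, a sequence such as ESC [ 2 J arrives at the terminal as harmless printable text.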

The Event Mechanism

The firewall daemon relies on time-outs. Since time-outs are used so much, there is a generic event manager. The time-outs are kept in a linear list (the event queue, event_queue) together with their associated function addresses and function parameters (a struct timeout defined in sf_log.c).

To add an event, the add_event function is used. It uses the alarm call and specifies the time-out of the next following event; when the SIGALRM signal is delivered (to the function catch_alarm), the alarm is set again for the time-out of the event that follows. Events are delayed if the firewall daemon is currently busy; the actual processing is done from the process_alarm function.
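The event queue can be pictured with the sketch below. The names struct timeout, event_queue and add_event appear in the text, but the field layout, the argument lists and the helpers seconds_until_next, run_due_events and count_event are assumptions. To keep the example free of signals, the current time is passed in explicitly instead of being taken from the clock inside an alarm handler.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

struct timeout {
    struct timeout *next;
    time_t          when;           /* absolute expiry time */
    void          (*fn)(void *);    /* function to call on expiry */
    void           *arg;            /* its parameter */
};

struct timeout *event_queue = NULL; /* linear list, earliest event first */

/* Insert an event in expiry order; returns 0 on success, -1 if out of memory. */
int add_event(time_t when, void (*fn)(void *), void *arg)
{
    struct timeout **pp = &event_queue;
    struct timeout *ev;

    assert(fn != NULL);
    ev = malloc(sizeof *ev);
    if (ev == NULL)
        return -1;
    ev->when = when;
    ev->fn = fn;
    ev->arg = arg;
    while (*pp != NULL && (*pp)->when <= when)
        pp = &(*pp)->next;
    ev->next = *pp;
    *pp = ev;
    return 0;
}

/* Value that would be handed to alarm(): seconds until the first event,
 * at least 1 if one is already due, 0 (no alarm) if the queue is empty. */
unsigned seconds_until_next(time_t now)
{
    if (event_queue == NULL)
        return 0;
    return event_queue->when > now ? (unsigned)(event_queue->when - now) : 1;
}

/* Run and free every event whose time has come; returns how many ran. */
int run_due_events(time_t now)
{
    int ran = 0;

    while (event_queue != NULL && event_queue->when <= now) {
        struct timeout *ev = event_queue;

        event_queue = ev->next;
        ev->fn(ev->arg);
        free(ev);
        ran++;
    }
    return ran;
}

/* Example event procedure: counts how often it was called. */
void count_event(void *arg)
{
    ++*(int *)arg;
}
```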

Time-outs are used

These time-outs are defined in the sf_custom.h header file.

Error Handling

The firewall daemon writes all errors to the log files and even sends E-mails if the free disk space gets low. However, if all else fails, it writes directly to the console and exits, thereby blocking all network traffic. Please refer to the discussion in the section "Starting and Controlling..." of the user manual for a more detailed description and motivation of this behavior.

Avoiding Duplicate Log Entries

The firewall daemon writes to the syslog and to its own log file. While the syslog daemon automatically eliminates duplicate entries ("last message repeated ... times"), this feature had to be duplicated for the firewall daemon's own log file. In addition, all log entries are automatically wrapped, if needed. The flogf function is used for this purpose.

When the firewall daemon generates a log entry, it first checks whether it is identical to the last log entry. If this is the case, the log entry is omitted and the repeated log entries counter, num_last_log, is increased. However, the firewall daemon ensures a log entry is written at certain intervals (measured both in time and in the number of repeated messages).
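The suppression logic might be sketched as follows. Only the counter name num_last_log is taken from the text; the function log_dedup, the limits and the wording of the summary line are assumptions, and the time-based interval as well as the line wrapping done by flogf are omitted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_REPEATS  100   /* hypothetical limit before a summary is forced */
#define LOG_LINE_MAX 256   /* hypothetical maximum length of a stored entry */

static char last_log[LOG_LINE_MAX];  /* previous log entry */
static int  num_last_log;            /* repeats not yet reported */

/* Write msg to out unless it repeats the previous entry.
 * Returns the number of lines actually written. */
int log_dedup(FILE *out, const char *msg)
{
    int written = 0;
    int same;

    assert(out != NULL && msg != NULL);
    same = last_log[0] != '\0' &&
           strncmp(msg, last_log, LOG_LINE_MAX - 1) == 0;
    if (same) {
        num_last_log++;
        if (num_last_log < MAX_REPEATS)
            return 0;                /* suppressed, only counted */
    }
    if (num_last_log > 0) {
        fprintf(out, "last message repeated %d times\n", num_last_log);
        num_last_log = 0;
        written++;
    }
    if (!same) {
        strncpy(last_log, msg, LOG_LINE_MAX - 1);
        last_log[LOG_LINE_MAX - 1] = '\0';
        fprintf(out, "%s\n", msg);
        written++;
    }
    return written;
}
```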

Variables and Time-Outs

Variables are stored in an array of type struct variable, the variables array.

[Variables array and struct variables]

If we ignore the possibility of "subdividing" variables into different hosts or networks (using the variable:xxx notation), each variable is a simple array entry--each array entry belonging to exactly one variable name.

To support the subdivision of variables, a dynamic list of hosts can be appended to each variable. The simple array entry then becomes the head of the list (i.e. the root element). In the root element, the address field is ignored. Both the sum of all the values in the host list and the time-out of the longest living element in the host list are stored in the root element.

Whenever a variable is encountered with an expired time-out, the value 0 is returned. In contrast to expired dynamic filter rules, no alarm or signal is needed.

If a new element has to be appended to a host list, the list is first checked for expired elements. If an expired element is found, it will be recycled--else a new element will be allocated.

The functions to read and manipulate variables are to be found in the file sf_daemon.c.

Counter Intelligence

sf_spy.c contains the code for the firewall daemon's counter intelligence. The precautions taken to ensure secure operation of external programs have been described above. In addition, the firewall daemon guarantees that one particular host is only spied on every SPY_TIMEOUT minutes, SPY_TIMEOUT being a constant defined in sf_custom.h. Please refer to the user's manual, section "Spying and Counter Intelligence", for further information.

The first thing the counter intelligence code does is to check and reverse-check the host's name and address using both the gethostbyname and gethostbyaddr calls.

In the current implementation, the following methods are used to find out a remote user's name:

While finger and rusers are straightforward to implement using the standard operating system tools, the identd query requires a small user program (the code resides in sfident.c). This small program simply connects to the auth port on a given server (IP notation, no name lookup is required, since the firewall daemon already has the IP address) and returns what it receives from there. If the output conforms to the RFC 1413 definition of identd, the output is also a little bit reformatted. If not, the program prints what it gets.

Enhancing the Firewall Daemon

Adding New Keywords

Adding new keywords to the notification structure requires changes to the parser (see Roland Schmid's documentation) and a few new lines in the execute_notify function (file sf_daemon.c).

Enhancing the Counter Intelligence

...should be quite easy (but keep in mind that the counter intelligence is triggered automatically). Change the finger function in the file sf_spy.c:

---

Configuration Data

Roland Schmid

The configuration data is read from the configuration file and parsed directly into the data structures used by the firewall daemon. The rules are passed to the filter using the firewall device; functions to add, delete and modify the filter configuration are defined in the file sf_config.h. This chapter describes the configuration data structures used by the firewall daemon. The data types used by both the firewall daemon and the filter are defined in sf_global.h; those used only by the firewall daemon are defined in sf_config.h.

Filter Rules

The parser stores the rules in a linear list in reverse order, so the parser and the kernel can insert each rule at the head of the list in order to obtain a correctly ordered list in the kernel. The head of the list is pointed to by the variable rules in sf_config.h.

Each rule contains a pointer to itself (ptr in the union rule_id). This pointer is only valid for the firewall daemon in user mode. The filter treats this value as unique id for the rule, using the field num in the union rule_id. When the filter passes the rule id to the firewall daemon (as part of the log information), it can be used to directly access the rule.
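The double use of the rule id can be sketched like this. The union rule_id with its members ptr and num is described in the text; the member types, the reduced struct sf_fw and the two helper functions are assumptions. The sketch relies on unsigned long and pointers having the same size, which holds on Linux.

```c
#include <assert.h>
#include <stddef.h>

struct sf_fw;   /* the rule structure; only one field is modeled below */

union rule_id {
    struct sf_fw *ptr;   /* meaningful only inside the firewall daemon */
    unsigned long num;   /* the same bits, seen by the filter as an opaque id */
};

/* Heavily reduced stand-in for the real rule structure. */
struct sf_fw {
    union rule_id rule_id;
    int           level;   /* hypothetical payload field */
};

/* Daemon side: let the rule point to itself and hand out the id. */
unsigned long rule_to_id(struct sf_fw *rule)
{
    assert(rule != NULL);
    rule->rule_id.ptr = rule;
    return rule->rule_id.num;
}

/* Daemon side: turn an id received from the filter back into the rule. */
struct sf_fw *id_to_rule(unsigned long num)
{
    union rule_id id;

    assert(sizeof id.num == sizeof id.ptr);
    id.num = num;
    return id.ptr;
}
```

The filter never dereferences the value; it only stores it and returns it with the log information, so no lookup table is needed on the daemon side.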

The parser stores the notification level number for each rule in level.num. Later, the level number is converted into a pointer to the notification structure (see below) using the function convert_levels in sf_config.c.

All addresses used by filter rules are stored in the array sf_addr. Each array entry consists of address, mask, port number and end of port number range. port and prend are set to zero if they are not needed, mask is set to zero if the address is to be ignored. The sf_fw structure points to the array using offset and count variables for the source, destination and rip addresses. sf_addr[0].addr contains the number of internalnet addresses. They are stored in the array starting at offset 1. Thus all rules using the inside or outside keywords without specifying port numbers do not have to copy the addresses but can point to offset one. Flags are used to indicate which of the two keywords has been given.

Notification Structure

struct notification contains the information about one notification level (see sf_config.h). The different levels are stored in a linear list using the variable notify as anchor. The structure contains flags for the actions syslog, spy and relevel; messages and mail addresses are stored in dynamically allocated strings. For all other actions linear lists are used, thus an unlimited number of commands like let, if, exec, etc. can be specified within one notification level. The relevel value is treated like the notification level value in the rules (see above).

Let and if statements are kept in one chain (let_if_chain) in order to be able to execute them in the specified order. The structure contains a then pointer to struct notification. So the then-part of an if statement may contain any actions including other if statements. The depth of nested ifs is not limited.

While the configuration file is being parsed, the variable named not always points to the notification structure that is currently being built. Due to the possibility of nested ifs, the construction of the if_chains requires special effort. The variable iftmp always points to the lowest level if structure to which the actual notification structure belongs (iftmp is NULL when the parser is not within an if statement).

I shall now explain the parsing of two nested if statements step by step. The following graph shows the pointer structures before encountering the first if keyword:

[structure before 1st if keyword]

After the first if keyword, the let_if_chain structure is allocated and appended to the n_let_if list:

[structure after 1st if keyword]

When the parser encounters the then keyword, the notification structure is allocated. The not variable points to the new notification structure; the old value of not is stored in the then field of the let_if_chain structure, so it can be restored when the endif is reached:

[structure after then keyword]

Nested ifs are treated the same way. When allocating a new let_if_chain structure, the old value of iftmp is stored in the next field, so it can be restored on an endif. This is how the structure looks after parsing the second if keyword:

[structure after 2nd if keyword]

Each time the parser encounters an endif, the value of the not pointer is assigned to the then field in the let_if_chain and the old values of not and iftmp are restored. After the last endif, the if structure is complete:

[complete structure]

Configuring the Filter

After the whole configuration file has been read, each static rule is passed to the filter using the function sf_config_add. The function sf_config_addr transfers the whole address array at once. The firewall daemon keeps track of the state of the dynamic rules. Each time a dynamic rule is activated, it must be passed to the filter. When its validity expires, it has to be deleted using sf_config_delete. The address array must be retransmitted each time its contents change due to the generation of a dynamic rule. The function sf_config_clear can be used to delete all filter rules before passing a new set of rules to the filter.

There are two other functions used by the sfc program: sf_config_flush and sf_config_flush_all. While sf_config_flush_all deletes all hash queue entries for established TCP connections, sf_config_flush deletes only those that are not allowed due to the active configuration.

---

The Packet Filter

Roland Schmid

In this chapter, I shall describe the details of the filter implementation in the file sf_filter.c.

Each time an IP packet is sent or received, the function sf_check_packet is called. The parameters are a pointer to the start of the IP header, a pointer to the corresponding device and a flag indicating if the packet is being received, sent or forwarded. The function is called from the kernel files ip.c and tcp.c. The return codes are defined in sf_kernel.h and tell the kernel whether to delete the packet or not and whether to generate an ICMP error message. The filter function returns as soon as possible after determining whether the packet is allowed. It does not perform all tests in all cases.

Address Spoofing

First the packet is checked against address spoofing. If the interface is not a loopback interface, but the packet contains a loopback address as source or destination, the packet is not allowed.

If the packet is local, i.e. the source and destination addresses are equal, the loopback spoofing test is the only test that has to be done and the filter can return SF_RC_ACCEPT.

Now a lookup in the interface hash queues is done to determine whether the interface is internal or external. If there is no hash queue entry for the interface, the function sf_inside is called. It compares the interface address with the internalnet addresses. The result is written to the hash queue for later usage.

If the packet is being received, the source address is used for the spoofing check, otherwise the destination address is used. If the address is not a class D address and if the interface is not the loopback interface, both the packet address and the interface address must be either internal or external.
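The spoofing tests described so far can be condensed into the sketch below. All names are made up for the example, addresses are in host byte order, and addr_is_internal is a fixed stand-in (192.168.1.0/24) for the comparison with the internalnet addresses that sf_inside performs.

```c
#include <assert.h>
#include <stdint.h>

enum spoof_result { SPOOF_NONE, SPOOF_LOOPBACK, SPOOF_SIDE_MISMATCH };

/* 127.0.0.0/8 */
int is_loopback_addr(uint32_t ip)
{
    return (ip >> 24) == 127;
}

/* Class D (multicast) addresses start with the bits 1110. */
int is_class_d(uint32_t ip)
{
    return (ip >> 28) == 0xE;
}

/* Hypothetical stand-in for the internalnet lookup. */
int addr_is_internal(uint32_t ip)
{
    return (ip & 0xFFFFFF00u) == 0xC0A80100u;
}

enum spoof_result check_spoof(int iface_is_loopback, int iface_is_internal,
                              int receiving, uint32_t src, uint32_t dst)
{
    uint32_t peer;

    assert(receiving == 0 || receiving == 1);
    if (!iface_is_loopback && (is_loopback_addr(src) || is_loopback_addr(dst)))
        return SPOOF_LOOPBACK;
    if (src == dst)                 /* local packet: no further tests */
        return SPOOF_NONE;
    peer = receiving ? src : dst;   /* the address on the far side */
    if (is_class_d(peer) || iface_is_loopback)
        return SPOOF_NONE;
    if (addr_is_internal(peer) != (iface_is_internal != 0))
        return SPOOF_SIDE_MISMATCH; /* inside address on outside interface or vice versa */
    return SPOOF_NONE;
}
```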

If any type of spoofing is detected, a local rule structure is initialized with one of the special rule id values RULE_SPOOF_RECV or RULE_SPOOF_XMIT, the rest of the filter is skipped and the packet information is passed to the firewall daemon (see below).

If the packet is being forwarded, the examination can be aborted after the spoofing check, because the packet has already been checked at receive time.

Fragmentation

If the fragment offset in the IP header is not zero, the packet is part of a fragmented IP packet. Using the packet id as hash key, a lookup in the fragments hash queues tells the filter if the fragment may pass or not. The hash queue entries use a timer mechanism which deletes an entry if no corresponding fragment has passed for a certain time. This timer has to be reset if the lookup is successful.

The fragment need not be checked against anything else, so the filter returns immediately here.

SF_CHKFRAG is executed every time before a packet is accepted. It checks if the packet is the first part of a fragmented packet and if so, it creates the hash queue entry and starts the timer.
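How the two kinds of fragments can be told apart from the IP header is sketched below. The offsets and bit masks follow the IP header layout (flags and fragment offset in bytes 6 and 7, identification in bytes 4 and 5); the enum and function names are assumptions and do not describe how SF_CHKFRAG is actually written.

```c
#include <assert.h>
#include <stddef.h>

#define FRAG_OFFSET_MASK 0x1FFFu   /* lower 13 bits: fragment offset */
#define FRAG_MORE_FLAG   0x2000u   /* "more fragments" bit */

enum frag_kind {
    FRAG_NONE,    /* not fragmented */
    FRAG_FIRST,   /* offset zero, more fragments follow */
    FRAG_LATER    /* nonzero offset: looked up in the fragment hash queues */
};

/* ip_header points to the first byte of the IP header (network byte order). */
enum frag_kind classify_fragment(const unsigned char *ip_header)
{
    unsigned frag_off;

    assert(ip_header != NULL);
    frag_off = ((unsigned)ip_header[6] << 8) | ip_header[7];
    if ((frag_off & FRAG_OFFSET_MASK) != 0)
        return FRAG_LATER;
    if ((frag_off & FRAG_MORE_FLAG) != 0)
        return FRAG_FIRST;
    return FRAG_NONE;
}

/* The identification field, used as the hash key for fragments. */
unsigned ip_packet_id(const unsigned char *ip_header)
{
    assert(ip_header != NULL);
    return ((unsigned)ip_header[4] << 8) | ip_header[5];
}
```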

TCP

Permitted TCP connections are stored in hash queues. The hash key is the sum of the source and destination addresses and the source and destination ports. This way, the key is independent of the packet's direction. The state field in the hash entry contains the simplified state of the TCP connection. So the filter can determine whether the connection is established, whether one of the hosts has already sent a fin packet, whether the connection is terminated, etc.

A packet indicating a connection establishment has the syn bit set, but not the ack bit. In this case, the TCP connection is normally not yet in the hash queue (except for FTP data connections). So the rest of the TCP code is skipped and the packet is checked against the rules.

For any other packet, if the TCP connection that the packet belongs to is not in the hash queues, the packet is not allowed and the filter returns immediately.

Now we know that the TCP packet belongs to a permitted connection. First we check if the packet is an FTP packet containing a port command. In this case a new hash entry for the data connection is created. This is the place where other TCP based application level filtering can be done, for example for RPC.

The rest of the TCP code is self explanatory. The state field and the timer mechanisms are maintained according to the flags set in the TCP header.

It is possible to add a timeout for established TCP sessions in order to delete the hash entries for connections that have died silently or that have not been used for a certain time. To do this, you must start a timer at connection establishment time and reset it every time a packet belonging to the connection passes. This would certainly decrease the efficiency and kill idle connections which are otherwise perfectly legal.

Rules

Only non-TCP packets and TCP packets initiating a new connection are checked against the rules. The linear list containing the rules is traversed and each rule is compared to the packet. As soon as a rule matches the packet, the loop is terminated and the packet is treated as indicated in the rule.

First the packet's protocol is examined. Normally the protocol fields of the rule and the packet are compared. IGMP and ICMP require an additional test to see if the packet's message type is one of the types indicated in the rule. This has to be done in a switch statement, because in one rule more than one type can be specified and the information is stored using a bitmap. The RIP rules require special treatment. If the protocol is UDP and the port is 520 (RIP port), the function check_rip is called to determine if all announced destinations are listed in the rule. Other application level tests for non-TCP-based protocols can be added here, for example to check additional routing protocols.
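The combination of switch statement and bitmap for ICMP might look like the following sketch. The bit names, their values and both functions are assumptions, and only four message types are covered; the ICMP type numbers themselves (0, 3, 8 and 11) are the standard ones.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bitmap positions, one bit per message type a rule may list. */
enum {
    SF_ICMP_ECHOREPLY     = 1 << 0,
    SF_ICMP_UNREACHABLE   = 1 << 1,
    SF_ICMP_ECHO          = 1 << 2,
    SF_ICMP_TIME_EXCEEDED = 1 << 3
};

/* Map the type field of an ICMP message to its bit, 0 if not covered. */
unsigned icmp_type_bit(unsigned char icmp_type)
{
    switch (icmp_type) {
    case 0:  return SF_ICMP_ECHOREPLY;
    case 3:  return SF_ICMP_UNREACHABLE;
    case 8:  return SF_ICMP_ECHO;
    case 11: return SF_ICMP_TIME_EXCEEDED;
    default: return 0;
    }
}

/* rule_types is the bitmap stored in the rule; icmp_header points to the
 * first byte of the ICMP header, which holds the message type. */
int rule_allows_icmp_type(unsigned rule_types, const unsigned char *icmp_header)
{
    assert(icmp_header != NULL);
    return (rule_types & icmp_type_bit(icmp_header[0])) != 0;
}
```

A rule that permits "ping" in both directions would then simply carry the two echo bits in its bitmap.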

After checking the time to live value and the IP options, the source and destination addresses and ports are examined using sf_addr_match, which is explained by the comments in the source code.

When the loop has terminated, the variable rule points to the matching rule or to NULL if no rule matches. If a TCP packet matches an accept rule, the hash queue entry for the TCP connection is created. If no log information has to be created, the filter returns.

Log Information

A buffer is used to pass the log information from the filter to the firewall daemon via the firewall device. If the buffer is full, the packet is silently discarded regardless of the filter's examination result. Thus the behavior is the same as if the packet had got lost for some reason. Otherwise the log information is written to the first free position in the buffer.

The log information consists of the return code, the rule id, the name of the device and the first 176 bytes of the packet. This size is chosen so that at least all protocol headers are included and the total size of the log structure does not exceed one memory page (see sf_filter.h).

Now the firewall daemon is woken up and the filter function returns with the return value specified in the matching rule.

Configuration and Control Routines

When configuration or control commands are executed using the sfc user control program, the necessary information is passed to the filter via the firewall device. The device calls the function sf_write_config (in sf_filter.c). This function allocates new kernel memory and copies the buffer to it. After that, it parses the command and calls the appropriate function.

sf_init is called automatically when the kernel filter module is inserted to initialize the hash queues. sf_del_all_timers is called when the kernel filter module is unloaded. All active timers have to be canceled, because on expiration they would try to free kernel memory that is no longer allocated.

sf_clear, sf_add, sf_replace and sf_delete are used to maintain the linear list of rules. sf_flush and sf_flush_all are explained above.

When the command SF_COMMAND_FW_ADDR is given, the buffer is copied to the address array. This is done directly in sf_write_config.

The functions sf_rule_first and sf_rule_next are used to transfer the actual configuration to the sfc process when the user executes sfc show. The next rule to be transferred is stored in rule_first_next. The configuration is taken from the filter, although the firewall daemon knows about all active rules. Due to this approach the user can be sure that the output shows the rules that are applied by the filter. The reduced efficiency is not important as the show command is not executed during normal operation.


Copyright © 1996 Robert Muchsel and Roland Schmid.
