Monday, September 15, 2014

Why static analyzers should see all the code

Just for fun, I decided to run the C code of the new "standard markdown" through the static analyzer provided by the Clang project. On the surface, this looks very easy:


CCC_CC=clang scan-build make stmd

It even finds bugs. A lot of dead assignments, and some logic & memory errors: dereferencing a null pointer, memory leaks and a double-free. However, are they real?

E.g., it complains that the following piece of code in src/bstrlib.c introduces a possible leak of the memory pointed to by buff, which was previously allocated in the same function:


bdestroy (buff);
return ret;

It does not understand that bdestroy is a memory deallocation function. Indeed, it could be anything. It could be defined in a different file. And, in fact, bdestroy does not destroy the buffer, and thus leaks the memory, if some integrity error occurs (and its return code is never checked).

So indeed, the code of bdestroy smells somewhat. But is it a problem? How can we trick clang into understanding that this can't happen?

Part of the problem stems from the fact that clang looks at one file at a time and thus does not understand dependencies between functions defined in different files. There is, however, a way to fix it.

All we need to do is to create a C source file that includes all other C source files. Let's call it "all.c".


#include "blocks.c"
#include "bstrlib.c"
#include "detab.c"
#include "html.c"
#include "inlines.c"
#include "main.c"
#include "print.c"
#include "scanners.c"
#include "utf8.c"

Unfortunately, it does not compile out of the box, because of the conflicting "advance" macros in inlines.c and utf8.c (fixable by undefining these macros at the end of each file), and because of the missing header guard around stmd.h (fixable trivially by adding it). With that, one can submit this all-inclusive file to the static analyzer:


scan-build clang -g -O3 -Wall -std=c99 -c -o src/all.o src/all.c

Result: no bugs found, except dead assignments.
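
The all.c file does not have to be maintained by hand. Below is a minimal shell sketch that generates it, assuming that all sources live in one directory and that every .c file there should be included. The function name gen_all_c is made up for this illustration, and the macro and header-guard fixes described above still have to be applied manually.

```shell
# Hypothetical helper: write an all.c that #includes every other .c file
# found in the given directory (in the shell's glob order).
gen_all_c() {
    local dir=$1 f
    for f in "$dir"/*.c; do
        [ -e "$f" ] || continue                 # no .c files at all
        [ "${f##*/}" = all.c ] && continue      # do not include ourselves
        printf '#include "%s"\n' "${f##*/}"
    done > "$dir/all.c.tmp"
    mv "$dir/all.c.tmp" "$dir/all.c"
}
```

Calling gen_all_c src before the scan-build command would regenerate src/all.c whenever source files are added or removed.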


Saturday, May 17, 2014

Antispam misconfigurations

Introduction

This blog post is about ensuring correct operation of one particular antispam solution. However, I think that the thoughts about possible misconfigurations expressed here apply to most of them.

The following combination of mail-related software is quite popular: Postfix + DSPAM + Dovecot. Each of these products comes with an extensive user manual, and packages are available for almost every Linux distribution. So, I decided to use it for the company mail. In fact, Postfix and Dovecot were already installed (with all users being virtual), and it only remained to install DSPAM, because spam had become a problem for some users.

Here are the kinds of non-spam messages that go through our server: business mail (invoices, documents, commercial offers), technical support, discussions within the team, bugtracker tickets, automated notifications (e.g. when contracts are about to expire).

There are many manuals on setting up DSPAM together with Postfix and Dovecot. Below are the common things mentioned in them.

Postfix should pass the incoming mail to DSPAM. The preferred protocol for doing this is LMTP over a unix-domain socket. DSPAM should add X-DSPAM-* headers to each message and reinject it into Postfix. Then Postfix should contact Dovecot via LMTP, and the message finally gets delivered to the user's mailbox (or the spam folder, with the help of a sieve filter). If DSPAM makes a mistake, the user can move the message appropriately via IMAP, and the dovecot-antispam plugin will retrain DSPAM on this incident.

So far so good. I installed DSPAM (with a simple hash driver backend) and configured the rest of the mail-related software to use it. It even appeared to work for me after initial training. But then we encountered the problems described below, which are not explicitly mentioned in the manuals. If you are reading this post, please test your mail servers for them, too.

Training did not work for some users

Some users, including myself, used their full e-mail (including the company domain) as their IMAP username, and some didn't include the domain part. Both setups worked for sending and receiving mail. However, in the initial configuration, the user's login was passed to dspam-train as-is:

antispam_dspam_args = --deliver=;--client;--user;%u

Result: for some users (those who didn't append the domain to their IMAP username), the retraining process looked for the hash file in /var/spool/dspam/data/local, while that hash file is always in /var/spool/dspam/data/ourdomain.ru. The fix is to spell the domain explicitly:

antispam_dspam_args = --deliver=;--client;--user;%n@ourdomain.ru

In fact, I think that any use of %u in Dovecot configuration is wrong if you have only one domain on the mail server.
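
To make the mismatch concrete, here is a small shell sketch that mimics the lookup described above. It is only a model of the behaviour observed in this post (the path layout is taken from the paths quoted here), not code from DSPAM or Dovecot, and the function name dspam_hash_dir is hypothetical.

```shell
# Hypothetical model: where the hash file ends up being looked for, given
# the --user value. Layout assumed from the paths quoted in this post.
dspam_hash_dir() {
    local user=$1 home=${2:-/var/spool/dspam}
    case $user in
        *@*) printf '%s/data/%s/%s\n' "$home" "${user#*@}" "${user%%@*}" ;;
        *)   printf '%s/data/local/%s\n' "$home" "$user" ;;   # no domain given
    esac
}

dspam_hash_dir alice                 # bare login, as passed by %u for some users
dspam_hash_dir alice@ourdomain.ru    # what %n@ourdomain.ru always produces
```

The two calls print different directories, which is why training silently went to the wrong place for users who logged in without the domain part.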

Duplicate e-mail from monitoring scripts

Monitoring scripts send e-mail to root@ourdomain.ru from other hosts if something bad happens. However, after I configured DSPAM, each such message arrived in my mailbox twice. This happened because the "root" alias is expanded recursively (this is OK, as root is virtual and has nothing to do with uid 0). We want to archive all root mail for easy reference, as well as to deliver it to the actual sysadmins. The alias expansion happened twice: once before DSPAM and once after it. The solution is to disable one of the two expansions. I disabled the one before DSPAM:

smtp      inet  n       -       n       -       -       smtpd
  -o content_filter=lmtp:unix:/var/run/dspam/dspam.sock
  -o receive_override_options=no_address_mappings

However, this was a mistake.

Training still did not work for sales

The sales team complained that they were not able to train DSPAM so that the incoming commercial queries end up in their inbox, and not in the spam folder. Manual training didn't help, either. This appeared to be a variation of the first problem: wrong path to the hash file.

The sales team has a "sales" mail alias that expands to all of them. As such, due to the previous "fix", Postfix told DSPAM that the mail is addressed to sales@ourdomain.ru:

smtp      inet  n       -       n       -       -       smtpd
  -o content_filter=lmtp:unix:/var/run/dspam/dspam.sock
  -o receive_override_options=no_address_mappings

Thus, DSPAM placed the hash file in /var/spool/dspam/data/ourdomain.ru/sales, while the training process looked in /var/spool/dspam/data/ourdomain.ru/$person. The solution was to move the no_address_mappings option after DSPAM, i.e. to the reinjection service. This way, both DSPAM and the dovecot-antispam plugin see the expanded recipient addresses.

Some e-mail from new team members was marked as spam

A general expectation is that authenticated e-mail from one user to another user on the same corporate mail server is not spam. However, the new team members (and even some old ones) misconfigured their e-mail clients to use port 25 (with STARTTLS and authentication) for outgoing e-mail. As such, all their outgoing e-mail was processed by DSPAM, because the only factor that decides whether to process the e-mail is the port. The solution was to educate everyone on the team to use port 587 for outgoing e-mail, which is not configured to process messages with DSPAM. It would also be nice to make authentication always fail on port 25, but I haven't done this yet.

Outgoing e-mail was sometimes marked as spam

The general expectation is that outgoing mail should never be marked as spam, even if it is spam. If you disagree, then please note that there is nobody to notice the problem, and nobody except root can retrain the spam filter in such case.

This is mostly a duplicate of the previous item, with an interesting twist. Namely, there are some web scripts and cron jobs that send mail to external users, and both connect to 127.0.0.1:25 without authentication. I solved this by splitting the default smtp line in master.cf into two: one for 127.0.0.1:smtp, and one for my external IP address. Spam filtering is enabled only for the second line.
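
One way to check which messages actually pass through the filter is to look for the X-DSPAM-* headers in a delivered copy. The following is a minimal sketch, assuming the message is available as a plain file with LF line endings; the function name has_dspam_headers is made up for this example.

```shell
# Hypothetical check: succeed if the header section of the message on stdin
# contains at least one X-DSPAM-* header. The body (after the first empty
# line) is deliberately not inspected.
has_dspam_headers() {
    awk 'BEGIN { rc = 1 }
         /^$/ { exit }                                  # end of headers
         tolower($0) ~ /^x-dspam-/ { rc = 0; exit }
         END { exit rc }'
}
```

For example, has_dspam_headers < message.eml && echo "processed by DSPAM" can be run on a test message sent from a cron job or through port 587, which should not report any such headers.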

Conclusion


It works! Or at least pretends to work. With so many pitfalls already seen, I cannot be sure.


Tuesday, December 18, 2012

Stupid MySQL Help and Parser

Just stumbled upon this:

mysql> help CAST
Name: 'CAST'
Description:
Syntax:
CAST(expr AS type)

The CAST() function takes a value of one type and produce a value of
another type, similar to CONVERT(). See the description of CONVERT()
for more information.

URL: http://dev.mysql.com/doc/refman/5.1/en/cast-functions.html

mysql> select CAST ('2012-01-01 12:00:00' AS DATETIME);
ERROR 1305 (42000): FUNCTION CAST does not exist
mysql> select CAST('2012-01-01 12:00:00' AS DATETIME);
+-----------------------------------------+
| CAST('2012-01-01 12:00:00' AS DATETIME) |
+-----------------------------------------+
| 2012-01-01 12:00:00                     |
+-----------------------------------------+
1 row in set (0,00 sec)

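I.e. the help claims that CAST is a function, but if you call it with a stray space after the name, it does not exist. (By default, MySQL does not accept whitespace between the name of a built-in function and the opening parenthesis; this changes only when the IGNORE_SPACE SQL mode is enabled.)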

Saturday, October 6, 2012

Monkey-patching bash functions

Disclaimer: the whole post is a big bad hack.

Suppose that there is a big regularly-updated library of bash functions, and your script sources it or is sourced by it. One of these functions is not exactly right for your purpose (e.g. it contains a bug or misses a feature), but fixing the bug or adding the feature there is not practical. This might happen if the file containing the function library is managed by a package manager, and your bugfix will be overwritten at the next package update.

A straightforward sledgehammer-like solution is to make a copy of the library, fix the bug there, and source or be sourced by your modified copy (thus losing all future updates). This is not good.

If the offending function is called by your script directly, then, of course, you can define a differently-named function that is otherwise-identical to the original one, but has the needed fix, directly in your script, and use it. However, this approach does not work (or, rather, requires you to duplicate the whole call chain) if your script calls the offending function only indirectly.

A possibly-better solution (that may or may not work) is to redefine the offending function in your script. Indeed, out of many possibly existing definitions for a function, bash uses the last one it encountered. Here is an interactive example of such a redefinition:

$ barf() { echo "123" ; }
$ barf
123
$ barf() { echo "456" ; }
$ barf
456
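
The redefinition also takes effect for indirect calls, because bash looks a function up by name at the moment it is called. Here is a small sketch of that; the library contents and the names greet and run are invented for the illustration.

```shell
# A stand-in for the library that we cannot edit (hypothetical contents).
lib=$(mktemp)
cat > "$lib" <<'EOF'
greet() { echo "hello" ; }
run() { greet ; }
EOF

. "$lib"

# Redefine greet after sourcing the library; run itself is left untouched.
greet() { echo "patched hello" ; }

run    # calls the redefined greet
rm -f "$lib"
```

Note that the order matters: if the library is sourced again after the redefinition, its original definition wins.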

So, now you know how to completely replace a function. But, what if only a small change is required? E.g., if one command is missing at the end? This is also solvable thanks to an introspection feature of bash. I am talking about the "type" builtin. Here is what it does when applied to a function:


$ type barf
barf is a function
barf ()
{
    echo "456"
}

So, you have one line of a meaningless header and then the full slightly-reformatted source of the function on standard output. Let's grab this into a variable:

$ def=$( type barf )

You can then post-process it. E.g., let's transform this into a definition of a function that does exactly the same but also prints "789" at the end. The easiest way to do that is to remove the first line (the header) and insert echo "789" before the last line. Uhm, it is not easy to remove a line from a variable in pure bash... no problem, we'll comment it out instead!

$ def="# $( type barf )"
$ echo "$def"
# barf is a function
barf ()
{
    echo "456"
}
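
As an aside, there are at least two ways to obtain the definition without the header line at all: the declare -f builtin, and a parameter expansion that strips everything up to the first newline. A short sketch (the variable names body and body2 are arbitrary):

```shell
barf() { echo "456" ; }

# declare -f prints just the definition, without the "barf is a function" line.
body=$(declare -f barf)

# The same text taken from the output of type: strip everything up to and
# including the first newline.
body2=$(type barf)
body2=${body2#*$'\n'}

[ "$body" = "$body2" ] && echo "identical"
```

The rest of this post keeps the commented-out header, which works just as well.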

And now remove the last character (a closing brace) and replace it with the correct code:

$ def="${def%\}}echo \"789\" ; }"
$ echo "$def"
# barf is a function
barf ()
{
    echo "456"
echo "789" ; }

All that remains is to feed the contents of this variable back to bash as a piece of code. That's what eval does:

$ eval "$def"

Now the function looks correct and does what is intended:

$ type barf
barf is a function
barf ()
{
    echo "456";
    echo "789"
}
$ barf
456
789
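
The steps above can be folded into reusable helpers. The sketch below shows two variants: one appends a command to an existing function, the other copies the original under a new name so that a replacement can call it. Both helper names are made up for this post, and both assume that the output of declare -f ends with the closing brace of the body (a function defined with a redirection after the brace would break the first helper).

```shell
# Hypothetical helper: append the command(s) in $2 to the end of function $1.
append_to_function() {
    local name=$1 extra=$2 def
    def=$(declare -f "$name") || return 1
    # Drop the final "}" and put the extra code in front of a new one.
    eval "${def%\}}"$'\n'"$extra"$'\n'"}"
}

# Hypothetical helper: copy function $1 under the new name $2.
copy_function() {
    local def
    def=$(declare -f "$1") || return 1
    # The definition starts with the function name; swap it for the new one.
    eval "$2${def#"$1"}"
}

barf() { echo "456" ; }
append_to_function barf 'echo "789"'
barf     # prints 456, then 789

quux() { echo "abc" ; }
copy_function quux orig_quux
quux() { orig_quux "$@" ; echo "def" ; }
quux     # prints abc, then def
```

The copy-and-wrap variant is usually less fragile, since it never edits the text of the original function.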

Monday, August 27, 2012

SafeDNS.com is now open for registration

The company I work for decided to establish its presence in the USA. So, please welcome a new player in the international market of content filtering solutions: SafeDNS. Strictly speaking, we are still in beta and need YOU to help us kill the last bugs.

This service can be useful if you want to protect yourself or your kids against accidentally opening sites with unsuitable content. Or, to prevent your employees from wasting time at work on such things as social networking and videos. Or even to evade a bad filter set up by your ISP :)

We have more than 4 million sites sorted into more than 50 categories, and it's you who decides what to block and what to let through. All you need is an e-mail address, a public static IPv4 address on your router and the ability to change DNS settings on your computers.

Register now (it's free!), read the guide, and help us improve the service by sending feedback.

Saturday, July 28, 2012

On the negative side of duties

Yesterday Lennart Poettering presented a talk with the title "The UI and the OS". The main idea of the talk was simplification of the architecture of the whole OS. On one of the slides, there was approximately this seemingly-obvious text: "The distributions' job is to distribute your software".

This phrase (while completely appropriate for the slide in question) reminded me of one of the first lessons on philosophy that I received from G. V. Boldygin at the Urals State University while studying there. In that lesson, we discussed the role of science (including theoretical science) in society. The obvious idea was proposed immediately: theoretical science leads to technological progress by telling us what to do that nobody has done before. The surprising fact given to us as students was that there were significant practical technological advances (achieved by trial and error) even before theoretical science was formed. The conclusion was that it is almost always incorrect to refer to only one side of something's role or duties (i.e. to something that is done, as opposed to something that is prevented, or vice versa). Indeed, we were told that science not only provides technological progress directly, but also saves us from wasting money and time on certain "scientifically impossible" projects such as trying to build a perpetual motion machine.

So, let's apply this lesson to refute Lennart's phrase: distributions exist not only to distribute your software. Indeed, as illustrated by the Windows world, one doesn't need any help for this. The other side of their duties is that they reject bad (in their opinion) software by not distributing it, thus creating an anchor of trust for software they do provide to users.

Thursday, July 26, 2012

GUADEC

I am going to GUADEC 2012. In fact, I have already arrived, and will soon listen to a keynote: "The Tor Project: Anonymity online". Somewhat scary, given that I work for a company that implements a (completely opt-in) web content filtering solution.

Ironically, we have some common not-yet-implemented desktop integration requirements: the ability to force the use of a particular DNS server across all connections, or to say "Use this VPN or refuse to send any packet to the Internet".