Here are some tips on anonymizing your website logs. This is simply the process of taking the users’ IP addresses in the logs and removing the end of each one, so that users can’t be individually identified by their IP address.

What we will go over here

This post covers:

  • How to have NGINX and Apache not log the last octet (or two) of the IP address.
  • How to still allow NGINX and Apache to log the full IP address on access denied and potential exploit attempts, so security software can monitor and block malicious users and scripts.
  • How to remove the last octet of the IP address in the access logs after the log is saved.

Warning

Modifying the logged IP address can cause various issues, such as security software not knowing which IP address to block during an attack, and it makes the IP addresses in the logs less useful.

While I will show how to log the full IP address in the logs when needed, it’s up to you to make sure this all still works with your setup. Proceed at your own risk.

Anonymizing will Limit Accuracy

While anonymizing the IP address helps protect a user’s privacy, it also limits how accurately you can locate your users. For example, removing the last octet of the IP address (“198.51.100.44” –> “198.51.100.0”) can make them appear to be from a different city/district/state/province than they actually are. And removing the last two octets (“198.51.100.44” –> “198.51.0.0”) could put them in a different country.

This is not always the case. It depends on where the resulting anonymized IP address is located compared to the actual one.
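To make the two levels concrete, here is a minimal shell sketch of the truncation itself. The function names are made up for this illustration, and it only handles plain dotted IPv4 addresses.

```shell
# Illustration only: mask_last_octet and mask_last_two_octets are
# hypothetical helper names, not part of NGINX or Apache.

# "198.51.100.44" -> "198.51.100.0"
mask_last_octet() {
        # ${1%.*} strips the shortest trailing ".something"
        printf '%s.0\n' "${1%.*}"
}

# "198.51.100.44" -> "198.51.0.0"
mask_last_two_octets() {
        # ${1%.*.*} strips the last two dotted parts
        printf '%s.0.0\n' "${1%.*.*}"
}

mask_last_octet 198.51.100.44
mask_last_two_octets 198.51.100.44
```

Every address in the same /24 (or /16) block collapses to the same value, which is exactly why the location accuracy drops.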

Let’s get to it.

NGINX

NGINX allows us to anonymize the IP address before the log entry is saved.

Anonymize NGINX

First, we need NGINX to produce an anonymized copy of the IP address. This doesn’t log anything yet, but it gives us a variable to use below. In the http section of nginx.conf, add the following. It removes the last octet of an IPv4 address, so “198.51.100.44” becomes “198.51.100.0”. For an IPv6 address, it keeps only the first two groups and drops the rest.

        map $remote_addr $remote_addr_anon {
                ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
                ~(?P<ip>[^:]+:[^:]+):       $ip::;
                # IP addresses to not anonymize (such as your server)
                127.0.0.1                   $remote_addr;
                ::1                         $remote_addr;
                #w.x.y.z                     $remote_addr;
                #a::c:d::e:f                $remote_addr;
                default                     0.0.0.0;
        }

If you want to remove the last two octets, so “198.51.100.44” becomes “198.51.0.0”, then change the second line above to:

                ~(?P<ip>\d+\.\d+)\.\d+\.    $ip.0.0;
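If you want to preview what the map will produce before touching a live server, the same two regular expressions can be tried out in a shell. This is only a rough sketch: preview_anon is a made-up helper, it assumes a grep that supports -o and -E, and it imitates the map rather than running NGINX itself.

```shell
# Hypothetical helper that imitates the $remote_addr_anon map above.
# NGINX checks exact strings before regular expressions, so the
# "do not anonymize" entries are handled first here as well.
preview_anon() {
        case "$1" in
                127.0.0.1|::1) printf '%s\n' "$1"; return ;;
        esac
        # grep exits non-zero when nothing matches, hence the "|| true"
        v4=$(printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.' | head -n 1) || true
        v6=$(printf '%s\n' "$1" | grep -oE '[^:]+:[^:]+:' | head -n 1) || true
        if [ -n "$v4" ]; then
                printf '%s0\n' "$v4"      # first three octets, then "0"
        elif [ -n "$v6" ]; then
                printf '%s:\n' "$v6"      # first two groups, then "::"
        else
                printf '0.0.0.0\n'        # the map's default
        fi
}

preview_anon 198.51.100.44
preview_anon 2001:db8:85a3::8a2e:370:7334
preview_anon 127.0.0.1
```

Note that an address the regular expressions don't match falls through to the default, just as it would in the map.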

If NGINX sits behind a reverse proxy and you log $http_x_forwarded_for, you can anonymize that value the same way.

        map $http_x_forwarded_for $http_x_forwarded_for_anon {
                ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
                ~(?P<ip>[^:]+:[^:]+):       $ip::;
                # IP addresses to not anonymize (such as your server)
                127.0.0.1                   $http_x_forwarded_for;
                ::1                         $http_x_forwarded_for;
                #w.x.y.z                     $http_x_forwarded_for;
                #a::c:d::e:f                $http_x_forwarded_for;
                default                     -;
        }

In both of these sections, you can optionally set your server’s/reverse proxy’s w.x.y.z (ipv4) and a::c:d::e:f (ipv6) IP address(es) so they won’t be anonymized. You can also add any other IP addresses that you don’t want anonymized.

Next we need to set the log format to use the anonymized IP address instead of the real one.

Info

The log format name will be set as anon_ip. If you have a lot of entries using main (the name used in many stock nginx.conf files), it may be easier to name it main so you don’t have to update all of those access_log lines. I simply used anon_ip to make it obvious what it does.

        log_format  anon_ip   '$remote_addr_anon - $remote_user [$time_local] "$request" '
                              '$status $body_bytes_sent "$http_referer" '
                              '"$http_user_agent" "$http_x_forwarded_for_anon"';

        # Example for http_x_forwarded_for_anon
        #log_format  anon_ip   '$http_x_forwarded_for_anon - $remote_user [$time_local] "$request" '
        #                      '$status $body_bytes_sent "$http_referer" '
        #                      '"$http_user_agent" "$http_x_forwarded_for_anon"';

Now set the log file to use the anon_ip format.

        access_log  /var/log/nginx/path-to-log-access.log  anon_ip;

Log Full IP on Specific Error Codes

Optionally, we can log the full IP address on specific error codes, such as access denied for a wrong password, so security software such as fail2ban can monitor your logs and act on the real IP address, while normal access requests stay anonymized.

Info

You can still anonymize the logs later, once the firewall has the info. There’s a script towards the bottom that can do that.

This is done by tagging normal requests with $normal_access and the specified error codes with $record_full_ip. NGINX lets us set both from the $status variable, which holds the response’s status code (such as 2xx/4xx). We then add a log entry for each.

Add these to your http section.

        map $status $normal_access {
                400      0;
                401      0;
                403      0;
                #404      0;
                405      0;
                406      0;
                410      0;
                default 1;
        }
        map $status $record_full_ip {
                400      1;
                401      1;
                403      1;
                #404      1;
                405      1;
                406      1;
                410      1;
                default 0;
        }

Feel free to modify/add according to your needs. Note that I’ve commented out the 404 errors, but you can put those back in, and any other error codes you want to log. Just make sure that each code will only match one area, or you’ll have duplicate entries in your logs.

Here’s a shorter version of the above that does the same thing. It uses the regular expression ~^[1235] which matches any 1xx, 2xx, 3xx, 5xx code. (Use either this or the above. Obviously don’t use both.)

        map $status $normal_access {   # 1 for 1xx/2xx/3xx/5xx, 0 for 4xx.
                ~^[1235]  1;
                404       1; # comment out to record the full IP of 404 errors.
                default   0;
        }
        map $status $record_full_ip {    # 1 for 4xx, 0 for everything else.
                ~^[1235]  0;
                404       0; # comment out to record the full IP of 404 errors.
                default   1;
        }
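To see how the short version splits up the status codes, here is a small shell imitation of the two maps. It's only a sketch for reasoning about the rules: the function names simply reuse the variable names, and the shell pattern [1235]* stands in for the regular expression ~^[1235].

```shell
# Shell stand-ins for the two short maps above (illustration only).
normal_access() {
        case "$1" in
                [1235]*) echo 1 ;;   # 1xx/2xx/3xx/5xx
                404)     echo 1 ;;   # remove to record the full IP of 404 errors
                *)       echo 0 ;;   # remaining 4xx codes
        esac
}

record_full_ip() {
        case "$1" in
                [1235]*) echo 0 ;;
                404)     echo 0 ;;   # remove to record the full IP of 404 errors
                *)       echo 1 ;;
        esac
}

for code in 200 301 401 403 404 500; do
        printf '%s normal_access=%s record_full_ip=%s\n' \
                "$code" "$(normal_access "$code")" "$(record_full_ip "$code")"
done
```

For every code, exactly one of the two values is 1, which is what keeps each request from being logged twice.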

Next we will need a second log_format entry to record 4xx errors ($record_full_ip). We will use the anon_ip log_format above for normal access.

Add this to the http section, alongside the anon_ip log_format above.

        log_format  real_ip  '$remote_addr - $remote_user [$time_local] "$request" '
                             '$status $body_bytes_sent "$http_referer" '
                             '"$http_user_agent" "$http_x_forwarded_for"';

For the actual logging, we will have two log lines, one for $normal_access and one for the $record_full_ip. You can use the same file for both, or a different one if you prefer. These can be set in the http and/or server areas, and different hosts/sites/areas can have different log files.

For each access attempt, only one entry will be recorded, whether it is a normal entry or a specified error code entry.

        # Anonymized IP Access Logs
        access_log /var/log/nginx/path-to-log-access.log  anon_ip if=$normal_access;

        # Record real IP address on specified error codes
        access_log /var/log/nginx/path-to-log-access.log  real_ip if=$record_full_ip;
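Once both access_log lines are live, it’s worth checking that no full address slips into a normal entry. Here is a rough sketch of such a check. It assumes the log formats above (so the status code is the ninth whitespace-separated field) and the longer list of status codes; find_unmasked is a made-up name.

```shell
# Hypothetical check: print lines whose status is NOT one of the
# "record full IP" codes but whose first field does not look anonymized
# (i.e. does not end in ".0" or "::"). Loopback entries are skipped,
# since the map leaves them untouched.
find_unmasked() {
        awk '
                $9 !~ /^(400|401|403|405|406|410)$/ &&
                $1 !~ /(\.0|::)$/ &&
                $1 != "127.0.0.1" && $1 != "::1" { print }
        ' "$1"
}

# Usage (path is the placeholder used above):
#   find_unmasked /var/log/nginx/path-to-log-access.log
```

No output means every normal entry looks anonymized. A real address that happens to end in .0 will pass unnoticed, so treat this as a sanity check rather than proof.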

Full NGINX Example

When you’re done, you will have something like this.

http {

        .........

        ##
        # Anonymize the IP Address
        ##

        map $remote_addr $remote_addr_anon {
                ~(?P<ip>\d+\.\d+\.\d+)\.    $ip.0;
                ~(?P<ip>[^:]+:[^:]+):       $ip::;
                # IP addresses to not anonymize (such as your server)
                127.0.0.1                   $remote_addr;
                ::1                         $remote_addr;
                #w.x.y.z                     $remote_addr;
                #a::c:d::e:f                $remote_addr;
                default                     0.0.0.0;
        }
        # add $http_x_forwarded_for section if needed.

        ##
        # Tag the Access as Normal or Record IP (Specified Error codes)
        ##

        map $status $normal_access {
                400      0;
                401      0;
                403      0;
                #404      0;
                405      0;
                406      0;
                410      0;
                default  1;
        }

        map $status $record_full_ip {
                400      1;
                401      1;
                403      1;
                #404      1;
                405      1;
                406      1;
                410      1;
                default  0;
        }

        ##
        # Set the Logs
        ##

        log_format  anon_ip  '$remote_addr_anon - $remote_user [$time_local] "$request" '
                             '$status $body_bytes_sent "$http_referer" '
                             '"$http_user_agent" "$http_x_forwarded_for_anon"';

        log_format  real_ip  '$remote_addr - $remote_user [$time_local] "$request" '
                             '$status $body_bytes_sent "$http_referer" '
                             '"$http_user_agent" "$http_x_forwarded_for"';

        ##
        # Do the Actual Logging ( Can be set in Server section(s) )
        ##

        # Anonymized IP Access Logs
        access_log /var/log/nginx/path-to-log-access.log  anon_ip if=$normal_access;

        # Record real IP address on specified error codes
        access_log /var/log/nginx/path-to-log-access.log  real_ip if=$record_full_ip;

        # NGINX Error Logs
        error_log /var/log/nginx/error.log;

        .......

}

Apache

Apache has many ways to anonymize IP addresses. Here are a few:

mod_ipv6calc

mod_ipv6calc will anonymize all IP addresses in Apache, before writing to the log. It’s usually packaged as ipv6calc-mod_ipv6calc or mod_ipv6calc, depending on your distro.

Info

Does not appear to work if Apache is behind a reverse proxy (i.e., if you are using %a in your logs). See the next section about how to do this by using a pipe.

The config is stored as /etc/httpd/conf.d/ipv6calc.conf or /etc/apache2/conf.d/ipv6calc.conf. First, load the module by uncommenting this line:

LoadModule ipv6calc_module modules/mod_ipv6calc.so

Then make sure it’s on:

ipv6calcEnable on

Set the level of anonymizing you want. For example, this takes out quite a lot:

ipv6calcOption anonymize-preset anonymize-careful

Or if you have your GeoIP databases (or others), you can use this:

ipv6calcOption anonymize-preset keep-type-asn-cc

The anonymized IP address is set to IPV6CALC_CLIENT_IP_ANON, so you’ll want to enable logging with something like this:

LogFormat "%{IPV6CALC_CLIENT_IP_ANON}e %{IPV6CALC_CLIENT_COUNTRYCODE}e %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"  vhost:%v" combined_anon
CustomLog "logs/access_log" combined_anon

Be sure to turn off your main logging, and/or set logging in your VirtualHosts files. Then test the config and restart Apache:

httpd -t
systemctl restart httpd

You’ll know it’s been loaded when you see something like this in your error_log file:

...[ipv6calc:notice] ... internal main     library version: 1.0.0  API: 1.0.0  (shared)
...[ipv6calc:notice] ... internal database library version: 1.0.0  API: 1.0.0  (shared)
...[ipv6calc:notice] ... configured module actions: anonymize=ON countrycode=ON asn=ON registry=ON
...[ipv6calc:notice] ... default module debug level: 0x00000000 (0)
...[ipv6calc:notice] ... module cache: ON (default)  limit=20 (default/minimum)  statistics_interval=0 (default)

Anonymize by pipes

This is a different method that will anonymize the IP address by piping the log through ipv6loganon.

LogFormat "%a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" vhost:%v" combined
CustomLog "|/usr/bin/ipv6loganon --anonymize-careful -f -a /var/log/httpd/access_log" combined

This is very useful if Apache is being used as a reverse proxy, as mod_ipv6calc doesn’t work in that case.

Log IP only on errors

Another way to keep things simple yet private is to only log the IP addresses of actual errors or access denied attempts. The following will keep a log of everything accessed, but will only add the IP address for HTTP status codes 400, 401, 403, 405, 406, and 410.

LogFormat "%400,401,403,405,406,410a %400,401,403,405,406,410l %400,401,403,405,406,410u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" common
# Mark requests from the loop-back interface
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
# Mark requests for the robots.txt file
SetEnvIf Request_URI "^/robots\.txt$" dontlog
# Log what remains
CustomLog logs/access_log common env=!dontlog

Anonymize on errors

You can combine mod_ipv6calc with logging the IP only on errors, too. That way it won’t show any IP address on normal requests, but will show an anonymized IP address on errors:

LogFormat "%400,401,403,405,406,410{IPV6CALC_CLIENT_IP_ANON}e %400,401,403,405,406,410{IPV6CALC_CLIENT_COUNTRYCODE}e %400,401,403,405,406,410u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"  vhost:%v" combined_anon
CustomLog "logs/access_log" combined_anon

Here’s an example of what it would look like. The first is an access denied entry, the second is a normal request.

198.51.100.0 US username [12/Mar/2018...] "GET /denied-file HTTP/1.0" 403 ...
- - - [12/Mar/2018...] "GET /normal-access HTTP/1.0" 200 ...

Anonymize old logs

If you have old logs that contain full IP addresses, you’ll want to anonymize those too. One simple program for this is ipv6loganon, part of the ipv6calc package. It reads in a file and anonymizes the IP addresses.

cat /var/log/httpd/access_log
198.51.100.53 - - [01/Jan/2007...] "GET /Linux+IPv6-HOWTO/x1112.html HTTP/1.0" 200 ...
fxyz::5abz:jazz:1:216:17ff:fe01:2345 - - [10/Jan/2007...] "GET /favicon.ico HTTP/1.1" 200 ...
cat /var/log/httpd/access_log | ipv6loganon
198.51.100.0 - - [01/Jan/2007...] "GET /Linux+IPv6-HOWTO/x1112.html HTTP/1.0" 200 ...
fxyz::5abz:jazz:216:17ff:fe00:0 - - [10/Jan/2007...] "GET /favicon.ico HTTP/1.1" 200 ...

It has multiple levels of anonymizing, such as --anonymize-careful. See man ipv6loganon for more info.
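If ipv6calc isn’t installed, a rough IPv4-only stand-in can be put together with sed. This is only a sketch and not a replacement for ipv6loganon: it zeroes the last octet of an address at the start of each line, does nothing for IPv6, and has none of the presets. mask_leading_ipv4 is a hypothetical name.

```shell
# Hypothetical IPv4-only fallback: reads log lines on stdin and zeroes
# the last octet of an address at the very start of each line.
mask_leading_ipv4() {
        sed -E 's/^([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3} /\1.0 /'
}

printf '%s\n' '198.51.100.53 - - [01/Jan/2007...] "GET /favicon.ico HTTP/1.1" 200 ...' \
        | mask_leading_ipv4
```

Lines that don’t start with an IPv4 address pass through unchanged.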

Script to anonymize multiple files

If you have several log files, and don’t want to have to go through them one by one, you can use the following script to do them all. (It also sets the log’s modified date to what it was previously).

anonymize-logs /path/to/logfile /path/to/logfile2 /path/to/folder/*.log

It’s just a quick script, so make sure you have a backup of your logs before using it. It automatically skips ipv6loganon for files that don’t have the IP address first on each line, such as Apache error logs, but still runs a regex on the file to find and replace IP addresses.

#!/bin/bash
# (c) Matt Bagley, under the GPL2
# given log file(s), it will anonymize the logs and update (only if needed)

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin

if [ "$1" == "" ] || [ "$1" == "-h" ] ; then
        echo "Usage: $0 logfile.log [logfile2.log] [logfile3.log] ..."
        exit 1
fi

td="$(mktemp -d)"
temp_log="$td/file"
temp_log2="${temp_log}-pass2"
temp_log3="${temp_log}-pass3"

clean_up() {
        # remove the whole temp dir, including any leftover .gz file
        rm -rf -- "$td"
        exit
}
trap "clean_up"  1 2 3 4 5 15

# "$@" (quoted) keeps file names with spaces intact
for each in "$@" ; do
        #echo "Looking at $each"
        # does it exist?
        if ! [ -f "$each" ] ; then
                echo "File not found: $each"
                continue
        fi
        # non-zero?
        if ! [ -s "$each" ] ; then
                continue
        fi
        # compressed or not?
        compressed=0
        case "$each" in
                *.gz) compressed=1 ;;
        esac
        # expand log
        if [ $compressed -eq 1 ] ; then
                zcat "$each" > "$temp_log"
        else
                cat "$each" > "$temp_log"
        fi

        # make sure that none of the lines start with '-'. ipv6loganon does not like this
        # and that no lines have "::: " in them
        sed -e 's/^- /0.0.0.0 /' -e 's/:::* /:: /g' "$temp_log" > "$temp_log2"

        # anonymize it (ipv6loganon only does files that have IP address first)
        if [ -n "$(head -n 10 "$temp_log2" | awk '{print $1}' | grep -E '(\.|:)')" ] \
        &&  [ -z "$(head -n 10 "$temp_log2" | awk '{print $1}' | sed 's/[a-fA-F0-9.:]*//g')" ] ; then
        #       echo "Running ipv6loganon on $each"
                ipv6loganon --anonymize-careful < "$temp_log2" > "$temp_log3"
                cat "$temp_log3" > "$temp_log2"
                rm -f "$temp_log3"
        fi
        # regex pass for addresses anywhere in the line.
        # IPv4: [0-9]+ (not [0-9]*) so that a plain "..." is left alone.
        # IPv6: the leading group keeps the pattern from matching inside
        # timestamps such as 12/Mar/2018:10:20:30.
        sed -E -e 's/([0-9]+\.[0-9]+\.[0-9]+)\.[0-9]+/\1.0/g' \
               -e 's#(^|[^0-9a-fA-F:/])([0-9a-fA-F]*:[0-9a-fA-F]*:[0-9a-fA-F]*:)[0-9a-fA-F:]*#\1\2:#g' \
               -e 's/:::*/::/g' "$temp_log2" > "$temp_log3"
        cat "$temp_log3" > "$temp_log2"
        rm -f "$temp_log3"

        # verify that it's not empty
        if ! [ -s "$temp_log2" ] ; then
                echo "$each was processed as empty"
                continue
        fi
        # diff to see if we changed anything
        if [ -n "$(diff -q "$temp_log" "$temp_log2")" ] ; then
                # if we did, zip and copy file back
                temp_log_ext=""
                if [ $compressed -eq 1 ] ; then
                        gzip "$temp_log2"
                        temp_log_ext=.gz
                fi
                mv "$each" "${each}-old"
                echo "Replacing $each"
                cat "${temp_log2}${temp_log_ext}" > "$each"
                # set the time to the same as the previous file
                touch --reference="${each}-old" "$each"
                # clean up
                rm -f "${each}-old" "${temp_log2}${temp_log_ext}"
        fi
        rm -f "$temp_log" "$temp_log2" "$temp_log3"
done

clean_up

Then add a weekly cron job for it: /etc/cron.weekly/anonymize-logs

#!/bin/bash

/usr/local/bin/anonymize-logs /var/log/httpd/*.gz /var/log/nginx/*.gz /var/log/maillog*.gz

The nice thing about this is that it only anonymizes the log once it’s been archived to a .gz file, so it shouldn’t cause any trouble to security systems that need to know the real IP address, as they only really work with the current log file.
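After the cron job has run, a quick spot check can confirm that the archives no longer hold full IPv4 addresses. The sketch below is an assumption-heavy helper, not part of the script above: count_full_ipv4 is a made-up name, it needs a grep with -o and -E, and it treats any dotted quad that doesn’t end in .0 as a full address, so 127.0.0.1 or a version string like 1.2.3.4 in a user agent will be counted too.

```shell
# Hypothetical spot check: count dotted quads that do not end in ".0"
# in a plain or gzipped log file.
count_full_ipv4() {
        case "$1" in
                *.gz) gzip -dc -- "$1" ;;
                *)    cat -- "$1" ;;
        esac \
        | grep -oE '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' \
        | grep -vcE '\.0$' || true   # grep -c still prints 0 when nothing is left
}

# Usage:
#   for f in /var/log/nginx/*.gz; do
#           printf '%s: %s\n' "$f" "$(count_full_ipv4 "$f")"
#   done
```

A count of 0 for an archive suggests the script got to it. Anything higher is worth a closer look with zcat.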

System Logs

Some log files, such as error logs for Apache/NGINX, have IP addresses that get stored in them, and don’t have a built in way to anonymize them. You can use the above script for these files too.

anonymize-logs /var/log/secure*.gz /var/log/maillog*.gz

It’s probably best not to anonymize /var/log/secure and other current logs if you have a firewall or other program that needs the actual IP addresses in order to function.

Bonus: Limit tracking on analytic software

You can also help protect the privacy of your users by using these methods. Note that this may also limit the accuracy of the location of your users.

Matomo

Matomo has an easy built-in way to anonymize IP addresses. This can be enabled either during initial setup or later on. I have this enabled on my setup and it works like a charm. Go to Administration > Privacy > Anonymize in your dashboard to enable it. See here for more info.

Google Analytics

For the analytics.js (Universal Analytics) snippet, this can be added to your Analytics script as a setting, shown below. See this page for more info.

ga('set', 'anonymizeIp', true);

Conclusion

As you can see, it’s simple and easy not to store the full IP addresses. Go ahead and set this up if you haven’t already.

Updated Jan 2021: Fixed where 4xx error entries would be recorded twice. Also made it easier to read, and the examples more straightforward.