Here are some tips on ways to anonymize your website logs. This is simply the process of taking users’ IP addresses in the logs and removing the end of each one so that users can’t be individually identified by their IP address.
What we will go over here
The following will be covered here:
- How to have NGINX and Apache not log the last octet (or two) of the IP address.
- How to still allow NGINX and Apache to log the full IP address on access denied and potential exploit attempts, so security software can monitor and block malicious users and scripts.
- How to remove the last octet of the IP address in the access logs after the log is saved.
Warning
Modifying the logged IP address can cause various issues, such as security software not knowing which IP address to block during an attack, or lowering the usefulness of the IP addresses in the logs.
While I will show how to log the full IP address in the logs when needed, it’s up to you to make sure this all still works with your setup. Proceed at your own risk.
Anonymizing will Limit Accuracy
While anonymizing the IP address helps protect a user’s privacy, it also limits the accuracy of your users’ locations. For example, removing the last octet of the IP address (“198.51.100.44” –> “198.51.100.0”) can make them appear to be from a different city/district/state/province than they actually are. And removing the last two octets (“198.51.100.44” –> “198.51.0.0”) could put them in a different country.
This is not always the case. It depends on where the resulting anonymized IP is located compared to the actual one.
Let’s get to it.
NGINX
NGINX allows us to anonymize the IP address before the log entry is saved.
Anonymize NGINX
First, we need to have NGINX change the IP address to an anonymized one. This doesn’t log it yet, but it makes the anonymized address available as a variable for the steps below. In the http section of nginx.conf, add the following. It removes the last octet of an IPv4 address, so “198.51.100.44” becomes “198.51.100.0”. For an IPv6 address, it keeps only the first two groups, so “2001:db8:1:2::1” becomes “2001:db8::”.
map $remote_addr $remote_addr_anon {
~(?P<ip>\d+\.\d+\.\d+)\. $ip.0;
~(?P<ip>[^:]+:[^:]+): $ip::;
# IP addresses to not anonymize (such as your server)
127.0.0.1 $remote_addr;
::1 $remote_addr;
#w.x.y.z $remote_addr;
#a::c:d::e:f $remote_addr;
default 0.0.0.0;
}
If you want to remove the last two octets, so “198.51.100.44” becomes “198.51.0.0”, then change the second line above to:
~(?P<ip>\d+\.\d+)\.\d+\. $ip.0.0;
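Before reloading NGINX, it can help to see what the two regex rules do to a given address. The following is only a rough shell sketch that mimics the map (exact matches first, then the IPv4 rule, then the IPv6 rule, then the default). The function name nginx_anon is made up for this illustration, and it relies on grep -o, which GNU and BSD grep provide.

```sh
#!/bin/sh
# Hypothetical helper: mimics the map block's rules for one address.
nginx_anon() (
    ip=$1
    # Exact entries in an NGINX map are checked before regexes.
    case "$ip" in
        127.0.0.1|::1) printf '%s\n' "$ip"; exit 0 ;;
    esac
    # IPv4 rule: keep the first three octets, set the last one to 0.
    m=$(printf '%s\n' "$ip" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.' | head -n 1 || true)
    if [ -n "$m" ]; then
        printf '%s0\n' "$m"
        exit 0
    fi
    # IPv6 rule: keep the first two groups, drop everything after them.
    m=$(printf '%s\n' "$ip" | grep -oE '[^:]+:[^:]+:' | head -n 1 || true)
    if [ -n "$m" ]; then
        printf '%s:\n' "$m"
        exit 0
    fi
    # Nothing matched: fall back to the map's default value.
    printf '0.0.0.0\n'
)

nginx_anon 198.51.100.44      # prints 198.51.100.0
nginx_anon 2001:db8:1:2::1    # prints 2001:db8::
```

Note that an address with fewer than two non-empty leading groups, such as fe80::1, matches neither rule and falls through to the default.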
If you are using http_x_forwarded_for (with a reverse proxy), you can anonymize it here.
map $http_x_forwarded_for $http_x_forwarded_for_anon {
~(?P<ip>\d+\.\d+\.\d+)\. $ip.0;
~(?P<ip>[^:]+:[^:]+): $ip::;
# IP addresses to not anonymize (such as your server)
127.0.0.1 $http_x_forwarded_for;
::1 $http_x_forwarded_for;
#w.x.y.z $http_x_forwarded_for;
#a::c:d::e:f $http_x_forwarded_for;
default -;
}
In both of these sections, you can optionally set your server’s/reverse proxy’s w.x.y.z (IPv4) and a::c:d::e:f (IPv6) IP address(es) so they won’t be anonymized. You can also add any other IP addresses that you don’t want anonymized.
Next we need to set the log format to use the anonymized IP address instead of the real one.
Info
The log format handle will be set as anon_ip. If you have a lot of entries using main (NGINX’s default one), it may be better for you to set it as main so you don’t have to update all of those access_log lines. I simply used anon_ip to make it obvious what it did.
log_format anon_ip '$remote_addr_anon - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for_anon"';
# Example for http_x_forwarded_for_anon
#log_format anon_ip '$http_x_forwarded_for_anon - $remote_user [$time_local] "$request" '
# '$status $body_bytes_sent "$http_referer" '
# '"$http_user_agent" "$http_x_forwarded_for_anon"';
Now set the log file to use the anon_ip format.
access_log /var/log/nginx/path-to-log-access.log anon_ip;
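Once NGINX has been reloaded and has served a few requests, a quick check like the one below can confirm that the new format is in use. This is just a sketch: count_full_ipv4 is a made-up helper name, it only looks at IPv4 addresses in the first field, and it cannot tell an anonymized address from a real one that happens to end in .0.

```sh
#!/bin/sh
# Hypothetical helper: reads a log on stdin and counts the lines whose
# first field is an IPv4 address that does NOT end in ".0".
count_full_ipv4() {
    awk '$1 ~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1 !~ /\.0$/ { n++ }
         END { print n + 0 }'
}

# Example run against two sample lines; only the second has a full address.
printf '%s\n' \
    '198.51.100.0 - - [12/Mar/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 512' \
    '198.51.100.44 - - [12/Mar/2018:10:00:01 +0000] "GET / HTTP/1.1" 200 512' \
    | count_full_ipv4    # prints 1
```

Against a real log it would be run as count_full_ipv4 < /var/log/nginx/path-to-log-access.log, and a result of 0 suggests every IPv4 entry was anonymized.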
Log Full IP on Specific Error Codes
Optionally, we can log the full IP address on specific error codes, such as access denied for a wrong password, so security software such as fail2ban can monitor your logs and act on the real IP address, while normal access requests stay anonymized.
Info
You can still anonymize the logs later, once the firewall has the info. There’s a script towards the bottom that can do that.
This will be done by tagging normal access attempts as $normal_access and specific error codes as $record_full_ip. NGINX allows us to set these from the $status variable, which holds the response’s status code (such as 2xx/4xx). We will then set a log entry for each.
Add these to your http section.
map $status $normal_access {
400 0;
401 0;
403 0;
#404 0;
405 0;
406 0;
410 0;
default 1;
}
map $status $record_full_ip {
400 1;
401 1;
403 1;
#404 1;
405 1;
406 1;
410 1;
default 0;
}
Feel free to modify/add according to your needs. Note that I’ve commented out the 404 errors, but you can put those back in, and any other error codes you want to log. Just make sure that each code will only match one area, or you’ll have duplicate entries in your logs.
Here’s a shorter version of the above that tags every 4xx code (except 404) for full-IP logging. It uses the regular expression ~^[1235], which matches any 1xx, 2xx, 3xx, or 5xx code. (Use either this or the above, not both.)
map $status $normal_access { # 1 for 1xx/2xx/3xx/5xx, 0 for 4xx.
~^[1235] 1;
404 1; # comment out this line to record the full IP of 404 errors.
default 0;
}
map $status $record_full_ip { # 1 for 4xx, 0 for everything else.
~^[1235] 0;
404 0; # comment out this line to record the full IP of 404 errors.
default 1;
}
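To double-check which status codes end up in which group, the logic of the shorter maps can be sketched as a small shell function. The name status_tag is invented for this example, and it only mirrors the maps as written above (404 treated as normal access).

```sh
#!/bin/sh
# Hypothetical helper: prints which of the two map variables would be 1
# for a given status code, following the shorter regex-based maps.
status_tag() {
    case "$1" in
        404)     echo normal_access ;;   # the explicit 404 entry is checked first
        [1235]*) echo normal_access ;;   # same idea as the ~^[1235] regex
        *)       echo record_full_ip ;;  # everything else, i.e. the other 4xx codes
    esac
}

for code in 200 301 403 404 500; do
    printf '%s -> %s\n' "$code" "$(status_tag "$code")"
done
```

Every code lands in exactly one group, which is what keeps each request from being logged twice.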
Next we will need a second log_format entry to record the error codes tagged with $record_full_ip. We will use the anon_ip log_format above for normal access.
In the http section, add this along with the anon_ip log_format from above.
log_format real_ip '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
For the actual logging, we will have two log lines, one for $normal_access and one for $record_full_ip. You can use the same file for both, or a different one if you prefer. These can be set in the http and/or server areas, and different hosts/sites/areas can have different log files.
For each access attempt, only one entry will be recorded, whether it is a normal entry or a specified error code entry.
# Anonymized IP Access Logs
access_log /var/log/nginx/path-to-log-access.log anon_ip if=$normal_access;
# Record real IP address on specified error codes
access_log /var/log/nginx/path-to-log-access.log real_ip if=$record_full_ip;
Full NGINX Example
When you’re done, you will have something like this.
http {
.........
##
# Anonymize the IP Address
##
map $remote_addr $remote_addr_anon {
~(?P<ip>\d+\.\d+\.\d+)\. $ip.0;
~(?P<ip>[^:]+:[^:]+): $ip::;
# IP addresses to not anonymize (such as your server)
127.0.0.1 $remote_addr;
::1 $remote_addr;
#x.x.x.x $remote_addr;
#a::c:d::e:f $remote_addr;
default 0.0.0.0;
}
# add $http_x_forwarded_for section if needed.
##
# Tag the Access as Normal or Record IP (Specified Error codes)
##
map $status $normal_access {
400 0;
401 0;
403 0;
#404 0;
405 0;
406 0;
410 0;
default 1;
}
map $status $record_full_ip {
400 1;
401 1;
403 1;
#404 1;
405 1;
406 1;
410 1;
default 0;
}
##
# Set the Logs
##
log_format anon_ip '$remote_addr_anon - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for_anon"';
log_format real_ip '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for"';
##
# Do the Actual Logging ( Can be set in Server section(s) )
##
# Anonymized IP Access Logs
access_log /var/log/nginx/path-to-log-access.log anon_ip if=$normal_access;
# Record real IP address on specified error codes
access_log /var/log/nginx/path-to-log-access.log real_ip if=$record_full_ip;
# NGINX Error Logs
error_log /var/log/nginx/error.log;
.......
}
Apache
Apache has many ways to anonymize IP addresses. Here are a few:
mod_ipv6calc
mod_ipv6calc will anonymize all IP addresses in Apache before writing to the log. It’s usually packaged as ipv6calc-mod_ipv6calc or mod_ipv6calc, depending on your distro.
Info
This does not appear to work if Apache is behind a reverse proxy (i.e., if you are using %a in your logs). See the next section about how to do this by using a pipe.
The config is stored as /etc/httpd/conf.d/ipv6calc.conf or /etc/apache2/conf.d/ipv6calc.conf. First, load the module by uncommenting:
LoadModule ipv6calc_module modules/mod_ipv6calc.so
Then make sure it’s on
ipv6calcEnable on
Set the level of anonymizing you want. For example, this takes out quite a lot:
ipv6calcOption anonymize-preset anonymize-careful
Or if you have GeoIP databases (or others), you can use this:
ipv6calcOption anonymize-preset keep-type-asn-cc
The anonymized IP address is stored in IPV6CALC_CLIENT_IP_ANON, so you’ll want to enable logging with something like this:
LogFormat "%{IPV6CALC_CLIENT_IP_ANON}e %{IPV6CALC_CLIENT_COUNTRYCODE}e %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" vhost:%v" combined_anon
CustomLog "logs/access_log" combined_anon
Be sure to turn off your main logging, and/or set logging in your VirtualHosts files. Then test the config and restart Apache:
httpd -t
systemctl restart httpd
You’ll know it’s been loaded when you see something like this in your error_log file:
...[ipv6calc:notice] ... internal main library version: 1.0.0 API: 1.0.0 (shared)
...[ipv6calc:notice] ... internal database library version: 1.0.0 API: 1.0.0 (shared)
...[ipv6calc:notice] ... configured module actions: anonymize=ON countrycode=ON asn=ON registry=ON
...[ipv6calc:notice] ... default module debug level: 0x00000000 (0)
...[ipv6calc:notice] ... module cache: ON (default) limit=20 (default/minimum) statistics_interval=0 (default)
Anonymize by pipes
This is a different method that will anonymize the IP address by piping the log through ipv6loganon.
LogFormat "%a %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" vhost:%v" combined
CustomLog "|/usr/bin/ipv6loganon --anonymize-careful -f -a /var/log/httpd/access_log" combined
This is very useful if Apache is behind a reverse proxy, as mod_ipv6calc doesn’t work in that case.
Log IP only on errors
Another way to keep things simple yet private is to log IP addresses only for actual errors or access-denied attempts. The following keeps a log of everything accessed, but only records the IP address for HTTP status codes 400, 401, 403, 405, 406, and 410. For any other status code, a “-” is logged in its place.
LogFormat "%400,401,403,405,406,410a %400,401,403,405,406,410l %400,401,403,405,406,410u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" common
# Mark requests from the loop-back interface
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog
# Mark requests for the robots.txt file
SetEnvIf Request_URI "^/robots\.txt$" dontlog
# Log what remains
CustomLog logs/access_log common env=!dontlog
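To confirm that only the listed status codes carry an IP address, the resulting log can be checked with a short shell sketch like this one. The helper name statuses_with_ip is made up for this example, and it assumes the usual layout where the timestamp takes two whitespace-separated fields and the request line takes three, which puts the status code in field 9.

```sh
#!/bin/sh
# Hypothetical helper: reads an access log on stdin and lists the distinct
# status codes of entries whose first field is something other than "-".
statuses_with_ip() {
    awk '$1 != "-" { print $9 }' | sort -u
}

# Two sample entries: a denied request with an IP, a normal one without.
printf '%s\n' \
    '198.51.100.44 - bob [12/Mar/2018:10:00:00 +0000] "GET /denied-file HTTP/1.0" 403 199 "-" "curl"' \
    '- - - [12/Mar/2018:10:00:01 +0000] "GET /normal-access HTTP/1.0" 200 512 "-" "curl"' \
    | statuses_with_ip    # prints 403
```

If a code such as 200 shows up in the output of a real log, some entries are still being written with an address.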
Anonymize on errors
You can also combine mod_ipv6calc with logging the IP only on errors. That way it won’t show any IP address on normal requests, but will show an anonymized IP address on errors:
LogFormat "%400,401,403,405,406,410{IPV6CALC_CLIENT_IP_ANON}e %400,401,403,405,406,410{IPV6CALC_CLIENT_COUNTRYCODE}e %400,401,403,405,406,410u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" vhost:%v" combined_anon
CustomLog "logs/access_log" combined_anon
Here’s an example of what it would look like. The first is an access denied entry, the second is a normal request.
198.51.100.0 US username [12/Mar/2018...] "GET /denied-file HTTP/1.0" 403 ...
- - - [12/Mar/2018...] "GET /normal-access HTTP/1.0" 200 ...
Anonymize old logs
If you have old logs that contain full IP addresses, you’ll want to anonymize those. One simple program is ipv6loganon, part of the ipv6calc package. It reads in a file and anonymizes the IP addresses.
cat /var/log/httpd/access_log
198.51.100.53 - - [01/Jan/2007...] "GET /Linux+IPv6-HOWTO/x1112.html HTTP/1.0" 200 ...
fxyz::5abz:jazz:1:216:17ff:fe01:2345 - - [10/Jan/2007...] "GET /favicon.ico HTTP/1.1" 200 ...
cat /var/log/httpd/access_log | ipv6loganon
198.51.100.0 - - [01/Jan/2007...] "GET /Linux+IPv6-HOWTO/x1112.html HTTP/1.0" 200 ...
fxyz::5abz:jazz:216:17ff:fe00:0 - - [10/Jan/2007...] "GET /favicon.ico HTTP/1.1" 200 ...
It has multiple levels of anonymizing, such as --anonymize-careful. See man ipv6loganon for more info.
Script to anonymize multiple files
If you have several log files, and don’t want to have to go through them one by one, you can use the following script to do them all. (It also sets the log’s modified date to what it was previously).
anonymize-logs /path/to/logfile /path/to/logfile2 /path/to/folder/*.logs
It’s just a quick script, so make sure you have a backup of your logs before using it. It skips ipv6loganon on files whose lines don’t start with an IP address, such as Apache error logs, but it still runs a regex over every file to find and replace IP addresses.
#!/bin/bash
# (c) Matt Bagley, under the GPL2
# Given log file(s), anonymize the IP addresses and update each file (only if needed).

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin

if [ -z "$1" ] || [ "$1" == "-h" ] ; then
    echo "Usage: $0 logfile.log [logfile2.log] [logfile3.log] ..."
    exit 1
fi

# Work on copies in a private temp directory
td="$(mktemp -d)"
temp_log="$td/file"
temp_log2="${temp_log}-pass2"
temp_log3="${temp_log}-pass3"

clean_up() {
    rm -f "$temp_log" "$temp_log2" "$temp_log3"
    rmdir "$td"
    exit
}
trap "clean_up" 1 2 3 4 5 15

# Quote "$@" and "$each" so paths containing spaces are handled correctly
for each in "$@" ; do
    #echo "Looking at $each"
    # does it exist?
    if ! [ -f "$each" ] ; then
        echo "File not found: $each"
        continue
    fi
    # non-zero?
    if ! [ -s "$each" ] ; then
        continue
    fi
    # compressed or not?
    compressed=0
    if [[ "$each" == *.gz ]] ; then
        compressed=1
    fi
    # expand log
    if [ $compressed -eq 1 ] ; then
        zcat "$each" > "$temp_log"
    else
        cat "$each" > "$temp_log"
    fi
    # make sure that none of the lines start with '-' (ipv6loganon does not like this)
    # and that no lines have "::: " in them
    sed 's/^- /0.0.0.0 /' "$temp_log" | sed 's/:::* /:: /g' > "$temp_log2"
    # anonymize it (ipv6loganon only does files that have the IP address first)
    if [ -n "$(head -n 10 "$temp_log2" | awk '{print $1}' | grep -E '(\.|:)')" ] \
        && [ -z "$(head -n 10 "$temp_log2" | awk '{print $1}' | sed 's/[a-fA-F0-9\.:]*//g')" ] ; then
        # echo "Running ipv6loganon on $each"
        ipv6loganon --anonymize-careful < "$temp_log2" > "$temp_log3"
        cat "$temp_log3" > "$temp_log2"
        rm -f "$temp_log3"
    fi
    # regex pass over every file: zero the last IPv4 octet and truncate IPv6 addresses.
    # Note: the IPv6 pattern also matches other colon-separated runs of hex-like
    # characters, such as the hh:mm:ss part of a timestamp.
    sed 's/\([0-9]*\.[0-9]*\.[0-9]*\)\.[0-9]*/\1.0/g' "$temp_log2" \
        | sed 's/\([0-9a-fA-F]*:[0-9a-fA-F]*:[0-9a-fA-F]*:\)[0-9a-fA-F:]*/\1:/g' \
        | sed 's/:::*/::/g' > "$temp_log3"
    cat "$temp_log3" > "$temp_log2"
    rm -f "$temp_log3"
    # verify that it's not empty
    if ! [ -s "$temp_log2" ] ; then
        echo "$each was processed as empty"
        continue
    fi
    # diff to see if we changed anything
    if [ -n "$(diff -q "$temp_log" "$temp_log2")" ] ; then
        # if we did, zip and copy file back
        temp_log_ext=""
        if [ $compressed -eq 1 ] ; then
            gzip "$temp_log2"
            temp_log_ext=.gz
        fi
        mv "$each" "${each}-old"
        echo "Replacing $each"
        cat "${temp_log2}${temp_log_ext}" > "$each"
        # set the time to the same as the previous file
        touch --reference="${each}-old" "$each"
        # clean up
        rm -f "${each}-old" "${temp_log2}${temp_log_ext}"
    fi
    rm -f "$temp_log" "$temp_log2" "$temp_log3"
done

clean_up
Then add a weekly cron job for it: /etc/cron.weekly/anonymize-logs
#!/bin/bash
/usr/local/bin/anonymize-logs /var/log/httpd/*.gz /var/log/nginx/*.gz /var/log/maillog*.gz
The nice thing about this is that it only anonymizes the log once it’s been archived to a .gz file, so it shouldn’t cause any trouble to security systems that need to know the real IP address, as they only really work with the current log file.
System Logs
Some log files, such as error logs for Apache/NGINX, store IP addresses but have no built-in way to anonymize them. You can use the above script for these files too.
anonymize-logs /var/log/secure*.gz /var/log/maillog*.gz
It’s probably best to not anonymize /var/log/secure and other current ones if you have a firewall or other program that needs the actual IP addresses in order to function.
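For logs like these, where the address sits in the middle of a line, the regex approach is what does the work. Here is a minimal sketch of that idea on a syslog-style line. The function name anon_ipv4_stream is made up for this example, and the pattern is a slightly stricter variant (one to three digits per octet) than the one in the script above, not a copy of it.

```sh
#!/bin/sh
# Hypothetical helper: reads lines on stdin and sets the last octet of
# anything that looks like an IPv4 address to 0.
anon_ipv4_stream() {
    sed -E 's/([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})\.[0-9]{1,3}/\1.0/g'
}

echo 'Failed password for root from 198.51.100.44 port 22 ssh2' | anon_ipv4_stream
# prints: Failed password for root from 198.51.100.0 port 22 ssh2
```

Keep in mind that a pattern like this also rewrites anything else shaped like four dotted numbers, such as a four-part version string.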
Bonus: Limit tracking on analytic software
You can also help protect the privacy of your users by using these methods. Note that this may also limit the accuracy of the location of your users.
Matomo
Matomo has an easy built-in way to anonymize IP addresses. This can be enabled either when you first set it up or later on. I have this enabled on my setup and it works like a charm. Go to Administration > Privacy > Anonymize in your dashboard to enable it. See here for more info.
Google Analytics
This can be added to your Analytics script as a variable, shown below. See this page for more info.
ga('set', 'anonymizeIp', true);
Conclusion
As you can see, it’s simple and easy not to store the full IP addresses. Go ahead and set this up if you haven’t already.
Updated Jan 2021: Fixed where 4xx error entries would be recorded twice. Also made it easier to read, and the examples more straightforward.