Syslog Description, Handling and Scripts

Logs Analysis

The system and application logs used as examples below come from different systems. Most are accompanied by a script that analyzes them and extracts meaningful information.

/var/log/syslog

Description

The OS I chose to analyze this log from is available on Amazon AWS: Ubuntu 16.04.3 LTS (Xenial Xerus).

This log collects the messages of every facility at every level, except those of the auth and authpriv facilities, because of this configuration directive:

*.*;auth,authpriv.none          -/var/log/syslog

The *.* selector matches all facilities and all levels; auth,authpriv.none then excludes the authentication messages, which are written to a separate log. The dash before the file name tells the syslog daemon not to sync the file after every write. Some messages logged by the kernel during system startup may be missing; they can be read with the dmesg utility.

 

Excerpt

Mar 27 08:17:01 localhost CRON[20321]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 27 08:25:01 localhost CRON[20815]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 27 08:35:01 localhost CRON[21431]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 27 08:36:32 localhost dhclient[804]: DHCPREQUEST of 172.31.81.250 on eth0 to 172.31.80.1 port 67 (xid=0x70bcd9fd)
Mar 27 08:36:32 localhost dhclient[804]: DHCPACK of 172.31.81.250 from 172.31.80.1
Mar 27 08:36:32 localhost dhclient[804]: bound to 172.31.81.250 -- renewal in 1629 seconds.
Mar 27 08:39:01 localhost CRON[21690]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean)
Mar 27 08:39:07 localhost systemd[1]: Started Session 10084 of user ubuntu.
Mar 27 08:45:01 localhost CRON[22576]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 27 08:50:51 localhost ubuntu: Hello World
Mar 27 08:51:02 localhost scriptname: Hello World
Mar 27 08:55:01 localhost CRON[23259]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 27 09:00:00 localhost prometheus[928]: level=info ts=2018-03-27T13:00:00.360708025Z caller=compact.go:387 component=tsdb msg="compact blocks" count=1 mint=1522144800000 maxt=1522152000000
Mar 27 09:00:00 localhost prometheus[928]: level=info ts=2018-03-27T13:00:00.458691022Z caller=head.go:348 component=tsdb msg="head GC completed" duration=3.232507ms
Mar 27 09:00:00 localhost prometheus[928]: level=info ts=2018-03-27T13:00:00.47021462Z caller=head.go:357 component=tsdb msg="WAL truncation completed" duration=11.184782ms
Mar 27 09:00:00 localhost prometheus[928]: level=info ts=2018-03-27T13:00:00.492133316Z caller=compact.go:387 component=tsdb msg="compact blocks" count=3 mint=1522130400000 maxt=1522152000000
Mar 27 09:00:00 localhost prometheus[928]: level=info ts=2018-03-27T13:00:00.641760617Z caller=compact.go:387 component=tsdb msg="compact blocks" count=3 mint=1522087200000 maxt=1522152000000
Mar 27 09:03:41 localhost dhclient[804]: DHCPREQUEST of 172.31.81.250 on eth0 to 172.31.80.1 port 67 (xid=0x70bcd9fd)
Mar 27 09:03:41 localhost dhclient[804]: DHCPACK of 172.31.81.250 from 172.31.80.1
Mar 27 09:03:41 localhost dhclient[804]: bound to 172.31.81.250 -- renewal in 1577 seconds.
Mar 27 09:05:01 localhost CRON[23918]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 27 09:09:01 localhost CRON[24166]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean)
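
No script accompanies this log above. As a minimal sketch (not part of the original analysis), an excerpt like this one can be summarized by counting messages per program. The function name count_by_program is hypothetical, and the sketch assumes the traditional timestamp format shown in the excerpt, where the program tag is the fifth whitespace-separated field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads syslog lines on stdin, prints "COUNT PROGRAM".
count_by_program() {
  awk '{
         tag = $5
         sub(/\[[0-9]+\]:$/, "", tag)   # strip "[pid]:" as in "CRON[20321]:"
         sub(/:$/, "", tag)             # strip a bare trailing ":" as in "ubuntu:"
         count[tag]++
       }
       END { for (t in count) printf "%d %s\n", count[t], t }' |
    LC_ALL=C sort -k1,1nr -k2,2
}

# Sample lines copied from the excerpt above.
sample='Mar 27 08:17:01 localhost CRON[20321]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 27 08:25:01 localhost CRON[20815]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 27 08:36:32 localhost dhclient[804]: DHCPACK of 172.31.81.250 from 172.31.80.1
Mar 27 08:50:51 localhost ubuntu: Hello World'

summary=$(printf '%s\n' "$sample" | count_by_program)
printf '%s\n' "$summary"
```

On a real system the function would read the file directly, for example count_by_program </var/log/syslog.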

 

 

/var/log/auth.log

Description

This log should contain all the messages from authentication facilities (ftp, telnet, ssh, etc.) at the level defined in the syslog daemon configuration.

This particular log excerpt comes from a router flashed with DD-WRT firmware, which is open to the world, without a firewall.

 

Excerpt

messages.10:Mar 8 08:21:46 ddwrt authpriv.info dropbear[24452]: Child connection from 123.249.79.232:2792
messages.10:Mar 8 08:21:48 ddwrt authpriv.warn dropbear[24452]: Bad password attempt for 'root' from 123.249.79.232:2792
messages.10:Mar 8 08:21:49 ddwrt authpriv.warn dropbear[24452]: Bad password attempt for 'root' from 123.249.79.232:2792
messages.10:Mar 8 08:21:50 ddwrt authpriv.warn dropbear[24452]: Bad password attempt for 'root' from 123.249.79.232:2792
messages.10:Mar 8 08:21:50 ddwrt authpriv.info dropbear[24452]: Exit before auth (user 'root', 3 fails): Max auth tries reached - user 'root' from 123.249.79.232:2792
messages.10:Mar 8 08:22:48 ddwrt authpriv.info dropbear[24478]: Child connection from 123.249.79.232:2675
messages.10:Mar 8 08:22:50 ddwrt authpriv.warn dropbear[24478]: Bad password attempt for 'root' from 123.249.79.232:2675
messages.10:Mar 8 08:22:51 ddwrt authpriv.warn dropbear[24478]: Bad password attempt for 'root' from 123.249.79.232:2675
messages.10:Mar 8 08:22:51 ddwrt authpriv.info dropbear[24478]: Exit before auth (user 'root', 2 fails): Error reading: Connection reset by peer
messages.10:Mar 8 08:23:48 ddwrt authpriv.info dropbear[24479]: Child connection from 123.249.79.232:2818
messages.10:Mar 8 08:23:50 ddwrt authpriv.warn dropbear[24479]: Login attempt for nonexistent user from 123.249.79.232:2818
messages.10:Mar 8 08:23:51 ddwrt authpriv.warn dropbear[24479]: Login attempt for nonexistent user from 123.249.79.232:2818
messages.10:Mar 8 08:23:51 ddwrt authpriv.warn dropbear[24479]: Login attempt for nonexistent user from 123.249.79.232:2818
messages.10:Mar 8 08:23:52 ddwrt authpriv.info dropbear[24479]: Exit before auth: Max auth tries reached - user 'is invalid' from 123.249.79.232:2818
messages.10:Mar 8 08:24:49 ddwrt authpriv.info dropbear[24505]: Child connection from 123.249.79.232:2631

 

Script for auth.log

# zgrep reads plain and gzip-compressed files alike, so rotated logs are included.
zgrep -hE "Bad password|nonexistent" /var/log/auth.log* |\
  awk '{print $NF}' |\
  cut -f1 -d: |\
  sort |\
  grep -Ev '^10\.|^172\.(1[6-9]|2[0-9]|3[01])\.|^192\.168\.' |\
  uniq >/tmp/listhackers.log

while read -r ip; do
  # tr -c takes the complement of the set; tr -d deletes characters.
  # Together they delete every character except the ASCII octal values listed
  # between the single quotes. These are the characters we want to keep:
  #   octal 11: tab
  #   octal 12: linefeed
  #   octal 15: carriage return
  #   octal 40 through octal 176: printable characters
  cleanIp=$(echo "$ip" | tr -cd '\11\12\15\40-\176')
  if [[ $cleanIp =~ ([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3} ]]; then
    whois -H "$cleanIp" |\
    grep -Eim 1 "descr:|country|g. \[Organization\]" |\
    sed -e "s/descr:/$cleanIp\t/gi" -e "s/Country:/$cleanIp\t/gi" -e "s/g. \[Organization\]/$cleanIp\t/gi"
  else
    # Not a usable IP: print the first raw log line that contains it instead.
    printf '%s\t\t' "$cleanIp"
    zgrep -hF -m1 -- "$ip" /var/log/auth.log* | head -n 1
  fi
done </tmp/listhackers.log

 

This script extracts, from auth.log and its rotated archives (including the gzipped ones), the IP addresses of all failed login attempts (bad password or nonexistent user), and lists each one with an indication of its origin (a network description, an organization, or a country). When the whois lookup cannot be attempted because the recorded IP is malformed (mostly because of non-printable or Unicode characters), it prints the whole log line instead.

Applies on:

/var/log/auth.log
/var/log/auth.log.1
/var/log/auth.log.2.gz
/var/log/auth.log.*.gz

Example output:

023.249.79.222 /var/log/messages.10:Mar 8 10:25:45 ddwrtauthpriv.warndropbear[26316]: Login attempt for nonexistent user frol 12▒.249.79.232:3023Mar 8 10:25:4!ddwrt authprivinfodro`bear[26▒16_: Exit before auth: Max auth tries reached - user 'is invalid' from 023.249.79.222:3021
107.170.192.104 US
123.24.79.232 Vietnam Posts and Telecommunications Group
123.249.78232 123.249.79.232 CN
133.249.59.232 SAKURA KCS Corporation
136.160.141.4 US
136.160.156.59 US
151.80.47.122 OVH SAS
178.218.96.4 RU
180.101.193.29 Chinanet Jiangsu Province Network
185.143.223.121 NL
185.143.223.136 NL
92.63.197.29 UA

 

Explanations on the script

  • The inverted match (-v) excludes private network addresses (the RFC 1918 ranges starting with 10., 172.16. and 192.168.).
  • Experience shows that script kiddies sometimes send non-printable or Unicode characters when attempting to log in, and these characters end up in the log. This causes problems when processing the collected IPs, so extra care is taken by cleaning them with the tr command.

Problematic log line:

Mar 8 10:25:45 (...) Login attempt for nonexistent user frol 12▒.249.79.232:3023

Resulting problematic IP in /tmp/listhackers.log:

123.24^Y.79.232

  • The whois command returns different formats depending on the regional Internet registry (RIR) responsible for the IP address. The fields are also optional, which is why the script tries to grep several patterns: “descr:|country|g. \[Organization\]“.

 

Example whois output from Japan (APNIC):

Network Information:
a. [Network Number] 133.249.0.0/16
b. [Network Name] JP-KCS-NET
g. [Organization] SAKURA KCS Corporation
m. [Administrative Contact] KI13202JP
n. [Technical Contact] KI13202JP
p. [Nameserver] ns01.sakura-utopia.jp
p. [Nameserver] ns02.sakura-utopia.jp
p. [Nameserver] ns03.sakura-utopia.jp
[Assigned Date] 1990/10/30

 

Example whois output from Ukraine (RIPE):

inetnum: 92.63.197.0 - 92.63.197.255
netname: NVFOPServer
country: UA
admin-c: ACRO8769-RIPE
org: ORG-FHVA2-RIPE
tech-c: ACRO8769-RIPE
status: ASSIGNED PA
mnt-by: ITDELUXE-MNT
created: 2016-06-22T07:08:29Z
last-modified: 2017-09-25T14:49:34Z
source: RIPE
mnt-routes: HVFOPServer-MNT
mnt-domains: HVFOPServer-MNT
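
The two formats above can be handled by one small filter. The following is a sketch only, with a hypothetical function name (whois_origin): it reads whois output on stdin, so it can be tried on saved samples without network access, and it keeps the first field that matches the same three patterns as the script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: whois_origin IP  (whois text is read from stdin).
# Prints "IP<TAB>value" for the first descr:, country: or [Organization] field.
whois_origin() {
  local ip=$1 line
  # -m1 stops at the first matching line; -i ignores case (country: vs Country:).
  line=$(grep -Eim1 'descr:|country:|g\. \[Organization\]') || return 1
  # Drop the field label, whichever registry format it came from.
  line=$(printf '%s\n' "$line" |
    sed -E 's/^[[:space:]]*(g\. \[Organization\]|[[:alpha:]]+:)[[:space:]]*//')
  printf '%s\t%s\n' "$ip" "$line"
}

# Samples taken from the two whois outputs shown above.
ripe='inetnum: 92.63.197.0 - 92.63.197.255
netname: NVFOPServer
country: UA'
jpnic='a. [Network Number] 133.249.0.0/16
b. [Network Name] JP-KCS-NET
g. [Organization] SAKURA KCS Corporation'

ripe_out=$(printf '%s\n' "$ripe" | whois_origin 92.63.197.29)
jpnic_out=$(printf '%s\n' "$jpnic" | whois_origin 133.249.59.232)
printf '%s\n%s\n' "$ripe_out" "$jpnic_out"
```

In a live run the input would come from the whois command itself, for example whois -H "$ip" | whois_origin "$ip".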

 

  • Interesting pattern used:

[[ $cleanIp =~ ([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3} ]]

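This returns true if the variable $cleanIp contains a sequence of the form 000.000.000.000, where each 000 is a number of 1 to 3 digits. The pattern is not anchored, so it checks neither that the whole string is an IP address nor that each number is at most 255.

This is a case where the double bracket test has to be used: the =~ regex operator only exists in [[ ]], which also expands the variable without word splitting. The pattern must not be quoted, otherwise bash matches it as a literal string, and a single bracket test would fail with an error.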
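
As a sketch of the cleaning and validation steps taken in isolation (bash only, and the function names clean_ip and is_ipv4 are hypothetical), the version below anchors the pattern with ^ and $ so that the whole string must look like an IP address. Keeping the regex in a variable avoids quoting problems on the right-hand side of =~.

```shell
#!/usr/bin/env bash
# Keep only tab, linefeed, carriage return and printable ASCII (octal 40-176).
clean_ip() {
  tr -cd '\11\12\15\40-\176'
}

# Succeeds when the whole argument is four dot-separated groups of 1-3 digits.
# It does not check that each group is at most 255.
is_ipv4() {
  local re='^([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}$'
  [[ $1 =~ $re ]]
}

# The problematic value shown above: a control character (^Y, octal 31) inside the IP.
cleaned=$(printf '123.24\031.79.232' | clean_ip)
printf '%s\n' "$cleaned"

for candidate in "$cleaned" "123.249.78232"; do
  if is_ipv4 "$candidate"; then
    printf '%s looks like an IP\n' "$candidate"
  else
    printf '%s is rejected\n' "$candidate"
  fi
done
```

Note that cleaning makes the damaged value look valid again, but it is no longer the attacker's real address: the character the log corrupted is simply gone.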

 

/var/log/cron.log

Description

The OS I chose to analyze this log from is available on Amazon AWS: Ubuntu 16.04.3 LTS (Xenial Xerus).

This system log records the scripts and commands executed by the cron daemon, at any level:

cron.* /var/log/cron.log

 

Excerpt

Mar 28 15:35:01 localhost CRON[17736]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 15:39:01 localhost CRON[17984]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean)
Mar 28 15:45:01 localhost CRON[18431]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 15:55:01 localhost CRON[19067]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 16:05:01 localhost CRON[19686]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 16:09:01 localhost CRON[19945]: (root) CMD ( [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean)
Mar 28 16:15:01 localhost CRON[20381]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 16:17:01 localhost CRON[20510]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Mar 28 16:25:01 localhost CRON[21011]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 16:35:01 localhost CRON[21630]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)

 

Script for cron.log

for cronlog in /var/log/cron.log*; do
  # Decompress rotated .gz archives on the fly; read plain files directly.
  if [ "${cronlog##*.}" = "gz" ]; then gunzip -c "$cronlog"; else cat "$cronlog"; fi
done |\
  sed -r -n 's/^(\w+)\s+([0-9]{1,2})\s.*CMD\s\([[:blank:]]*(.*?)\)$/\1 \2 \3/p' |\
  sort |\
  uniq

 

 

This script extracts, from cron.log and its rotated archives (including the gzipped ones), the unique commands executed on each day. This is useful to see which commands ran on which day.

Applies on:

/var/log/cron.log
/var/log/cron.log.1
/var/log/cron.log.2.gz
/var/log/cron.log.*.gz

 

Example output:

Mar 27 cd / && run-parts --report /etc/cron.hourly
Mar 27 command -v debian-sa1 > /dev/null && debian-sa1 1 1
Mar 27 command -v debian-sa1 > /dev/null && debian-sa1 60 2
Mar 27 sudo /usr/local/sbin/update-ngxblocker -n
Mar 27 [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean
Mar 28 cd / && run-parts --report /etc/cron.hourly
Mar 28 command -v debian-sa1 > /dev/null && debian-sa1 1 1
Mar 28 test -x /usr/bin/certbot -a \! -d /run/systemd/system &&perl -e 'sleep int(rand(3600))' &&certbot -q renew
Mar 28 test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
Mar 28 [ -x /usr/lib/php/sessionclean ] && /usr/lib/php/sessionclean

 

Explanations on the script

This script uses regular expression groups with sed to extract specific fields: the first and second (month and day), and the command executed (between parentheses).

  • Interesting pattern used:

sed -r -n 's/^(\w+)\s+([0-9]{1,2})\s.*CMD\s\([[:blank:]]*(.*?)\)$/\1 \2 \3/p'

The capture groups pull specific fields out of each line of the stream:

    • ^(\w+)\s+([0-9]{1,2})\s matches the first two words, the second being a 1-2 digit number; these fields are output with \1 and \2
    • .*CMD\s\([[:blank:]]*(.*?)\)$ matches whatever text sits between “CMD (” and the closing parenthesis that ends the line. This is the 3rd group, output with \3. Note that sed has no lazy quantifiers, so (.*?) behaves like (.*); the \)$ anchor is what stops the match at the final parenthesis.
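
The grouping can be tried on a few lines without touching /var/log. This is a sketch with a hypothetical function name (extract_cron_commands); it uses POSIX character classes and a plain (.*) instead of \w, \s and (.*?), which is an assumption made for portability rather than the pattern used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads cron log lines on stdin, prints "MONTH DAY COMMAND".
extract_cron_commands() {
  # \1 = month, \2 = day, \3 = command between "CMD (" and the final ")".
  sed -En 's/^([[:alpha:]]+)[[:space:]]+([0-9]{1,2})[[:space:]].*CMD[[:space:]]\([[:blank:]]*(.*)\)$/\1 \2 \3/p' |
    LC_ALL=C sort -u
}

# Sample lines copied from the excerpt above.
sample='Mar 28 15:35:01 localhost CRON[17736]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 15:45:01 localhost CRON[18431]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Mar 28 16:17:01 localhost CRON[20510]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)'

commands=$(printf '%s\n' "$sample" | extract_cron_commands)
printf '%s\n' "$commands"
```

The two debian-sa1 lines collapse into one because the time of day is dropped before sorting, which is what makes the output a per-day list.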

 

/var/log/apache2/access.log

Description

This is the access log of an Apache/2.4.29 (Ubuntu) httpd server. It records every request the server processed, whatever the HTTP status code returned (2xx, 3xx, 4xx or 5xx); the details of server-side errors go to the separate error.log.

Note: the apache2 directory name is a packaging choice rather than an official convention. Debian and Ubuntu name the package and its directories apache2, while Red Hat-based distributions use httpd.

The log format is customizable; here is the configuration used:

LogFormat "%a %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

In this format, %a is the client IP address, %l the remote logname, %u the authenticated user, %t the time of the request, %r the first line of the request, %>s the final status code and %O the number of bytes sent, headers included. The full list of format strings is in the Apache mod_log_config documentation.

 

Excerpt

127.0.0.1 - - [09/Feb/2018:08:59:46 -0500] "GET /status?full&json HTTP/1.1" 200 1709 "-" "-"
192.0.101.226 - - [09/Feb/2018:08:59:47 -0500] "HEAD / HTTP/1.1" 200 390 "-" "jetmon/1.0 (Jetpack Site Uptime Monitor by WordPress.com)"
127.0.0.1 - - [09/Feb/2018:08:59:48 -0500] "GET /server-status?auto HTTP/1.1" 200 1454 "-" "-"
127.0.0.1 - - [09/Feb/2018:08:59:48 -0500] "GET /status?full&json HTTP/1.1" 200 1720 "-" "-"
54.173.26.123 - - [09/Feb/2018:08:54:34 -0500] "GET /wp/wp-includes/wlwmanifest.xml HTTP/1.1" 200 6665
54.173.26.123 - - [09/Feb/2018:08:54:34 -0500] "GET /site/wp-includes/wlwmanifest.xml HTTP/1.1" 200 6665
54.173.26.123 - - [09/Feb/2018:08:54:34 -0500] "GET /cms/wp-includes/wlwmanifest.xml HTTP/1.1" 200 6665
127.0.0.1 - - [09/Feb/2018:08:54:36 -0500] "GET /server-status?auto HTTP/1.1" 200 1457 "-" "-"
127.0.0.1 - - [09/Feb/2018:08:54:36 -0500] "GET /status?full&json HTTP/1.1" 200 1788 "-" "-"
127.0.0.1 - - [09/Feb/2018:08:54:38 -0500] "GET /status?full&json HTTP/1.1" 200 1765 "-" "-"

 

Script for monitoring bad clients with apache2/access.log

grep -Ehwv "\s2[0-9]{2}\s[0-9]+\s|127.0.0.1|localhost" /var/log/apache2/access.log /var/log/apache2/access.log.? |\
  sed -r -n 's/^(([0-9]{1,3}\.){3}[0-9]{1,3})\s.*\s([0-9]{3})\s[0-9]+\s.*/\1 \3/p' |\
  sort |\
  uniq |\
while read -r IP STATUS; do
  host -W 1 "$IP" | awk -v IP="$IP" -v STATUS="$STATUS" '{if(/NXDOMAIN/) {$NF="not found"}; printf "%d %15s %s\n",STATUS,IP,$NF}'
done

 

This script extracts, from the current access.log and its uncompressed rotations (access.log.?, typically the previous day's access.log.1), all the unique IP addresses that generated something other than a 2xx HTTP status code. It passes the results (IP + STATUS) to a loop that resolves the IP addresses into host names. This is useful to monitor, at one point in time, how clients behave with your web server, and to discover the addresses of robots and scanners.

Applies on:

/var/log/apache2/access.log
/var/log/apache2/access.log.?

 

Example output:

404 192.0.100.11 not found
404 192.0.100.154 not found
404 192.0.100.155 not found
404 192.0.100.35 not found
404 192.0.101.168 wordpress.com.
404 192.0.101.190 wordpress.com.
404 192.0.101.216 wordpress.com.
404 192.0.101.238 wordpress.com.
404 192.0.102.120 not found
404 192.0.102.144 not found
404 192.0.99.155 wordpress.com.
404 192.0.99.179 wordpress.com.
404 192.0.99.227 wordpress.com.
404 192.0.99.250 wordpress.com.
404 192.0.99.59 wordpress.com.
404 192.0.99.83 wordpress.com.
404 35.200.40.114 114.40.200.35.bc.googleusercontent.com.
403 52.174.52.33 census01.project-magellan.com.
404 66.118.142.165 reached
404 69.138.251.87 c-69-138-251-87.hsd1.md.comcast.net.
400 77.72.82.169 hostby.ups-gb.co.uk.
400 93.115.95.205 lh28409.voxility.net.

 

Explanations on the script

  • We start by excluding the localhost and 127.0.0.1 addresses from the log (local access)
  • The host resolution is done in a while read loop
  • Interesting pattern used:

\s2[0-9]{2}\s[0-9]+\s

  • We exclude HTTP 2xx responses by matching a 3-digit number starting with 2, surrounded by whitespace and followed by another whitespace-delimited number (the response size). In the line below, the pattern matches the two words “200 1709”:

192.0.99.59 - - [09/Feb/2018:08:59:46 -0500] "GET /status?full&json HTTP/1.1" 200 1709 "-" "-"

This stricter match avoids excluding lines where a 2xx number merely appears elsewhere, such as inside an IP address.

  • Interesting pattern used:

sed -r -n 's/^(([0-9]{1,3}\.){3}[0-9]{1,3})\s.*\s([0-9]{3})\s[0-9]+\s.*/\1 \3/p'

    • We want to extract the IP address (first field) and the HTTP status code (usually the 9th whitespace-separated field, although an unusual request line can shift it).
    • (([0-9]{1,3}\.){3}[0-9]{1,3}) matches an IP address: one group containing 3 repetitions of a 1-3 digit number followed by a period, plus one last 1-3 digit number. The subgroup inside the first pair of parentheses counts as the second group (\2, which we don’t use), so we have to keep that in mind when numbering the subsequent groups.
    • ([0-9]{3}) matches a 3-digit number, which is the 3rd group, output with \3
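
Both patterns can be exercised on a handful of lines, without DNS lookups. This is a sketch under a few assumptions: the function name non_2xx_clients is hypothetical, POSIX classes replace \s, the -w option is dropped, and the sample requests are illustrative lines written in the combined format rather than taken from the real log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads access-log lines on stdin,
# prints "IP STATUS" once for every client/status pair that is not 2xx.
non_2xx_clients() {
  # Step 1: drop 2xx responses ("status size" pair) and local requests.
  grep -Ev '[[:space:]]2[0-9]{2}[[:space:]][0-9]+[[:space:]]|^127\.0\.0\.1[[:space:]]' |
    # Step 2: \1 = IP address, \2 = inner subgroup (unused), \3 = status code.
    sed -En 's/^(([0-9]{1,3}\.){3}[0-9]{1,3})[[:space:]].*[[:space:]]([0-9]{3})[[:space:]][0-9]+[[:space:]].*/\1 \3/p' |
    LC_ALL=C sort -u
}

# Illustrative lines (assumed, not from the real log). The third client has
# 2xx-looking numbers in its IP address and must not be filtered out.
sample='192.0.99.59 - - [09/Feb/2018:08:59:46 -0500] "GET /status HTTP/1.1" 200 1709 "-" "-"
54.173.26.123 - - [09/Feb/2018:08:54:34 -0500] "GET /missing HTTP/1.1" 404 209 "-" "-"
200.201.202.203 - - [09/Feb/2018:08:54:35 -0500] "GET /admin HTTP/1.1" 403 199 "-" "-"'

result=$(printf '%s\n' "$sample" | non_2xx_clients)
printf '%s\n' "$result"
```

The output of this function has the same shape as the input of the while read loop in the script, so the host lookup could be added afterwards.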

 

/var/log/nginx/error.log

Description

This is the error log from nginx version nginx/1.10.3 (Ubuntu). This log is also customizable; here is the configuration:

error_log /var/log/nginx/error.log info;

This logs any message of the info level and above (notice, warn, error, crit, alert, emerg).

 

Excerpt

2018/03/29 06:25:27 [info] 11226#11226: *19717 recv() failed (104: Connection reset by peer) while reading client request line, client: 172.104.108.109, server: _, request: ""
2018/03/29 09:07:34 [info] 11226#11226: *19899 writev() failed (104: Connection reset by peer), client: 58.19.56.68, server: yourdomain.com, request: "GET /op69okl?name=http://www.epochtimes.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/29 09:07:35 [info] 11226#11226: *19900 writev() failed (104: Connection reset by peer), client: 124.90.51.238, server: yourdomain.com, request: "GET /op69okl?name=http://www.wujieliulan.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/29 09:07:35 [info] 11226#11226: *19901 writev() failed (104: Connection reset by peer), client: 60.1.128.187, server: yourdomain.com, request: "GET /op69okl?name=http://www.ntdtv.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/29 11:32:00 [info] 11226#11226: *20024 writev() failed (104: Connection reset by peer), client: 150.255.87.102, server: yourdomain.com, request: "GET /op69okl?name=http://www.wujieliulan.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/29 11:32:08 [info] 11226#11226: *20028 writev() failed (104: Connection reset by peer), client: 112.117.113.184, server: yourdomain.com, request: "GET /op69okl?name=http://www.ntdtv.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/28 18:50:19 [info] 11226#11226: *19051 recv() failed (104: Connection reset by peer), client:

These specific errors are linked to different causes: the amfext module (a PHP module for Flash) or a FastCGI read timeout. In the latter case, these are clear attempts to access the nginx server's environment with crafted requests.

 

Script for monitoring bad clients with nginx/error.log

grep -hai failed /var/log/nginx/error.log /var/log/nginx/error.log.? |\
 sed -r -n 's#.*\sclient:\s([0-9\.]+),\s.*?\srequest: "(GET /)?(.*?)"?.*#\1 /\3#p' |\
 sort |\
 uniq

 

Applies on:

/var/log/nginx/error.log
/var/log/nginx/error.log.?

 

Example output:

112.117.113.184 /op69okl?name=http://www.ntdtv.com/ HTTP/1.1", host: "yourdomain.com"
112.66.105.173 /op69okl?name=http://www.dongtaiwang.com/ HTTP/1.1", host: "yourdomain.com"
124.90.51.238 /op69okl?name=http://www.wujieliulan.com/ HTTP/1.1", host: "yourdomain.com"
1.30.162.143 /op69okl?name=http://www.ntdtv.com/ HTTP/1.1", host: "yourdomain.com"
150.255.87.102 /op69okl?name=http://www.wujieliulan.com/ HTTP/1.1", host: "yourdomain.com"
164.132.91.1 /eW88u[%2v39:5"
164.132.91.1 /ZD7FykEy39:5"
172.104.108.109 /"
180.76.15.149 / HTTP/1.1", host: "www.yourdomain.com"
180.76.15.30 / HTTP/1.1", host: "yourdomain.com"
58.19.56.68 /op69okl?name=http://www.epochtimes.com/ HTTP/1.1", host: "yourdomain.com"
60.1.128.187 /op69okl?name=http://www.ntdtv.com/ HTTP/1.1", host: "yourdomain.com"

 

These lines show failing client requests, mostly hacking attempts. Aside from the robots requesting “/”, none of these URLs exist. Resolving the IPs with whois, as done in the auth.log script above, shows that they originate from China.

 

Explanations on the script

This script grabs all the failed access messages from the logs and uses 3 groups:

  • .*\sclient:\s([0-9\.]+) the first group is the IP address; it looks for the keyword “client:” between whitespace, just before it. It is output with \1
  • ,\s.*?\srequest: "(GET /)? the second group is the request method; it looks for the specific sequence “, <anything> request: "GET /“ and outputs nothing (\2 is not used)
  • (.*?)"?.* the 3rd group represents what is between the double quotes (the request itself). Because sed's extended regular expressions have no lazy quantifiers, (.*?) is effectively greedy and runs to the end of the line, so the last field of the log (host accessed) is also captured. It’s an error we can live with.
  • A “/” is prepended to the 3rd group in the output (“/\3“) because the second group consumes it.
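
If the trailing host field is unwanted, a negated character class can stand in for the lazy quantifier. The sketch below is a variation and not the script used above: the function name failed_requests is hypothetical, it keeps the request method in the output, and it stops the capture at the first closing double quote with [^"]*.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads nginx error-log lines on stdin,
# prints "CLIENT_IP REQUEST" once per distinct pair.
failed_requests() {
  grep -ai 'failed' |
    # \1 = client IP, \2 = everything between the quotes after "request:".
    sed -En 's#.*[[:space:]]client:[[:space:]]([0-9.]+),.*[[:space:]]request: "([^"]*)".*#\1 \2#p' |
    LC_ALL=C sort -u
}

# Sample lines copied from the excerpt above (the first one appears twice).
sample='2018/03/29 09:07:34 [info] 11226#11226: *19899 writev() failed (104: Connection reset by peer), client: 58.19.56.68, server: yourdomain.com, request: "GET /op69okl?name=http://www.epochtimes.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/29 09:07:34 [info] 11226#11226: *19899 writev() failed (104: Connection reset by peer), client: 58.19.56.68, server: yourdomain.com, request: "GET /op69okl?name=http://www.epochtimes.com/ HTTP/1.1", host: "yourdomain.com"
2018/03/29 09:07:35 [info] 11226#11226: *19901 writev() failed (104: Connection reset by peer), client: 60.1.128.187, server: yourdomain.com, request: "GET /op69okl?name=http://www.ntdtv.com/ HTTP/1.1", host: "yourdomain.com"'

requests=$(printf '%s\n' "$sample" | failed_requests)
printf '%s\n' "$requests"
```

The trade-off is that a request containing an escaped double quote would be cut short, which the greedy version above does not do.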