Log Analysis: Access Log Analysis Using Command Line
Happy New Year 2017! This is my first entry in January. Hopefully it will assist in web attack investigations.
First, we need to know the log format:
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O"
%h Remote host, the IP address of the request.
%l Remote logname, this will never have a value as IdentityCheck is off, it’s just included for backwards compatibility.
%u Remote user if htauth is being used (may be bogus if return status (%s) is 401)
%t Time the request was received in the format [day/month/year:hour:minute:second zone]
%r First line of the request
%>s The final HTTP status code, see full list of possible status codes in the HTTP 1.1 specification (RFC2616 section 10).
%b Size of response in bytes, excluding HTTP headers. In CLF format, i.e. a ‘-’ rather than a 0 when no bytes are sent.
%{Referer}i The “Referer” (sic) HTTP request header, this is provided by the client request so it may be bogus.
%{User-Agent}i The User-Agent HTTP request header, this is provided by the client request so it may be bogus.
%I Bytes received, including request and headers.
%O Bytes sent, including headers.
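To make the field positions concrete, here is a hypothetical log line in this format (the IP, timestamp, and sizes are made up, not from a real log) and how awk's whitespace- and quote-based splitting maps onto it:

```shell
# A made-up example line in the format above
line='10.0.1.2 - - [05/Jan/2017:10:15:32 +0700] "GET /feed/ HTTP/1.1" 200 5126 "-" "Apple-PubSub/65.12.1" 431 5402'

echo "$line" | awk '{print $1}'        # %h  -> 10.0.1.2
echo "$line" | awk '{print $9}'        # %>s -> 200
echo "$line" | awk -F\" '{print $2}'   # %r  -> GET /feed/ HTTP/1.1
echo "$line" | awk -F\" '{print $6}'   # UA  -> Apple-PubSub/65.12.1
```

Note the two splitting modes: default whitespace splitting for simple fields, and `-F\"` (split on double quotes) for the quoted fields, since the request line and user agent contain spaces of their own.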
Next, a short introduction to the tools used for analysis:
cat – prints the content of a file in the terminal window
grep – searches and filters based on patterns
awk – can sort each row into fields and display only what is needed
sed – performs find and replace functions
sort – arranges output in an order
uniq – compares adjacent lines and can report, filter or provide a count of duplicates
wc – displays the number of lines, words, and bytes contained in each input file (or standard input if no file is specified)
head – displays the first lines of a file
tail – displays the last part of a file
awk '{print $1}' access.log # ip address (%h)
awk '{print $2}' access.log # RFC 1413 identity (%l)
awk '{print $3}' access.log # userid (%u)
awk '{print $4,$5}' access.log # date/time (%t)
awk '{print $9}' access.log # status code (%>s)
awk '{print $10}' access.log # size (%b)
awk -F\" '{print $2}' access.log # request line (%r)
awk -F\" '{print $4}' access.log # referer
awk -F\" '{print $6}' access.log # user agent
wc -l access.log # display number of lines
head -1 access.log # first line of the file
tail -1 access.log # last line of the file
Identify which IP addresses are making the most requests:
analyst#~awk '{print $1}' www-access.log | sort | uniq -c | sort -n | tail -10
3 190.166.87.164
4 114.111.36.26
4 123.4.59.174
4 92.62.43.77
6 208.80.69.69
12 221.192.199.35
14 208.80.69.74
18 10.0.1.14
36 65.88.2.5
241 10.0.1.2
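Once a noisy address stands out, the next step is usually pulling its full log lines. A minimal sketch, with two made-up lines standing in for www-access.log (anchor the pattern at the line start and escape the dots so `.` cannot match an arbitrary character):

```shell
# Two fabricated sample lines in place of www-access.log
printf '%s\n' \
  '92.62.43.77 - - [05/Jan/2017:10:15:32 +0700] "GET /feed/ HTTP/1.1" 200 512 "-" "-"' \
  '10.0.1.2 - - [05/Jan/2017:10:15:40 +0700] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"' |
grep '^92\.62\.43\.77 '
```

Against the real file this would simply be `grep '^92\.62\.43\.77 ' www-access.log`.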
Which files are most requested:
analyst#~head www-access.log | awk '{print $7}'
/feed/
/feed/
/feed/
/feed/
http://proxyjudge1.proxyfire.net/fastenv
/feed/
/feed/
http://www.wantsfly.com/prx2.php?hash=FABB83E72D135F1018046CC4005088B36F8D0BEDCEA7
/feed/
/feed/
Most popular individual occurrences and how often each line occurs:
analyst#~awk '{print $7}' www-access.log | sort | uniq -c | head
9 /
272 /feed/
3 /login/
2 /robots.txt
20 /signup/
1 /wp-admin
1 /wp-admin/
18 /wp-cron.php?doing_wp_cron
3 72.51.18.254:6677
4 92.62.43.77:6667
Sort by most popular and axe all but the top few matches:
analyst#~awk '{print $7}' www-access.log | sort | uniq -c | sort -rn | head
272 /feed/
20 /signup/
18 /wp-cron.php?doing_wp_cron
15 http://proxyjudge1.proxyfire.net/fastenv
12 http://www.wantsfly.com/prx2.php?hash=FABB83E72D135F1018046CC4005088B36F8D0BEDCEA7
9 /
4 92.62.43.77:6667
3 http://72.51.18.254:6677
3 72.51.18.254:6677
3 /login/
Identify different server responses and requests:
analyst#~awk '{print $9}' www-access.log | sort | uniq -c | sort -n
2 502
3 400
4 500
8 301
13 302
29 404
306 200
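Beyond counting every code, the 4xx/5xx error responses are usually the interesting ones in an investigation. A sketch using a numeric comparison on the status field, with made-up lines standing in for www-access.log:

```shell
# Fabricated sample lines in place of www-access.log
printf '%s\n' \
  '10.0.1.2 - - [05/Jan/2017:10:15:32 +0700] "GET /feed/ HTTP/1.1" 200 5126' \
  '65.88.2.5 - - [05/Jan/2017:10:16:01 +0700] "GET /missing HTTP/1.1" 404 512' \
  '65.88.2.5 - - [05/Jan/2017:10:16:05 +0700] "POST /login/ HTTP/1.1" 500 256' |
awk '($9 >= 400) {print $9, $7}'
# 404 /missing
# 500 /login/
```

On the real file, `awk '($9 >= 400)' www-access.log | awk '{print $9,$7}' | sort | uniq -c | sort -rn` gives a ranked error summary using the same pipeline style as the rest of this post.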
List all user agents ordered by the number of times they appear (descending order):
analyst#~awk -F\" '{print $6}' www-access.log | sort | uniq -c | sort -fr
272 Apple-PubSub/65.12.1
20 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
18 WordPress/2.9.2; http://www.domain.org
15 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
13 Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
8 -
6 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10
4 Mozilla/4.0 (compatible; NaverBot/1.0; http://help.naver.com/customer_webtxt_02.jsp)
3 pxyscand/2.1
3 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.1.249.1059 Safari/532.5
1 Mozilla/5.0 (Windows; U; Windows NT 5.1; es-ES; rv:1.9.0.19) Gecko/2010031422 Firefox/3.0.19
1 Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/532.5 (KHTML, like Gecko) Chrome/4.1.249.1045 Safari/532.5
1 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.22.7 (KHTML, like Gecko) Version/4.0.5 Safari/531.22.7
Requests that returned 200 ("OK"):
analyst#~awk '($9 ~ /200/)' www-access.log | awk '{print $9,$7}' | sort | uniq
200 /
200 /feed/
200 /login/
200 /robots.txt
200 /signup/
200 /wp-cron.php?doing_wp_cron
Identify blank user agents (an indication that the request comes from an automated script, or from someone who really values their privacy):
analyst#~awk -F\" '($6 ~ /^-?$/)' www-access.log | awk '{print $1}' | sort | uniq
193.109.122.15
193.109.122.18
193.109.122.33
221.194.47.162
92.62.43.77
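To see what those blank-agent clients actually requested, the same user-agent filter can print the request path next to the address. A sketch with made-up lines standing in for www-access.log:

```shell
# Fabricated sample lines in place of www-access.log
printf '%s\n' \
  '92.62.43.77 - - [05/Jan/2017:10:15:32 +0700] "GET /feed/ HTTP/1.1" 200 512 "-" "-"' \
  '10.0.1.2 - - [05/Jan/2017:10:15:40 +0700] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"' |
awk -F\" '($6 ~ /^-?$/)' | awk '{print $1, $7}'
# 92.62.43.77 /feed/
```

The first awk keeps only lines whose user-agent field is empty or "-", and the second prints the source IP and the requested path.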
Displaying the domain associated with each address:
analyst#~awk '{print $1}' www-access.log | sort | uniq -c | sort -n | tail -10 | awk '{print $2,$2,$1}' | logresolve | awk '{printf "%6d %s (%s)\n",$3,$1,$2}'
3 164.87.166.190.f.sta.codetel.net.do (190.166.87.164)
4 114.111.36.26 (114.111.36.26)
4 hn.kd.ny.adsl (123.4.59.174)
4 proxyscanner.quakenet.org (92.62.43.77)
6 trueventures.pier38.web-pass.com (208.80.69.69)
12 221.192.199.35 (221.192.199.35)
14 69.80.208.web-pass.com (208.80.69.74)
18 10.0.1.14 (10.0.1.14)
36 65.88.2.5 (65.88.2.5)
241 10.0.1.2 (10.0.1.2)
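The `{print $2,$2,$1}` step above looks odd at first: `uniq -c` emits `count ip`, and duplicating the IP lets `logresolve` (a utility shipped with Apache) rewrite the first copy into a hostname while the second copy preserves the raw address for the final printf. The reordering step on its own:

```shell
# uniq -c output is "count ip"; emit "ip ip count" so logresolve can
# rewrite the leading copy into a hostname and keep the raw IP intact
echo '  241 10.0.1.2' | awk '{print $2, $2, $1}'
# 10.0.1.2 10.0.1.2 241
```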
Credit to ~ http://www.the-art-of-web.com