Finding hacked web servers with log files.

After making a server available to the internet, especially on a shared host, you immediately get hit with automated SSH login requests. On Debian-based systems you can view these attempts by grepping for the words "Invalid user" or "authentication failure" in the logs under the /var/log/ directory.

grep "Invalid user" /var/log/auth.log
Sep 30 14:56:18 joesite sshd[31553]: Invalid user ubuntu from port 46502
Sep 30 14:56:30 joesite sshd[31557]: Invalid user ansible from port 33304
Sep 30 14:56:42 joesite sshd[31561]: Invalid user oracle from port 48326
Sep 30 14:56:59 joesite sshd[31568]: Invalid user ftpadmin from port 42644
Sep 30 14:57:05 joesite sshd[31570]: Invalid user test from port 50166
Sep 30 14:57:10 joesite sshd[31572]: Invalid user testuser from port 57680
Sep 30 14:57:16 joesite sshd[31574]: Invalid user weblogic from port 36956
Sep 30 14:57:23 joesite sshd[31576]: Invalid user user from port 44472
Sep 30 14:57:28 joesite sshd[31578]: Invalid user ts3 from port 51988

The format of these log files looks like this

[Time][Hostname][daemon][pid]:[Message][username][ip][port number]
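As a rough illustration of that layout, a line can be split into its fields with a regular expression. This is only a sketch: the sample line uses the reserved documentation address 192.0.2.10 because the real addresses are not shown here, and the `parse_invalid_user` name is made up for this example.

```python
import re

# Pattern for sshd "Invalid user" lines as laid out above:
# time, hostname, daemon[pid], then the message with username, IP and port.
LINE_RE = re.compile(
    r"^(?P<time>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<daemon>\w+)\[(?P<pid>\d+)\]: "
    r"Invalid user (?P<user>\S+) from (?P<ip>\S+) port (?P<port>\d+)"
)

def parse_invalid_user(line):
    """Return the fields of an 'Invalid user' line as a dict, or None."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line; 192.0.2.10 is a reserved documentation address.
sample = "Sep 30 14:56:18 joesite sshd[31553]: Invalid user ubuntu from 192.0.2.10 port 46502"
fields = parse_invalid_user(sample)
print(fields["user"], fields["ip"], fields["port"])
```

Lines that do not match the pattern (for example, a login attempt with an empty username) come back as None, so they can be skipped or inspected separately.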

I want to dig a little deeper.

Where do these malicious requests come from? It’s safe to assume that most of these are automated with some simple Python script. But I’m interested to see how many of these login requests are from other shared hosting environments like web servers. An attack coming from some random Starbucks Wi-Fi network or the basement of some 16-year-old H4x0r is less concerning than an attack coming from a hacked web server, because the latter implies a much higher level of sophistication.

To start, I needed a large dataset of IP addresses, so I wrote a bash script that runs from a crontab every 3 or so days and backs up all the files that start with auth in my /var/log/ directory.

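	time_stamp=$(date +'%m_%d_%Y_%H_%M')
	# save_dir was not defined in the original snippet; this path is inferred
	# from the cd and rm lines below
	save_dir="/root/logs/auth/$time_stamp"
	mkdir -p "$save_dir"

	cp /var/log/auth.log* "$save_dir"

	cd /root/logs/auth

	# archive the timestamped directory, then remove the uncompressed copy
	7z a "$time_stamp.7z" "$time_stamp"
	rm -rf "$time_stamp"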

This bash script makes a directory named after the current time, copies all files in /var/log/ that start with auth.log into it, then adds that directory to a p7zip archive.

Unpacking the archives

To bring these files to my local machine I use rsync

rsync -avz nj:/root/logs .

I then do everything I did remotely in reverse, with the addition of extracting the gzip archives (which logrotate creates by default for rotated logs) and renaming all the files so that they end with .log.

Unzip all archives in the directory

for i in *.7z; do 7z x -o"${i}_dir" "$i"; done

Decompress all gzip archives

gzip -d *.gz

Rename files ending with numeric values

for i in *.{1..4}; do mv "$i" "$i.log"; done

Combining the log files

Each one of these log files contained between roughly 3k and 40k lines

$ for i in *.log; do wc -l $i; done                
    2881 auth.log
   17521 auth.log.2.log
   39566 auth.log.3.log
    9693 auth.log.4.log
   12906 auth_2.log

To make things easier to work with I appended all the log files to one log file named all_logs.log, which was about 82k lines long.

$ for i in *.log; do cat $i >> all_logs.log; done
$ wc -l all_logs.log
	82567 all_logs.log

Cleaning up the data

Some of these requests were legit attempts from me to log in, so I used grep to keep only the lines containing the words Invalid user

grep "Invalid user" all_logs.log >> all_invalid_users.log

After this the file shrinks from 82k lines to around 25k

 wc -l all_invalid_users.log                      
 24928 all_invalid_users.log

Dealing with duplicates

When I looked at the new log file I noticed that most of these attacks attempted several combinations of username and password, each resulting in a new log entry from the same IP. Before I could remove the duplicates I needed to extract only the IPs from the log file. I did this with a grep pattern that extracts only the IP address from each line.

grep -Eo '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}' all_invalid_users.log >> all_ips.txt 

After this I was left with a file that has the same number of lines as all_invalid_users.log but with only IP addresses.

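To remove the duplicates from this file I used a combination of the sort and uniq commands, which took the IP list down from 25k to 3,707. One caveat: uniq -u prints only the lines that occur exactly once, so any IP that appeared more than once is dropped entirely rather than collapsed to a single entry. Plain uniq (or sort -u) would keep one copy of every address instead.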

$ sort all_ips.txt | uniq -u >> uniq_ips.txt
$ wc -l uniq_ips.txt
    3707 uniq_ips.txt
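To make the difference between "one copy of every address" and "addresses that appear only once" concrete, here is a small Python sketch of the same extraction and counting. It reuses the IPv4 pattern from the grep command; the `count_ips` helper and the sample lines (which use reserved documentation addresses) are made up for illustration.

```python
import re
from collections import Counter

# Same IPv4-looking pattern as the grep -Eo command above.
IP_RE = re.compile(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}")

def count_ips(lines):
    """Count how many times each IPv4-looking string appears in the lines."""
    counts = Counter()
    for line in lines:
        counts.update(IP_RE.findall(line))
    return counts

# Hypothetical log lines using reserved documentation addresses.
lines = [
    "Invalid user test from 192.0.2.1 port 50166",
    "Invalid user user from 192.0.2.1 port 44472",
    "Invalid user ts3 from 198.51.100.7 port 51988",
]
counts = count_ips(lines)

# One entry per address (what sort -u would give).
distinct = sorted(counts)
# Only addresses with a single attempt (what sort | uniq -u gives).
seen_once = sorted(ip for ip, n in counts.items() if n == 1)

print(len(distinct), len(seen_once))
```

Keeping the counts around also makes it easy to see which addresses were the most persistent, via `counts.most_common()`.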

Removing Tor exit nodes

Now that I had a list of unique IP addresses, the first thing I did was rule out the usual suspects: Tor exit nodes. Tor is a network of computers run by volunteers that allows people to access the web “somewhat” anonymously by routing traffic through a series of nodes ending in an exit node. Exit nodes are the computers in the network that do the job of routing whatever traffic is sent through them to the clear-net. Exit nodes do not know who is sending requests; they just act as a middle man between Tor users and the open internet. Unfortunately this means they often get blamed for malicious requests sent from their server.

To find out how many of these requests are from exit nodes I compared an up-to-date list of exit nodes (provided by the Tor Project) to our current list of IPs using a combination of the sort and uniq commands.

# grab the list of exit nodes
cat torbulkexitlist uniq_ips.txt| sort | uniq -d >> tor_nodes_attempting.txt
wc -l tor_nodes_attempting.txt 
       0 tor_nodes_attempting.txt

It seems the Tor network has not deemed me a credible target. Let’s hope it stays like that.
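The same comparison can be sketched in Python as a set intersection. Unlike sort | uniq -d, a set is not fooled by an address that happens to be listed twice within one of the files. The helper names and sample addresses below are hypothetical stand-ins for torbulkexitlist and uniq_ips.txt.

```python
def load_ips(lines):
    """Strip whitespace and drop blank lines, returning a set of addresses."""
    return {line.strip() for line in lines if line.strip()}

def tor_overlap(exit_node_lines, attacker_lines):
    """Return attacker addresses that also appear in the exit-node list."""
    return load_ips(exit_node_lines) & load_ips(attacker_lines)

# Hypothetical stand-ins for the two files, using documentation addresses.
exit_nodes = ["203.0.113.5\n", "203.0.113.9\n"]
attackers = ["192.0.2.1\n", "203.0.113.9\n", "\n"]

print(sorted(tor_overlap(exit_nodes, attackers)))
```

With real data, the two lists could come from `open("torbulkexitlist")` and `open("uniq_ips.txt")`, since file objects iterate line by line.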

Finding domains and hosting providers

The next thing I tried was to associate these IP addresses with domain names. It’s pretty easy to find the IP associated with a given domain name using the dig command.

; <<>> DiG 9.10.6 <<>>

However, finding the domain name of a given IP is a different challenge because:

  1. Most IP addresses are not associated with domains
  2. Domain names are often associated with several different IPs, especially when you start considering load balancing on larger web applications.


There is a tool called nslookup that can give you some more information about a given IP address. Most of the time it won’t return the site’s domain name, but it will often return some information about the hosting provider. Using only the IP of my web server I’m able to find out which provider hosts it; similar things will appear with DigitalOcean, AWS, Linode, etc.

Non-authoritative answer:	name =

I could use this to report each machine to its hosting provider but this is usually not worth your time unless you’re losing a bunch of money or something.
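For a whole list of IPs, the reverse lookup could be scripted instead of running nslookup by hand. Below is a minimal sketch using Python's socket.gethostbyaddr. The `reverse_lookup`, `provider_hint`, and `fake_resolver` names are made up, the hostnames are placeholders, and the "last two labels" heuristic is a naive assumption that will misfire on suffixes like .co.uk.

```python
import socket

def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the PTR hostname for an IP, or None if there is no record."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
        return hostname
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return None

def provider_hint(hostname):
    """Naively keep the last two labels, e.g. 'host-1.example.net' -> 'example.net'."""
    labels = hostname.rstrip(".").split(".")
    return ".".join(labels[-2:])

def fake_resolver(ip):
    """Offline stand-in for socket.gethostbyaddr, for demonstration only."""
    known = {"192.0.2.1": "host-1.example.net"}
    if ip not in known:
        raise socket.herror(1, "Unknown host")
    return known[ip], [], [ip]

name = reverse_lookup("192.0.2.1", resolver=fake_resolver)
print(name, provider_hint(name))
print(reverse_lookup("192.0.2.99", resolver=fake_resolver))
```

Dropping the `resolver` argument makes the function perform real DNS queries, which can be slow for addresses without a PTR record.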

Finding web servers

Now that I had a list of over 3k malicious IPs I needed a way to sort out those running web servers. I could have run a port scan on each IP, but not only would that take forever, it would probably put me on some other sysadmin’s malicious IP list. The goal of a port scan would be to find servers with port 80 or 443 open, the ports used by HTTP and HTTPS, which would mean they are running a web server. But an even faster way to check if an IP is running a web server is to just send it an HTTP GET request. If it returns a code 200, it’s probably a web server. I wrote this Python script to do exactly that. It iterates through my text file of IPs and adds the result of the GET request to a database. The code for everything is linked at the bottom of this post.

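import requests

# add_ip_to_db(ip, is_web_server, html) is defined elsewhere in the repo
def check_webserver(file_path):
    with open(file_path, 'r') as f:
        lines = f.readlines()
        for count, ip in enumerate(lines):
            ip = ip.strip()
            try:
                resp = requests.get(f"http://{ip}", timeout=5)
                html = resp.text
                # any HTTP response counts as a web server; keep the raw HTML
                add_ip_to_db(ip, "true", html)
                with open("scanner.log", "a") as log:
                    # log message is an assumption; the original line was lost
                    log.write(f"{count} {ip} responded\n")
            except requests.exceptions.RequestException as e:
                add_ip_to_db(ip, "false", "")
                with open("scanner.log", "a") as log:
                    log.write(f"{count} {ip} failed: {e}\n")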


The above script also saves the raw HTML of whatever is returned by the GET request. This is what the database looks like: 3444e402bd569f710344f19cf20e3f98.png

Filtering IPs from the database

To filter out the IPs with web servers we can run the following sqlite query on the database, which takes our list down to 507 unique IP addresses

sqlite3 ipinfo.db "select ip from ips where is_web_server='true'" >> ips_with_webservers.txt
wc -l ips_with_webservers.txt
507 ips_with_webservers.txt
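For readers who want to reproduce the storage side, here is a minimal sketch of a table that would support that query, using Python's built-in sqlite3 module. The `ips` table and the `ip` and `is_web_server` columns come from the query above; the `html` column name, the helper names, and the sample rows are assumptions, not the actual schema from the repo.

```python
import sqlite3

def make_db(path=":memory:"):
    """Open (or create) the database and make sure the ips table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ips ("
        "ip TEXT PRIMARY KEY, is_web_server TEXT, html TEXT)"
    )
    return conn

def add_ip(conn, ip, is_web_server, html):
    """Insert or overwrite the scan result for one IP."""
    conn.execute(
        "INSERT OR REPLACE INTO ips (ip, is_web_server, html) VALUES (?, ?, ?)",
        (ip, is_web_server, html),
    )
    conn.commit()

def web_server_ips(conn):
    """Same filter as the sqlite3 command line query above."""
    rows = conn.execute(
        "SELECT ip FROM ips WHERE is_web_server = 'true' ORDER BY ip"
    )
    return [row[0] for row in rows]

# Hypothetical rows using reserved documentation addresses.
conn = make_db()
add_ip(conn, "192.0.2.1", "true", "<html>Welcome to nginx!</html>")
add_ip(conn, "198.51.100.7", "false", "")
found = web_server_ips(conn)
print(found)
```

Using the IP as the primary key with INSERT OR REPLACE means re-running the scanner updates a row rather than duplicating it.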

Automating viewing the web servers

Rather than manually look at all 507 of these web servers through a browser, I wrote a script that uses the Selenium WebDriver to screenshot the default home page of each IP address.

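import os
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

# overlay_url(url_text, ip) is a helper defined elsewhere in the repo
def run_driver(ip_list):
    browser = webdriver.Firefox()
    # skip IPs that already have a screenshot; [:-4] strips the ".png" suffix
    ips_found = os.listdir("screenshots")
    ips_found = [foo[:-4] for foo in ips_found]
    for ip in ip_list:
        if ip not in ips_found:
            try:
                print(f"Getting {ip}")
                # the lines marked "assumed" were missing and are reconstructed
                browser.get(f"http://{ip}")  # assumed
                url_text = browser.current_url
                with open("urls.txt", "a") as f:
                    f.write(f"{ip} {url_text}\n")  # assumed
                overlay_url(url_text, ip)
                print(f"saving screenshot to screenshots/{ip}.png")
                browser.save_screenshot(f"screenshots/{ip}.png")  # assumed
            except TimeoutException:
                print(f"Timed out on {ip}")  # assumed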

What I found

Default Webserver Pages

By far the most common thing I found was default Nginx and Apache landing pages. a16755a21e9fe52aa865f9cca305a0f7.png ac7574ed80e4223f37f4099240005f48.png

Another common thing was phpinfo pages


Pet Projects & Abandoned CMS

I was somewhat surprised to see a lot of “corporate” looking websites from different countries. But upon closer investigation I found most of these were default CMS themes with Lorem Ipsum text and stock photos, meaning they were probably just staging sites. However, there were a couple of business sites I confirmed were real. This brought me back to my days slinging overpriced WordPress sites to boomers. I would often set up a staging server with the free theme and send the IP to my client to show what it would look like. Odds are I did not secure those sites properly.

840cff6a1ae4a1f3a8af248b8ef1924e.png 0c850fb2b62a067ccc2399e001a4c392.png

I also found a lot of “demo day” CRUD applications and database GUIs.

bf2c17c18e6f3ed7c2f3e6c9245a450f.png 1780b287ae9cd113641573604789ba3f.png

The More Concerning Stuff

Windows machines ea55ef91b8468cc6143d25ef0394f295.png

Nextcloud instances 6c3a97de4b232c8b60747074b60f6faa.png

Router Admin Pages 86fefaf6924cb12a8e944c9e879ec2ec.png

The weird shit

I don’t know why you would ever need a dashboard like this, much less one that connects to the internet. God save us. 43b9331bb647749995b076c6886dbf2c.png

Git Repo With Code