Who's Knocking?

A Brief Analysis of Unwanted Internet Traffic

John Peterson, July 2003
contact102 AT saccade DOT com

Our house is hooked up to the Internet with a run-of-the-mill 300Kbps DSL connection. It has a router (an aging SMC 7004BR) that the rest of the home network sits behind. The router kindly keeps a log of the last couple hundred "unrecognized access" events - attempts to probe into the house network. When I first hooked up the router a couple of years ago, this was maybe a dozen or so a day. But when I checked recently, it was hundreds of times a day. Somebody (or something) was probing our line about every four and a half minutes. What's going on?

Background

The protocol used to exchange information on the Internet - TCP/IP - sends information as "packets". The packets of data can be thought of as letters in envelopes; the data is wrapped up inside (the "letter") and the information about where it's going and where it's from is contained in a header (the "envelope"). The envelope has two pieces of addressing information on it: the address (which computer on the Internet it's going to) and the port (what the purpose of the data is). The address is a number; your computer looks it up in a master directory that converts names like www.yahoo.com into numbers like 66.218.71.92, much like you'd look up a number in the phone book. The port number identifies the service the packet is meant for. For example, data sent to port #25 is intended for email, #80 is for fetching a web page, etc. [If you want to learn more about TCP/IP, there's lots to read on the web]. Most home networks typically don't have any services (e.g., web servers or mail hosts) so in theory there should be no inbound traffic. In theory...
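
The envelope analogy can be sketched in a few lines of Python. This is only an illustration: the `Envelope` class, the `WELL_KNOWN_PORTS` table and the source address 192.0.2.7 are made up for the example, although the port assignments listed are the standard ones.

```python
import ipaddress
from dataclasses import dataclass

# A few standard port assignments (illustrative subset, not a full list).
WELL_KNOWN_PORTS = {
    21: "ftp",
    25: "smtp",
    80: "http",
    137: "netbios-ns",
    139: "netbios-ssn",
    443: "https",
}


@dataclass
class Envelope:
    """Hypothetical stand-in for the addressing part of a packet header."""
    source: str        # where the packet is from
    destination: str   # which computer it is going to
    port: int          # which service it is intended for

    def service(self) -> str:
        # Ports missing from the table get a label such as "unknown(57)".
        return WELL_KNOWN_PORTS.get(self.port, f"unknown({self.port})")


def address_as_number(dotted: str) -> int:
    """An IPv4 address is a 32-bit number written as four bytes."""
    return int(ipaddress.IPv4Address(dotted))


if __name__ == "__main__":
    letter = Envelope(source="192.0.2.7", destination="66.218.71.92", port=25)
    print(letter.service())                       # name of the service on port 25
    print(address_as_number(letter.destination))  # the same address as one integer
```

The name-to-number lookup itself is left out here because it needs a live network connection; in Python it is a single call to `socket.gethostbyname`.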

Although the SMC router can be configured to allow incoming traffic to pass through into your network (desirable if, for example, you have your own web server running there), by default it blocks all incoming traffic and notes it in the log file. The log keeps track of three things: when the packet arrived, what address it came from, and what port it was intended for. I wrote a script for our Linux server to harvest this information and keep an ongoing log for a couple of weeks (the SMC router itself can only store a couple hundred entries).
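
A harvesting script of this kind might look roughly like the sketch below. It is not the actual AttackLog.py: the log line layout in `LINE_RE` is invented for the example (a real router's format will differ), and `Probe`, `parse_line` and `merge_new` are hypothetical names. It shows the two steps involved: pulling the three fields out of each line, and merging repeated snapshots of the router's short log without duplicating entries.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Probe(NamedTuple):
    when: datetime   # when the packet arrived
    source: str      # address it came from
    port: int        # port it was intended for


# ASSUMED line layout, invented for this sketch; a real router's log
# will differ and the pattern must be adapted to it.
LINE_RE = re.compile(
    r"(?P<when>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"from (?P<source>\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"to port (?P<port>\d+)"
)


def parse_line(line: str) -> Optional[Probe]:
    """Return a Probe for a matching line, or None for anything else."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    when = datetime.strptime(match["when"], "%Y-%m-%d %H:%M:%S")
    return Probe(when, match["source"], int(match["port"]))


def merge_new(seen: set, lines) -> list:
    """Keep only probes not already recorded, so repeated harvests of the
    router's short rolling log do not create duplicates."""
    fresh = []
    for line in lines:
        probe = parse_line(line)
        if probe is not None and probe not in seen:
            seen.add(probe)
            fresh.append(probe)
    return fresh
```

One trade-off of deduplicating this way: two genuine probes from the same host to the same port within the same second collapse into one entry.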

What I found

Over the two-week period, the IP address for our DSL connection was probed an average of 323 times a day, or about once every four and a half minutes. 13.5% of these were multiple attacks, where the same host probed the same port three (or more) times within a 15-second period. The probes occurred fairly uniformly throughout the day, with a slight bias towards the night/early morning hours (midnight to 11am PST). The probes per day ranged from 256 to 450. The largest number of probes from an individual host was 117 (done in a single two-minute blitz); 16 hosts probed over a dozen times (usually with triple attacks over a period of several days).
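
The two calculations behind these figures can be sketched as follows. This is one plausible reading of "multiple attack" (three or more probes from the same host to the same port, all within 15 seconds of the first), not necessarily how the original scripts counted them; `minutes_between` and `count_bursts` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def minutes_between(probes_per_day: float) -> float:
    """Average gap in minutes, given an average daily count."""
    return 24 * 60 / probes_per_day


def count_bursts(probes, size=3, window=timedelta(seconds=15)):
    """Count (host, port) groups of `size` or more probes inside `window`.

    `probes` is an iterable of (when, host, port) tuples. Each probe is
    counted in at most one burst.
    """
    by_target = defaultdict(list)
    for when, host, port in probes:
        by_target[(host, port)].append(when)

    bursts = 0
    for times in by_target.values():
        times.sort()
        i = 0
        while i < len(times):
            # Extend j over every probe within `window` of times[i].
            j = i
            while j + 1 < len(times) and times[j + 1] - times[i] <= window:
                j += 1
            if j - i + 1 >= size:
                bursts += 1
                i = j + 1      # skip past the whole burst
            else:
                i += 1
    return bursts
```

With 323 probes a day, `minutes_between(323)` gives 1440 / 323, or roughly 4.46 minutes, which is where "about once every four and a half minutes" comes from.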

Geographically, the probes came from:

       UNKNOWN: 1658 (40.11%)
           net: 1048 (25.35%)
           com:  322 ( 7.79%)
            mx:  180 ( 4.35%)   Mexico
            br:   94 ( 2.27%)   Brazil
            jp:   91 ( 2.20%)   Japan
            it:   66 ( 1.60%)   Italy
            fr:   62 ( 1.50%)   France
            pl:   51 ( 1.23%)   Poland
            ca:   50 ( 1.21%)   Canada

The "UNKNOWN" refers to IP addresses that could not be resolved to a hostname using the "Domain Name System" - the Internet's phone book for converting between computer names and numeric addresses. This isn't surprising, since you would expect people engaged in suspicious activity to take steps to hide their identity and location. The next domain, .NET, also makes sense, because many attacks are going to be launched via an individual machine at an Internet Service Provider, and most of those use the .NET domain. Although many companies outside the US use .COM and .NET domains, the country domains do give an idea of overseas traffic, with Mexico right near the top of the list, followed by Brazil. This surprised me; I expected Europe to be ahead of Latin America in generating this sort of traffic.
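
A tally like the one above can be produced with a reverse DNS lookup followed by a count of the last label of each hostname. The sketch below is an assumption about how such a report could be built, not the original AttackReport.py; `reverse_lookup`, `top_level`, `tally` and `report` are hypothetical names, while `socket.gethostbyaddr` is the standard library call for the lookup.

```python
import socket
from collections import Counter
from typing import Iterable, Optional


def reverse_lookup(address: str) -> Optional[str]:
    """Ask DNS for the hostname of an address; None if it has no name."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(address)
        return hostname
    except OSError:   # socket.herror and socket.gaierror are subclasses
        return None


def top_level(hostname: Optional[str]) -> str:
    """Last dot-separated label of a hostname, or "UNKNOWN"."""
    if not hostname:
        return "UNKNOWN"
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def tally(hostnames: Iterable[Optional[str]]) -> Counter:
    """Count hostnames by top-level domain."""
    return Counter(top_level(name) for name in hostnames)


def report(counts: Counter) -> str:
    """Format the counts as name, count and percentage, largest first."""
    total = sum(counts.values())
    return "\n".join(
        f"{name:>15}: {n:4d} ({100 * n / total:5.2f}%)"
        for name, n in counts.most_common()
    )
```

In use, each logged address would go through `reverse_lookup` once, and the resulting names (or `None` values) would be passed to `tally`.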

Looking more closely at the individual domains reveals plenty of well-known names:

                       UNKNOWN: 1658 (40.11%)
                 speakeasy.net:  155 ( 3.75%)
                  t-dialin.net:  136 ( 3.29%)
                prodigy.net.mx:  114 ( 2.76%)
                         ne.jp:   68 ( 1.64%)
                     hinet.net:   68 ( 1.64%)
                  rima-tde.net:   67 ( 1.62%)
                    wanadoo.fr:   57 ( 1.38%)
                   pacbell.net:   53 ( 1.28%)
              interbusiness.it:   52 ( 1.26%)
                avantel.net.mx:   45 ( 1.09%)
                 bellsouth.net:   43 ( 1.04%)
                   verizon.net:   33 ( 0.80%)
                      tpnet.pl:   32 ( 0.77%)
                    swbell.net:   32 ( 0.77%)
                        rr.com:   29 ( 0.70%)
                  ttnet.net.tr:   28 ( 0.68%)
                 telesp.net.br:   25 ( 0.60%)
               btopenworld.com:   25 ( 0.60%)
                 ameritech.net:   24 ( 0.58%)
                  videotron.ca:   23 ( 0.56%)
               dsl-verizon.net:   22 ( 0.53%)
                     attbi.com:   20 ( 0.48%)
                netvigator.com:   19 ( 0.46%)
                  sympatico.ca:   19 ( 0.46%)
                     qwest.net:   19 ( 0.46%)
                       aol.com:   17 ( 0.41%)

Most of the names look like retail ISPs that sell to individuals. Most of the US broadband providers (Verizon, AOL, RoadRunner (rr), the various Bells) show up in the list. Near the top is my ISP, Speakeasy, indicating a fair amount of the traffic is people surfing the local address space I share with them. The retail nature becomes clear if you look at the individual hostnames (things like "user-10cm1it.cable.mindspring.com"), indicating individual broadband connections. So at least within the 60% of the traffic we can identify, it looks like it comes from individuals, not institutions.
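
Grouping hostnames by provider needs a rule for how many trailing labels to keep. The heuristic below is a guess that happens to reproduce groupings like "prodigy.net.mx", "wanadoo.fr" and "ne.jp" from the table; it is not taken from the original scripts, and both `GENERIC_LABELS` and `provider_domain` are hypothetical names.

```python
# Generic second-level labels that countries commonly reuse under their
# own two-letter code (an ASSUMED, incomplete list chosen for this sketch).
GENERIC_LABELS = {"com", "net", "org", "edu", "gov", "co"}


def provider_domain(hostname: str) -> str:
    """Heuristic guess at the provider part of a hostname."""
    labels = hostname.rstrip(".").lower().split(".")
    # Names like something.net.mx need three labels to identify the provider.
    if len(labels) >= 3 and len(labels[-1]) == 2 and labels[-2] in GENERIC_LABELS:
        return ".".join(labels[-3:])
    # Otherwise keep the last two labels (or the whole name if shorter).
    return ".".join(labels[-2:])


if __name__ == "__main__":
    print(provider_domain("user-10cm1it.cable.mindspring.com"))
```

The rule is deliberately crude. It lumps every Japanese provider under "ne.jp", for instance, because "ne" is not in the assumed list of generic labels.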

The ports that were probed were:

  netbios-ns         =  2520 (60.96%)  MS Windows file sharing
  netbios-ssn        =   345 ( 8.35%)  MS Windows file sharing
  ms-sql-s           =   248 ( 6.00%)  Microsoft database access
  epmap              =   197 ( 4.77%)  Microsoft network services
  www-http           =   190 ( 4.60%)  Web browsing
  real               =   158 ( 3.82%)  Real Networks 
  ftp                =   110 ( 2.66%)  File Transfer
  ms-sql-m           =    59 ( 1.43%)  Microsoft database access
  microsoft-ds       =    41 ( 0.99%)  Microsoft
  https              =    39 ( 0.94%)  Secure web access
  unknown(57)        =    31 ( 0.75%)
  smtp               =    29 ( 0.70%)  Outgoing email
  http-alt           =    28 ( 0.68%)
  ndl-aas            =    20 ( 0.48%)

Note the overwhelming majority of the probes going after Microsoft-related services: over 80% of the list above. There are several reasons for this.
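
The "over 80%" figure can be checked directly from the port table. In the sketch below the counts are copied from that table; which services count as Microsoft-related is a judgment based on the table's own descriptions, and `microsoft_share` is a hypothetical name.

```python
# Counts copied from the port table above.
PORT_COUNTS = {
    "netbios-ns": 2520,
    "netbios-ssn": 345,
    "ms-sql-s": 248,
    "epmap": 197,
    "www-http": 190,
    "real": 158,
    "ftp": 110,
    "ms-sql-m": 59,
    "microsoft-ds": 41,
    "https": 39,
    "unknown(57)": 31,
    "smtp": 29,
    "http-alt": 28,
    "ndl-aas": 20,
}

# Services the table describes as Microsoft or MS Windows.
MICROSOFT = {
    "netbios-ns", "netbios-ssn", "ms-sql-s",
    "epmap", "ms-sql-m", "microsoft-ds",
}


def microsoft_share(counts=PORT_COUNTS, names=MICROSOFT) -> float:
    """Percentage of the listed probes aimed at Microsoft-related ports."""
    hits = sum(n for port, n in counts.items() if port in names)
    return 100 * hits / sum(counts.values())


if __name__ == "__main__":
    print(f"{microsoft_share():.1f}%")
```

That is 3410 of the 4015 probes listed, or about 85%. Adding up the table's own percentage column for the same six services gives 82.5% of all traffic, so the claim holds either way.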

Conclusions

If you're running a Windows machine connected to a broadband Internet connection that's left on for any length of time (like, five minutes!) you must get a router that will filter out unwanted inbound traffic. Without it, you're very vulnerable to a flood of traffic trying to hijack your machine. There's no point in trying to track down the source of all of this traffic; it comes from all around the world every day. We can't be 100% certain that all of the traffic is malicious. It could be, for example, that somebody is probing www-http ports hoping to find new interesting web pages to read. But most reports on this indicate that the traffic is indeed unfriendly.

More Information

My exploration of this was a weekend hack to satisfy my curiosity. If you want to learn more, check out the Internet Storm Center; they track this stuff for a living. Steve Gibson has written some fascinating accounts of network attacks and how they operate. He also has a tool for testing your system's vulnerability to attack. Highly recommended.

Tools

I wrote the scripts to analyze the information in Python. Learning to do this type of thing in Python was actually the main goal of the project, and the project taught me about using Python for pattern matching, accessing web services and extracting information.

Unfortunately the tools I came up with are relatively specific to the router I used, and will need to be modified significantly for another style of router. Still, they could be useful as a starting point:

AttackLog.py       Generates a running log file from the SMC firewall
port-numbers.txt   Description of the port numbers (used by AttackLog.py)
AttackReport.py    Analyzes the log generated by AttackLog.py

Update 7-Sep-03
The above was written prior to the "MSBlaster" worm fiasco that hit the Internet in mid-August. Since that worm hit, it alone accounts for 75% of the traffic (appearing in my scripts as attacking the "epmap" TCP/IP port). The average interval between attacks is now under three minutes.

Update 15-Mar-05
Read "Know Your Enemy: Tracking Botnets" for an excellent description of malicious traffic and its sources. Also, our ISP switched from DSL (which never worked well at our location) to Comcast cable.

Update Jul-05
The "aging SMC router" inexplicably bit the dust - the lights were blinking but nobody was home. It was replaced by a D-Link DI-604. The new router supplies similar firewall functions, but not the logging, thus ending the experiment. Also, the cable speed went up to 30MB/s. I have no idea what the current break-in attempt rate is, but I'm quite confident it hasn't gone down.