TOPIC
In managing a large Ethernet network, you need to assess your network's performance and understand how it affects servers and clients. Several programs based on Mac OS X Server's system administration tools can help you collect diagnostic data. You can also use third-party tools to design and manage networks effectively.
DISCUSSION
Dealing with unplanned growth

A network can grow in many ways. For instance, if a department in your organization expands into several departments, you might use routers, concentrators, and more servers to add computers. If your network has grown without much planning, how well is it running? What are the major sources of network traffic, and are they impeding performance? How does the network's structure affect both users and hardware?

Fortunately, several network administration tools are at your disposal to help you answer these questions and plan future network expansion. Mac OS X Server provides a set of software tools to help you diagnose networking pitfalls. Many third-party products on the market can provide more detail.

Map out what's where

A great way to start understanding your network is to map it out accurately. When you have a graphical view of the physical topology of your network and a history of the changes you've made, you'll find it easier to track and plan network expansion.

Some network management systems can automatically diagram the network for you. However, you can make a map of your network with anything from pencil and paper to a complicated illustration application--the most effective tool is the one you find easiest to use. If your network is very complex, use several diagrams.

In your diagram, include all the meaningful, accurate reference information about the network that you can think of. The more complete the diagram, the easier it is to trace and correct problems. For example, label routers and important cable segments and indicate their types, such as fiber or twisted pair. Use line style or color to indicate cable type. If a cable splits, show what device splits it--include its name, so you can query it remotely. You might even highlight hardware that's not connected to an uninterruptible power supply (UPS) to note problem areas in the event of power failure.
Most important, include a key that says what the symbols and colors mean, so that later you or someone else can easily understand the diagram. If you're not sure whether to include a piece of information, throw it in just in case. You can decide over time which information is unnecessary. Better to have too much information than too little, especially when you have to fix the network fast!

Detect congestion

Packet collisions can be a major cause of network inefficiency. A collision occurs when two or more computers on the network try to transmit data at the same time. The packets collide and become deformed, losing headers, destination addresses, and the data they were meant to carry.

Because collisions are a normal part of Ethernet operations, controllers are designed to handle them properly. When a network interface detects a collision, it waits for activity on the network to stop and then tries to retransmit its data. It uses a "binary exponential back-off" policy, which works something like this: If a network interface detects a collision, it waits a random amount of time and tries to retransmit. If collisions continue, it again waits a random amount of time and retries the transmission. The algorithm it uses to choose the wait time ensures that, on average, the delay time doubles with each retry attempt.

This is a very effective method of handling collisions. A correctly engineered network shouldn't have enough collisions to degrade network performance. However, when a network is overloaded with traffic or when it contains improperly configured systems, lots of collisions can result, causing a network "traffic jam."

Measuring collisions with netstat

One way to monitor collisions is to use the netstat command. netstat has several options to isolate different types of network traffic. See the netstat UNIX manual page for a list of all the options you can use. Below, Example 1 shows a sample of netstat output for all interfaces on the computer grunge.
The -i option shows statistics on all interfaces that were automatically configured at boot time.

Example 1: Sample output from netstat -i

grunge> netstat -i
Name  Mtu   Network   Address    Ipkts    Ierrs  Opkts    Oerrs  Coll
en0   1500  mynet     grunge     6976827  0      8133029  0      0
lo0   1536  loopback  localhost  318631   0      318631   0      0
en0   1500  none      none       7023624  217    8133029  0      74821

The statistics displayed by netstat -i are these:

Name: the name of the interface
Mtu: the maximum transmission unit, in bytes
Network: the network the interface is attached to
Address: the address of the interface
Ipkts, Ierrs: the number of input packets and input errors
Opkts, Oerrs: the number of output packets and output errors
Coll: the number of collisions

A simple test is to check the Coll value of the last Ethernet interface. It should be less than about ten percent of the total number of packets. In the example, 74821 collisions against 8133029 output packets is about 0.9 percent, so the collision rate is well within tolerance.

While this is a good quick-and-dirty test for problems, it's just a statistic--you'll need to collect more data to pinpoint exact problems. For instance, if Opkts is very small, you can't rely on the collision count to give an accurate picture of performance. Use other tools to collect data to support the values you find with netstat.

To gather more information, run netstat -i at various times of day to see if your network has higher collision rates at specific times. Focus on the peak times and match them to specific occurrences, like everyone logging in first thing in the morning or many people running some application right after lunch. Also collect more information from other computers, both clients and servers, for comparison and greater depth of data. Study the average of the numbers--if the average number of collisions is high, some reorganization via routers and subnets may help, or perhaps further investigation is needed.

Keep in mind too that using long cables can increase the likelihood of collisions. Use shorter Ethernet segments so that the network can handle increased traffic before collisions become a problem.

Counting connections

Another way to check network load and performance is to run netstat without options and count the number of protocol connections. Table 1 shows an example.
Table 1: Sample output from netstat

grunge> netstat
Active Internet connections
The Send-Q column shows the number of entries waiting to be sent out. By checking the numbers in the Send-Q column, you can see whether the network is too congested. This number should be 0 for most of the connections when things are running smoothly. If it's higher than 0 for several connections, you might have too much traffic on the network.

Monitoring remotely

There's another special tool--snmpnetstat--for monitoring and diagnosing congestion and performance remotely, to make network administration easier. snmpnetstat is designed to run with Simple Network Management Protocol (SNMP), which has been included with Mac OS X Server since Release 3.0. SNMP is a combination of host daemons and programs that can remotely and continuously monitor many network and system functions. To find out how to install SNMP, see the Mac OS X Server Network and System Administration book. For a Mac OS X Server interface to many SNMP queries, check out NetWatch(TM), a set of applications based on SNMP from RidgeBack Solutions, Inc.

To monitor performance on a remote computer without snmpnetstat, you log in with telnet or rsh and then run netstat. With snmpnetstat you can query the host remotely, and snmpnetstat continues collecting statistics until you stop it. For instance, you can remotely invoke the SNMP daemon on grunge and then monitor grunge for collisions and bad packets. Table 2 shows statistics on interface en0 on host grunge, collected at two-second intervals.

You can use snmpnetstat to constantly monitor a host's performance, particularly to diagnose problems that are intermittent and therefore hard to catch. You can also use SNMP to make protocol-specific inquiries on a specified computer, as in Example 2.
Table 2: Sample output from snmpnetstat

client> snmpnetstat -I en0 grunge 2
          input  (en0)  output             input (Total) output
 packets errs  packets errs colls  packets errs  packets errs colls
  206083    0   137452    2     0   618097    0   498223    2     0
       0    0        0    0    36       36    0        0
      11    0        9    0     0       57    0       53    0     0
       0    0        0    0     0       51    0       52    0     0
       2    0        1    0     0       40    0       38    0     0
      11    0        8    0     0       72    0       67    0     0
       0    0        1    0     0       38    0       39    0     0
      10    0        8    0     0       71    0       68    0     0
       2    0        1    0     0       41    0       39    0     0
       5    0        1    0     0       46    0       38    0     0
      11    0       10    0     0       73    0       71    0     0
       1    0        1    0     0       38    0       38    0     0
       0    0        0    0     0       36    0       36    0     0
...

client1> snmp grunge
snmp> ip-status
The SNMP entity is acting as a host.
The default time-to-live for IP packets is 60 msec.
Datagrams received: 407020, forwarded: 0, consumed: 407024
format errors: 0, misdeliveries: 0, resource limitations: 0
destined for unknown protocols: 0
Datagrams req'd for transmission: 0, discarded due to no route: 0
Timeout value for reassembly queue: 60
Fragments created: 0, Fragments received needing reassembly: 12344
Datagrams successfully reassembled: 12344, successfully fragmented: 0
Datagrams needing fragmentation (but the IP flags field said not to): 0
snmp> tcp-status
The retransmission algorithm is vanj.
Min/max retransmission times are: 1000/64000 (msecs)
Max # of simultaneous TCP connections allowed: -1
Number of active opens: 358, passive opens: 266, current open connections: 21
Failed connection attempts: 0, connection resets: 0
Number of segents received: 144189, sent: 143194, retransmitted: 14

Example 2: Making an SNMP protocol query

The first query in Example 2 asks grunge for its IP statistics. These statistics include important information on network fragmentation and general protocol errors. The second query requests TCP statistics, such as the number of failed connection attempts. UDP statistics aren't currently enabled in Mac OS X Server, so the SNMP command udp-status doesn't work.

Check out file system performance

The Network File System (NFS) protocol suite can account for a major part of network traffic.
Computers use NFS to mount file systems from one computer to another. Copying and moving files on a mounted file system, for example, are handled by NFS. Because NFS is usually very heavily used, its performance has a big effect on overall network performance.

One way to find NFS performance problems is to look at individual clients that import directories, and isolate any that are dropping packets or are being overloaded. In Figure 6, nfsstat shows the NFS statistics for client1. nfsstat -c reports statistics about NFS requests that this system created.

client1> nfsstat -c
Client rpc:
calls    badcalls  retrans  badxid  timeout  wait  newcred
4267683  54        529      2       582      0     0

Figure 6: Sample output from nfsstat -c

The statistics displayed are:

calls: the total number of RPC calls made
badcalls: the number of calls rejected by the RPC layer
retrans: the number of calls that had to be retransmitted
badxid: the number of replies received that didn't match any outstanding call
timeout: the number of calls that timed out waiting for a reply
wait: the number of times a call had to wait for a free client handle
newcred: the number of times authentication information had to be refreshed

The most important values are those for retrans, badxid, and calls. The retrans field (here a client RPC value) indicates the number of RPC requests the client had to retransmit while reading or writing files using NFS. If the number of retransmissions is larger than five percent of calls, then the client has had to repeat RPC requests to a server a relatively high percentage of the time, and may be in trouble.

If retrans is large and badxid and timeout are roughly equal, then one or more of your computers is dropping packets. This means it's too bogged down to handle all incoming requests and is refusing client RPC requests. Because this data comes from the client RPC, it represents the NFS server's load more than the network hardware's load.

To check for possible network problems with nfsstat -c, see if retrans is high and badxid is relatively close to zero. If so, retransmissions might be caused not by server slowness but by network data corruption. You'll need to use a network analyzer (see the next section) to look at the original network packets and check for data corruption.

To learn more about NFS, see the Mac OS X Server Network and System Administration book. To find out how to tune NFS, see "NFS Performance Tuning" in this issue.
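The retrans and badxid checks described above can be sketched in a few lines of Python. This is only an illustration, not part of Mac OS X Server: the function name diagnose_nfs_client, the near_zero cutoff for "relatively close to zero," and the roughly_equal tolerance are assumptions made for the example. Only the five percent threshold and the sample counters come from the text.

```python
def diagnose_nfs_client(calls, retrans, badxid, timeout,
                        retrans_limit=0.05, near_zero=0.01, roughly_equal=0.5):
    """Classify client RPC counters from `nfsstat -c` using the rules in the text.

    retrans_limit is the five percent threshold from the article. near_zero and
    roughly_equal are hypothetical cutoffs chosen only for this illustration.
    """
    if calls <= 0:
        raise ValueError("calls must be positive")
    rate = retrans / calls
    if rate <= retrans_limit:
        return rate, "ok"
    # High retransmission rate: use badxid to guess at the cause.
    if badxid <= near_zero * retrans:
        # Retransmissions with almost no unmatched replies: suspect the network.
        return rate, "suspect network"
    if abs(badxid - timeout) <= roughly_equal * max(badxid, timeout):
        # badxid and timeout roughly equal: suspect an overloaded server.
        return rate, "suspect server"
    return rate, "inconclusive"


# Counters taken from the sample nfsstat -c output for client1.
rate, verdict = diagnose_nfs_client(calls=4267683, retrans=529, badxid=2, timeout=582)
print(f"retrans rate: {rate:.4%} -> {verdict}")
```

For the sample counters, 529 retransmissions out of 4267683 calls is roughly 0.01 percent, far below the five percent threshold, so the sketch reports the client as healthy and never reaches the badxid comparison.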
Go looking for trouble

An effective all-purpose networking tool is a network analyzer, usually a portable computer equipped with a special Ethernet adapter and software for network monitoring. You can connect it to the network at critical locations to test everything from the packets traversing two computers to the total statistics of an entire network. Figure 7 shows statistics from a Network General Sniffer.

With a network analyzer you can collect data to find peak network load and possible weaknesses in design. In addition to a complete suite of scripts and statistics, network analyzers can generate heavy network traffic on demand for testing, and also isolate and study the form and function of any message that traverses the net! While a network analyzer is a large investment, it's invaluable in solving problems and understanding the true nature of networking.

Network History: This report provides the error history for a specified length of time.
Document Information
Product Area: Mac OS System Software
Category: Mac OS X Server
Sub Category: General Topics
Keywords: kmosXserver
Copyright © 2000 Apple Computer, Inc. All rights reserved.