Should You Report Server Lag?

Oct 07, 2015 incarnate
As an online game where network latency genuinely matters and has a significant impact on the gameplay experience, it is natural that anyone who has a bad experience should want to report "lag".

Unfortunately for us, "lag" is a terrible blanket term that covers half a dozen different concepts, many of which are unrelated, and the most common of which have nothing to do with the game at all. We'll go through a few of them here:

My ping is higher (or more variable) than I think it should be! - This usually stems from a user utilizing the "/ping" command in the game, which shows the game protocol's inherent latency statistics. However, while the game may measure the statistic, it does not actually create latency problems. Genuine game-server issues exhibit themselves in many ways, but not latency spikes (and this statement comes from, at the time of this writing, 13+ years of running the game publicly on the open Internet).

Your "ping" time is a measurement of the quality of your connection to the game server. The Internet is a very complex set of different networks, all connected together in various ways, which are constantly "in flux", meaning that at any given time "somewhere, the internet is broken". The way your packets traverse these different networks is via Dynamic Routing (specifically a protocol called BGP). Through the magic of BGP routing (and other factors), major connections on the internet may break, but your packets still reach your desired destination, as they are routed through a new path.

Unfortunately, this new path may be sub-optimal in some way. For example: Say you live in Germany, and are trying to connect to the game servers, which are currently in Chicago. Your packets traverse various networks to get you to the Atlantic, where they might ride the TATA TGN-Atlantic trans-atlantic cable from England to New Jersey, then ride another continental network the rest of the way to Chicago. But, what happens if that transatlantic cable should experience an outage, or maintenance, or be overloaded? The routing protocol might then choose an alternate path, which might, for the sake of an example, take you instead down via Spain and then jump on the Columbus III to Florida. Then your packets will head north from FL up through other networks to Chicago.

The change from the England -> New Jersey path to the Spain -> Florida path could easily add another 30ms to your ping times, and similarly degrade your connection. And this all happens transparently and constantly, and has nothing at all to do with Vendetta Online, or its servers.

I used a trans-atlantic cable as an example, as the nice maps provided by SubmarineCableMap give a clear visual indication of different physical route-paths that data may take, and that how long it takes them to reach their destination (network latency / ping time) is ultimately limited by the length of the cable (and the speed of light). All other things being equal, a longer path must always be slower.
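To put rough numbers on the cable-length point, here is a small back-of-the-envelope sketch. It assumes light in fiber travels at about two thirds of its vacuum speed, and the two path lengths are made-up round figures, not measurements of any real cable. It only shows how a few thousand extra kilometers can plausibly account for a difference on the order of the 30ms mentioned above.

```python
# Rough propagation-delay estimate; all distances below are made-up round numbers.
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in vacuum
FIBER_FACTOR = 2 / 3                # assumption: light in glass travels at roughly 2/3 c


def round_trip_ms(path_km: float) -> float:
    """Minimum round-trip time over path_km of fiber, ignoring routers and queuing."""
    one_way_s = path_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


short_path_km = 7_500    # hypothetical length of the usual route
long_path_km = 10_500    # hypothetical length of the detour
extra_ms = round_trip_ms(long_path_km) - round_trip_ms(short_path_km)
print(f"extra round-trip time: {extra_ms:.0f} ms")
```

Real paths add router hops and queuing on top of this floor, so the figure is a lower bound on what a detour costs, not a prediction.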

But, really, this is not limited to trans-atlantic connections. This kind of connection-breakage and routing-change is happening constantly, and everywhere. Before I made videogames, I was an engineer involved in building these huge international networks, and our lives were a constant headache of dealing with stray back-hoes, construction workers digging in the wrong place and accidentally cutting fiber, or pieces of network equipment failing.

The only reason why you don't notice the constant-breakage, is because most internet services don't expect content to truly be real-time. Websites and video can be cached all over the world, and you won't be able to tell where the data comes from. But our game is a truly real-time exchange of data between players in one single place (our server), which is basically required by the type of game that we are (single-universe, real-time MMORPG). Thus, any network problem is highlighted by the game in giant neon lights, and is glaringly evident. But this doesn't make the problems our fault, nor anything we can influence or change.

Given that the internet is always broken (somewhat), it should be expected that your ping will vary from time to time, and day to day. Some days the "weather" will be good, other times it will be poor.

The best any application provider (like ourselves) can do, is simply have the best connectivity we can.. which we do. Four major tier-1 connections, at 10gig+ levels, along with lots of peering, all with careful monitoring and excellent routing.

My ping is steady and reasonable, but my game experience is poor.

This is more of an example of something that may actually be related to the game, and should be reported. Be sure to report exactly when and "where" (what sector) the issue occurred. It could stem from a few things:

- Overloaded cluster server doesn't have enough CPU to properly run the given sector.

In this case, the next sector you fly into will probably be fine, as sectors are distributed within our "cluster" by individual system load, so the "next" started sector will probably be on a less-busy physical server. However, we cannot always predict how much CPU a sector may take (as players may "do stuff" anywhere), so an unexpected high-utilization sector may impact other "neighbor" sectors that are running on the same physical hardware. At any given time, we are only using a tiny fraction of our total CPU capacity, but there is still the possibility of the occasional issue within a single machine out of many.

We do monitor all of this internally, but player observations are still helpful, as it's useful to know if things are getting bad in one particular "location" (sector) on a regular basis, etc. Being "in" the sector may make potential causes more apparent, like a huge NPC convoy that's failing to dock, or some other possibility.

- A genuine bug of some sort in the game client or server.

This is extremely rare, but can occasionally occur, if there has been a recent release. Generally, our network protocol doesn't fluctuate much, and after 13 years in use.. it's pretty freaking solid. But, this is an actively developed game, and it's always possible that we can break anything. So please do let us know.
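As a rough illustration of the first cause above, the idea of starting each new sector on the least-busy machine could be pictured like this. This is only a sketch of the general "pick the least-loaded server" technique, not the actual server code; the function name, machine names, and load figures are all hypothetical.

```python
def pick_server(cpu_load_by_server: dict[str, float]) -> str:
    """Return the name of the server with the lowest current CPU load."""
    return min(cpu_load_by_server, key=cpu_load_by_server.get)


# Hypothetical load figures (fraction of CPU in use) for three machines.
cluster = {"node-a": 0.85, "node-b": 0.10, "node-c": 0.30}

target = pick_server(cluster)
print(target)
cluster[target] += 0.05  # hypothetical cost of the newly started sector
```

A scheme like this only sees the load at the moment a sector starts, which is why a sector that later becomes unexpectedly busy can still affect its neighbors on the same machine.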

The most important thing to keep in mind is to limit reports to cases where your ping is both steady and reasonable. Every time we're told the "server has problems", because someone has a bad ping (or dozens of people have bad pings) we waste a bunch of time checking things out, only to read on a network-provider mailing list that a piece of network equipment blew up in NYC, and as a result the internet as a whole is crappier than usual for a while (read the NANOG mailing list archives if you don't believe us.. stuff is going down every day, and impacting large numbers of people who mostly don't notice, except that their YouTube has a blip during playback).

For a tiny team, time is our most precious resource, so please take great care before reporting "server lag".
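The "steady and reasonable" rule of thumb can be sketched as a quick check over a handful of ping readings. The thresholds below are hypothetical values picked purely for illustration (the post does not define exact numbers), and the function name is made up.

```python
from statistics import mean, pstdev

# Hypothetical thresholds, chosen only for illustration.
MAX_REASONABLE_MS = 250.0
MAX_JITTER_MS = 20.0


def worth_reporting(ping_samples_ms: list[float]) -> bool:
    """True only if the ping is both steady and reasonable."""
    if len(ping_samples_ms) < 2:
        return False  # not enough data to judge steadiness
    average = mean(ping_samples_ms)
    jitter = pstdev(ping_samples_ms)  # spread of the samples around the average
    return average <= MAX_REASONABLE_MS and jitter <= MAX_JITTER_MS


print(worth_reporting([120, 122, 119, 121]))   # steady and reasonable
print(worth_reporting([120, 310, 95, 480]))    # spiky: points at the network, not the server
```

If the check fails, the poor experience is most likely explained by the connection itself, which is exactly the case that is not worth a report.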

Two other final notes to keep in mind:

- Correlation is not Causation

People often want to tell us how the latest version did this, or the server problem is that, because they have chosen to correlate their problem with the given version, or the server change, or whatever. This is unhelpful, because these people are usually wrong, and it would have been better to just report the basic facts to us, without the theories.

- Avoid Confirmation Bias

Confirmation bias is the human cognitive problem where (in this particular case).. once we start building theories about what is causing a problem, we start looking for data to support our theories, and subconsciously ignoring data that conflicts with our theory. This is why it's better to just stick to reporting the basic facts, because people have a tendency to "leave things out" that don't align with their chosen theory.

--

A good example of the above two situations, is the degree to which people have been reporting "server lag" problems since we migrated the server from Milwaukee to Chicago. This is surprising, because the connection in Chicago is far better, and rides the same networks. Plus, Milwaukee data generally has to go through Chicago anyway, so latency is reduced by being in Chicago.

But nonetheless, a player in Europe may be convinced that his ping is higher because we moved the server to a new location (correlation must be causation!), because he used to get 120ms, and now he gets 150ms. So, we check the routing, and find that in both cases it's using the same network (Level3), and the same route, but that his local ISP degraded their local routing choices (for unknown reasons) around the same time as we moved the server. So, where this player gets 150ms to the "new" server (his "bad" connection), he would actually be getting 154ms to the "old" server that he thinks he would prefer.

Similarly, if people get together in game, compare notes on how they're all having a degraded ping experience, and conclude that it must be the server, they will continue to believe this.. seemingly forever (in some cases, regardless of anything we say, or evidence to the contrary), due to a kind of group confirmation bias.

Human cognition is what it is, but for the sake of the game, please keep this kind of stuff in mind when reporting issues:

- Make detailed reports: dates, times, in-game locations, activity at the time, etc.
- Stick to the facts. Either leave out the theories entirely, or put them at the bottom.
- Don't put too much emphasis on "but everyone I know is seeing the same issue!". That just means you're all using the.. internet. It is helpful to report group commonalities like this, but not conclusions based on them.
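One way to picture a report that follows these guidelines is as a small structure with the facts first and any theories last. This is only a sketch: the `LagReport` class, its field names, and the sample values are all hypothetical, not an official report format.

```python
from dataclasses import dataclass, field


@dataclass
class LagReport:
    """Hypothetical layout for a report: facts first, theories last."""
    when: str
    sector: str
    activity: str
    ping_ms: list[int]
    observations: list[str] = field(default_factory=list)
    theories: list[str] = field(default_factory=list)

    def render(self) -> str:
        lines = [
            f"When: {self.when}",
            f"Sector: {self.sector}",
            f"Activity: {self.activity}",
            f"Ping samples (ms): {', '.join(map(str, self.ping_ms))}",
        ]
        lines += [f"Observed: {item}" for item in self.observations]
        # Theories go last so the facts can be read on their own.
        lines += [f"Theory (optional): {item}" for item in self.theories]
        return "\n".join(lines)


report = LagReport(
    when="2015-10-07 20:15 UTC",          # made-up example values throughout
    sector="Example Sector A-1",
    activity="docking at a station",
    ping_ms=[121, 119, 122],
    observations=["ships rubber-banding for about 30 seconds"],
    theories=["possibly related to a large convoy nearby"],
)
print(report.render())
```

Keeping the theories in their own trailing section makes them easy to skip or drop without losing any of the facts.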

Major issues can impact tens of millions of people at a time, and will rarely be reported in the press, because most won't notice changes in latency. If a page takes an extra tenth of a second to load, it might vaguely annoy someone, who will forget a second later.. but for us, a 100ms spike is a huge change in game dynamics. Similarly, a YouTube viewer, or Pandora listener won't even notice a 1000ms latency spike, because their data is all streamed and predictably buffered. But on VO (where everything is happening "right now"), you will hear the yelling and fist-waving of many unhappy players who just exploded as a result. Welcome to the Internet..