Outage tonight
Sorry about the unexpected downtime this evening. We were doing some server maintenance (physically in the colo), planning to keep it all user-invisible. Unfortunately, a loose ethernet cable brought things down. We had it back up again within a few minutes.
On the upside, our server cluster is now running an upgraded OS, and two new machines have been added, with another three to be brought online within the next couple of weeks. This all adds up to improved stability/scalability for more border-skirmishy type stuff.
Any of this connected to the sporadic lag issues that affected the BP team last night?
I have no idea. "Sporadic lag issues" is pretty vague. If it was well after we did the maintenance, it was probably just internet weather changes. If it was "during" the maintenance period (which was over by 7:15pm or so), it could have been related.
I mean lag on the server side that caused only 10% of my hits to register.
I know it wasn't just the sector, because it happened to me in B-8 as well and my whole group was experiencing it. By sporadic I mean it happens from time to time in different sectors.
I had the shots-not-registering thing happen in B8 last night around 7-ish. PM'd momerath at the time.
I emptied an entire FC batt worth of AGT into an Orun in B8 without a single shot doing damage. I eventually killed the 8-10 Oruns, 3 Valents, and the transport mining in B8, mostly with flares, since 80% of AGT shots never did damage. Also very odd was the fact that the bots were there mining AND never attacked, even when damaged.
I also experienced my shots not doing any damage today. I used megaposis and an HX, and hit artemis collectors up to 50 times without doing damage. Then I could spin 360 degrees and fire on the same collector, this time hitting it fine... Logging out of VO and back in seemed to solve the problem, but it looks like it keeps cropping up. I also notice the sector gets laggy when it happens... This happened between 1 and 3 PM GMT.
Ok, thanks for the reports, but it would help us to know exactly what time these things happen and what sector. That way we can check exactly what machine the sector was spawned on at the time, and what other activity may also have been served from that machine. The more accurate the time report, the better.
We will be looking into it, thanks.
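The lookup described above (sector plus time, giving the machine that hosted the sector) can be sketched roughly like this. This is only an illustration under assumptions: the `SpawnRecord` layout, the `machine_for` helper, and machine names like "node3" are hypothetical, not the real cluster's spawn log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SpawnRecord:
    """Hypothetical spawn-log entry: one sector hosted on one machine for a time window."""
    sector: str
    machine: str
    start: datetime
    end: Optional[datetime]  # None means the sector is still running on that machine


def machine_for(records: list[SpawnRecord], sector: str, when: datetime) -> Optional[str]:
    """Return the machine hosting `sector` at time `when`, or None if no record covers it."""
    for rec in records:
        if rec.sector != sector:
            continue
        # Window is half-open: [start, end), so a hand-off instant belongs to the new machine.
        if rec.start <= when and (rec.end is None or when < rec.end):
            return rec.machine
    return None
```

This is also why a precise timestamp matters: if a sector gets respawned on another machine, a report that is off by half an hour can point at the wrong box entirely.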
Another lag-causing issue: if you use the cargo find plugin and scan for an object with bots in the area, your shots will likely not register for a few minutes after the scan. You even hear the hits, but there is no damage.
15:19:30, according to my logs; that's when I logged on again after quitting due to the lag. So the problem should have been right before that, for about ten minutes I think... After that, everything was fine. I know one of my guildies experienced the same thing at the same time. The sector we were in was Azek N2. Hope it helps =)
NP
I think I may have fixed the problem. One of our cluster servers may have some sort of hardware issue with its clock when using the TSC (Time Stamp Counter) as its timekeeping source. NTP was unable to keep the machine synchronized, and the clock was jittering wildly when I found it. Switching the OS to the less efficient, less accurate i8254 timer fixed the problem. It's remotely possible that the machine actually overheated and the BIOS underclocked the CPU, which would result in this kind of issue (the TSC is based on the CPU clock frequency), but that would be pretty surprising.
Of our cluster of totally identical servers, only one had the issue, and it was also running some of these sectors (like Azek N2). The clock desynchronization, along with the excessive jitter, may have confused the sector daemon about whether given user shots were within "ballpark hit-range" of an object, or it may have caused actual transmit/receive data stalls due to our use of device polling on our cluster ethernet interfaces (Realtek chips, which are not very efficient at interrupts, hence the polling).
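To put a number on the underclocking theory, here is a rough back-of-the-envelope sketch. It assumes, as the post does, that the TSC ticks at the current CPU clock, while the OS keeps converting ticks to seconds using the frequency it calibrated at boot. The function names and the 3.0 GHz / 2.4 GHz figures are made up for illustration. For scale, ntpd can only slew the clock by up to about 500 ppm, so a rate error far beyond that is not something NTP can discipline away.

```python
def apparent_elapsed(real_seconds: float, calibrated_hz: float, actual_hz: float) -> float:
    """Seconds the OS would report for `real_seconds` of wall-clock time.

    Assumption: the TSC increments at `actual_hz` (the current core clock), but the
    kernel divides the tick count by `calibrated_hz`, measured once at boot.
    """
    ticks = real_seconds * actual_hz
    return ticks / calibrated_hz


def drift_ppm(calibrated_hz: float, actual_hz: float) -> float:
    """Clock rate error in parts per million (negative means the clock runs slow)."""
    return (actual_hz / calibrated_hz - 1.0) * 1e6


# Hypothetical example: a 3.0 GHz CPU throttled to 2.4 GHz loses 2 s of every 10 s,
# a rate error of about -200000 ppm.
```

If the throttling comes and goes with temperature, the error would also come and go, which would look like jitter rather than a steady drift.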
In any case, the machine had a whole lot of this, along with the wildly jittering clock:
OnPilotHit: objects not colliding 70.293388 > w12.812753/s7.251325.
This basically means that the SD (sector daemon) did not agree with what the client was reporting as a "hit", and therefore ignored it (an anti-cheat measure).
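For anyone curious why a bad clock would trip an anti-cheat check, here is a purely hypothetical sketch of that kind of server-side hit validation. It is not the actual sector daemon code: the linear extrapolation, the `Body` type, `accept_hit`, and the radius-plus-slack tolerance are all assumptions, and the numbers in the log line above are not interpreted here. The point it illustrates is that if the server's idea of "now" is off by dt, a target moving at speed v gets placed about v × dt away from where the client saw it.

```python
import math
from dataclasses import dataclass

Vec = tuple[float, float, float]


@dataclass
class Body:
    """Hypothetical server-side state for a moving object."""
    pos: Vec    # position at time t0
    vel: Vec    # velocity, in units per second
    t0: float   # timestamp of the last authoritative update, in seconds


def position_at(body: Body, t: float) -> Vec:
    """Linearly extrapolate the body's position to time t."""
    dt = t - body.t0
    return tuple(p + v * dt for p, v in zip(body.pos, body.vel))


def accept_hit(target: Body, shot_pos: Vec, hit_time: float,
               target_radius: float, slack: float) -> bool:
    """Accept a client-reported hit only if the shot lies within the target's radius
    plus some slack of where the server thinks the target was at hit_time."""
    where = position_at(target, hit_time)
    return math.dist(where, shot_pos) <= target_radius + slack
```

With a target moving at 100 units/s, a genuine hit checked against a server clock that is 0.5 s ahead is judged 50 units off and rejected, even though the client really did hit.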
Anyway, the system clock seems fine now with the i8254 device. At some point I'll add graphs of per-machine NTP clock offset and jitter to our monitoring stuff, so we can see if this behaviour is triggered by anything, or is just a general hardware issue specific to this machine (similar clock problems have been reported, albeit infrequently, by other people with Prescott P4s). Regardless, I'm confident that we can fix it if it crops up again, but it certainly was pretty bizarre.
Let me know if this specific issue crops up again (shooting at something, hitting it, but not damaging it). I don't want general "lag" reports; that's like asking for the phone book. People tend to blame everything on "lag", which can mean anything from their framerate being poor to the intertubes being full. The specific case above, shooting something and definitively hitting it (repeatedly!) without damaging it, is more useful to me.
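A per-machine jitter check for that kind of monitoring could look roughly like the sketch below. It borrows NTP's notion of jitter as the root-mean-square of differences between successive offset samples. How the offsets are collected, the machine names, and the alert threshold are all assumptions for illustration.

```python
import math


def rms_jitter(offsets: list[float]) -> float:
    """RMS of successive differences between clock-offset samples (in seconds)."""
    if len(offsets) < 2:
        return 0.0
    diffs = [b - a for a, b in zip(offsets, offsets[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def flag_machines(samples: dict[str, list[float]], max_jitter: float) -> list[str]:
    """Sorted names of machines whose offset jitter exceeds max_jitter (hypothetical threshold)."""
    return sorted(name for name, offs in samples.items() if rms_jitter(offs) > max_jitter)
```

Graphing the raw offsets next to this number helps separate the two failure shapes: a steady drift shows up as a slope in the offsets with low jitter, while the behaviour described above shows up as large jitter.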
To be clear, you want the time and sector that this happens in, correct?
Yes, knowing the time and sector helps narrow down exactly what machine is running the sector at that point, and lets me know what sector log to check for additional information.