To say it has been a difficult 2 weeks would be an understatement. I’ve been making PC games for about 15 years and never in that time have I encountered a situation quite like what we’ve had with Demigod.
As most of you know, Demigod is an awesome game. If I may be so bold, Demigod is one of the best games that has been released in the last couple of years. And yet, here we are, 2 weeks after release and the included multiplayer matchmaking is still flakey.
I would put it like this (based on logs and talking to people):
- 30% of users think the multiplayer matchmaking works flawlessly. You don’t hear much from them (obviously).
- 60% of users are finding the multiplayer matchmaking to be flakey. It works but it’s a pain in the ass to get into a game sometimes.
- 10% of users cannot be connected to other users period.
That last number will go up if the proxy servers aren’t put in place prior to the official European release as that last 10% is almost exclusively made up of people using ADSL in Europe.
So how did this happen?
Losing Messages
The single biggest problem has to do with lost messages. In the beta, when we had only a couple thousand people using the system, it worked pretty well. At first, it was pretty flakey but as we added more servers to handle the connections, the success rate went way up and by the end it was pretty perfect. To be safe, we added extra servers to handle the expected extra players.
Then, one retailer broke the street date over the Easter break. This in itself wasn’t as big of a problem as one might think as the system could handle the legitimate players. But when the game inevitably got into warez channels, the pirates saturated our other servers (not the NAT server). Those servers handled things like validation and checking for updates. The validation kept pirates from being able to play in the Pantheon, skirmish or custom games but it also ate up a lot of our extra capacity. The result was that for the first 3 days after the official release of the game, the online experience was totally hosed. The early reviews reflected that unfortunately.
So for the first week, rather than focusing on the issue that would come to bite legitimate players for this past week, we were scrambling to create a mirror series of servers to isolate the pirates from the legitimate users. That took us into mid last week.
By mid last week, it became apparent that something was causing players to not get into the game. We assumed it was simply bugs in Impulse Reactor since it was new, and on Friday we released a new version of it. But it turned out there was nothing wrong with Impulse Reactor. The problem was coming from the NAT facilitator itself: it was sending messages but the game was never receiving them, or at least not before the timeout period.
Adding more servers over the past few days has improved things bit by bit but not acceptably.
More Specific
When Bill and Rick decide to play Demigod together, the NAT facilitator connects them and sends a message to Impulse Reactor, which in turn hands over the socket to Demigod’s lobby. We have verified that the NAT facilitator is sending the message but the message is never being received. We know it has something to do with the load because when we try it out with a relatively small group of testers (~100) it works flawlessly. It’s when it is put out there for tens of thousands of people that it falls apart.
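To make the "sent but never received" symptom concrete, here is a minimal sketch of a receiver that waits for a handoff message with a timeout. It is not Demigod's or Impulse Reactor's actual code: the function names, the "socket-ready" message, and the delays are all made up for illustration. The point is only that a message that is genuinely sent can still look lost to a lobby that has already given up.

```python
import queue
import threading

def wait_for_handoff(inbox, timeout_s):
    """Hypothetical lobby side: wait for the handoff message, or give up."""
    try:
        return inbox.get(timeout=timeout_s)
    except queue.Empty:
        return None  # from the player's point of view, the connection failed

def send_after(inbox, delay_s, message):
    """Hypothetical facilitator side: the message is always sent, possibly late."""
    timer = threading.Timer(delay_s, inbox.put, args=(message,))
    timer.daemon = True  # do not keep the process alive for a late message
    timer.start()
    return timer

# Light load (assumed delay): the message arrives well inside the timeout.
light = queue.Queue()
send_after(light, 0.01, "socket-ready")
light_result = wait_for_handoff(light, timeout_s=1.0)

# Heavy load (assumed delay): the same message is sent, but too late to matter.
heavy = queue.Queue()
send_after(heavy, 0.5, "socket-ready")
heavy_result = wait_for_handoff(heavy, timeout_s=0.1)

print(light_result, heavy_result)
```

In both cases the sender's log would show the message going out, which matches what we see: only the receiving side can tell that it arrived too late.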
Solving it
And that is where we are on Tuesday with regard to connectivity: figuring out why the message is sent from the NAT facilitator but not received by the game. There are a lot of thoughts on this. One is that UDP (unlike TCP) doesn’t guarantee delivery. You can learn more on this here. Another thought is that the NAT facilitator queues up its messages and that by the time it sends one back to the game, the game has timed out. The team tells me they think they’ll have this mystery solved today. If so, I’ll report on it here. But this will only put us up to 90% of users; the last 10% need the proxy servers.
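The queueing theory is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below assumes a single first-in, first-out queue, a fixed handling time per message, and a fixed timeout. None of these figures come from our servers; they are placeholders picked to show the shape of the problem.

```python
def delivered_before_timeout(queued_messages, service_time_ms, timeout_ms):
    """Count messages that leave a FIFO queue before the receiver gives up.

    Assumes every message is queued at time zero and handled one at a time,
    so the message in position k goes out after k * service_time_ms.
    """
    delivered = 0
    for position in range(1, queued_messages + 1):
        finish_time_ms = position * service_time_ms
        if finish_time_ms <= timeout_ms:
            delivered += 1
    return delivered

# Assumed figures: 10 ms per message, 10 second timeout.
small_group = delivered_before_timeout(100, 10, 10_000)
launch_load = delivered_before_timeout(20_000, 10, 10_000)

print(small_group, "of 100 delivered in time")
print(launch_load, "of 20000 delivered in time")
```

With these made-up numbers, all 100 testers get their message in time, while only the first 1,000 of 20,000 do. That is the same pattern as flawless in testing and broken at launch, though it does not prove this is the actual cause.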
Proxy Servers
If I had a time machine and knew about the pain we would all be experiencing in the first couple of weeks after release, I would have simply had us set up a series of host servers around the world. When someone started a game, they would simply connect to one of those servers and there’d be no connection issues. That’s what we have been planning to do with Elemental. But I was of the same mind as many here who have said, “Hey, lots of games do this, how hard can it be?”
Well, it turns out, it is very hard. Most vendors have used GameSpy over the years. Blizzard developed Battle.net over many years. Relic migrated to a system it developed with Quazal.
The only reason we should be able to develop a proxy system quickly is that we already have a worldwide network infrastructure because of Impulse, so the hardware is already there and the agreements with ISPs in various countries are already in place. Work on this system has been going on in parallel with the NAT issue. What will happen is that rather than getting a NAT failed message, the user will connect to one of our servers and have their traffic routed through that. I don’t have a good ETA other than very soon (days, not weeks, we expect).
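The fallback behaviour can be sketched in a few lines. This is an illustration of the idea only, not the proxy system being built: the `NatFailed` exception, the `connect` function, and the relay name are all hypothetical.

```python
class NatFailed(Exception):
    """Hypothetical error raised when direct peer-to-peer setup fails."""

def connect(peer, try_direct, relays):
    """Return (route, endpoint): direct if possible, otherwise via a relay."""
    try:
        return ("direct", try_direct(peer))
    except NatFailed:
        if not relays:
            raise  # no proxy available, so the failure still surfaces
        return ("proxy", relays[0])  # route traffic through our server instead

def direct_works(peer):
    return peer

def direct_fails(peer):
    raise NatFailed(peer)

# "relay-eu" is a made-up relay name for the example.
good = connect("rick", direct_works, ["relay-eu"])
fallback = connect("rick", direct_fails, ["relay-eu"])
print(good, fallback)
```

The player who would have seen a NAT failed message ends up in the game either way; the only difference is which route the traffic takes.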
On a personal note
We read the forums and we hang out on the chat and we even read comments elsewhere when we’re at home. First, we appreciate all the support and understanding our community has had. To those who are experiencing problems, we share your frustrations. Many of us at Stardock and Gas Powered Games have spent an immense amount of time making Demigod a great game. For those of you who are parents, I suspect you can empathize with those of us whose small children at home have gone weeks without seeing their dad (or mom). As a game designer, it has been quite maddening to see the launch of a great game marred by IT and network stuff. And I can tell you, getting this network stuff nailed down is easily as complicated as dealing with all the various video cards and sound cards and other typical PC pain.
The teams at Stardock and GPG do believe those issues will be solved soon. And we thank you for your continued patience.
WEDNESDAY UPDATE
I am currently sitting in the lab with the GPG, Stardock, and Raknet teams interacting. GPG feels good about having taken care of the in-game disconnect issue. The "big" connectivity issue described above increasingly appears to be related to having to hand off the socket to the Demigod client. This creates a series of timing issues that get exponentially worse as more people try to connect, multiplied by the number of users on the NAT server.