Concerns

Post by azuanagames » January 21st, 2011, 6:58 pm

There are a few things that have been worrying me lately. I've been getting a higher number of plays and am seeing more errors:

1) Webservice returned unhandled error: InternalError
First seen 7 days ago, latest 6 hours ago. (delete 28 errors)
The underlying connection was closed: An unexpected error occurred on a send.

2) Unable to preload PlayerObject in Server
First seen yesterday, latest yesterday. (delete 1 error)
UserId: fbXXXXXXXXXX, Error Message:An unexpected error occurred inside the Player.IO webservice. Please try again.

3) Code for event Timer ran for too long
First seen 10 days ago, latest 6 days ago. (delete 13 errors)
4118.4264ms. The maximum runtime is 100ms

4) Object reference not set to an instance of an object.
First seen 11 days ago, latest 2 days ago. (delete 12 errors)

I realize #4 is somehow my issue, though I cannot track it down for the life of me. #3 might be me, but honestly I can't reproduce it, and for a timer to run for 4 seconds is outrageous... I don't have anything that could possibly be causing that...

I realize that I can now run more of this on the development server, and am in the process of setting that up. But as you can see above, these errors aren't happening consistently...

Does everyone see these kinds of errors in their logs now and then? Are there more defensive programming approaches I should be taking? Try / Catches that I'm missing?

Re: Concerns

Post by Oliver » January 24th, 2011, 12:05 pm

Hey Azuana,

Those are very valid concerns, and I'm going to give you an answer that it ...

With regard to #1 and #2: Those are both intermittent errors caused by something failing in our backend, either a bug or a component failure. They are not your fault. With any larger distributed system, there are going to be components failing from time to time, and bugs popping up once in a while.

Naturally we log and track those errors, and it's pretty much our top concern to get those error rates as low as possible, but once in a while they will happen. We could choose not to expose those errors to you, but to ensure transparency we want you to have complete insight into the health and well-being of your game.

So, you should obviously be concerned that the errors occurred, but you should also feel secure in knowing that general errors on our end are tracked and fixed as fast as humanly possible. It's in our best interest to have error rates as low as possible; otherwise developers won't use our systems.

#3 is actually a special case of this. We're currently experiencing pretty big growth rates, and that's uncovering some new issues that we didn't know we had and that we have to deal with. Recently, we saw that our system for capping CPU time was sub-optimal in heavy-load cases. The thing is, one game server in our main cluster will run rooms from many different games, and usually that is not an issue. But if load goes really high on one server (we've recently had to tweak our load distribution method for something similar), that might cause some code to unfairly get caught by our CPU timing system.

We deployed a fix for this last Friday that should make the measurements much, much more accurate going forward. It might not solve everything, but it's a step forward.

Of course, it might also be the case that you've got some weird logic that actually takes a long time to run, but I'm going to assume for this discussion that it's not the case.

In short: Player.IO has bugs -- as does any other software. However, the great thing about Player.IO vs. rolling your own is that you know there is a team of people who are constantly working hard to eliminate those bugs and improve the overall performance of the system, while you get to concentrate on gameplay.

That's our value proposition: We take care of the backend, you get to concentrate on the gameplay. So our interests are totally aligned with yours: We want as few bugs/failures in our system as possible, because that strengthens our value proposition.

I hope this helps. I know that you'd probably prefer that I promise these were one-off errors that will never happen again, and that from here on in Player.IO will have 100% uptime with zero defects (and so would I), but we're not there yet. However, that's where we want to go, and that's what we're working hard on.

Best.
Oliver



Re: Concerns

Post by azuanagames » January 24th, 2011, 3:41 pm

Thank you Oliver.

I'm pretty sure that I've optimized all the code that runs in Timers on my end. So it's good to hear you're tracking down issues on your end.

Keep up the good work and thank you for taking the time to answer my inquiry :)