Some information about the issues on Saturday 8th December 2018

  • Hi, I am Rob Stothard, Technical Director of Services at Rare. I wanted to give a little insight into what happened to Sea of Thieves on Saturday 8th December. For those who don't know, at around 8am (GMT) a proportion of players started receiving CinnamonBeard errors when trying to join Sea of Thieves, and players who were in game at that time started to have issues when handing in chests or completing commendations.

    Unfortunately, the issue did not trigger any of our automated alerting and continued for about an hour before we became aware and started to investigate.

    The issue manifested as a failure to find a server during server matchmaking, resulting in a CinnamonBeard error at the client. As we started to investigate, it appeared that all the services involved in Server Matchmaking were working fine and that the problem must be in some other part of the system. Matchmaking in Sea of Thieves is a two-step process: first we use Xbox Live matchmaking to organise players into Crews, and then we use our own Server Matchmaking service to find a suitable server for that Crew. Having determined that our own matchmaking was operating normally, we asked the Xbox Live team whether there was an issue with Live Matchmaking, and they also reported no issues.

    Having determined that the specific services involved were all operating fine, we started to investigate lower down in our technology stack. For Sea of Thieves, some of our inter-service communication uses a Pub/Sub architecture backed by a cluster of RabbitMQ servers as the Message Broker. At around 8am that morning, some of the Nodes in the cluster had suffered an interruption to network connectivity, resulting in the cluster becoming split. Essentially, our RabbitMQ servers had split into three discrete clusters, each one not communicating with the other two. This is what RabbitMQ refers to as a Network Partition or Split Brain.

    Once we discovered the root issue we were able, fairly quickly, to bring the cluster back into a good state and then gradually bring the various affected game services back online.

    We have learnt a lot from this incident, and there are several things that we will take away to correct.

    RabbitMQ can be configured to self-heal when a Network Partition occurs. We have now made this change to our RabbitMQ install, and it will make its way to our production environment in the coming weeks.
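
    For readers unfamiliar with this setting, here is a minimal sketch of what such a change can look like in RabbitMQ's `rabbitmq.conf`. The post does not say which recovery mode was chosen, so `autoheal` is shown purely as an assumption. RabbitMQ also offers `pause_minority` and `pause_if_all_down`, which make a different trade-off between availability and consistency.

```ini
# rabbitmq.conf (new-style key = value format, RabbitMQ 3.7 and later)
# Assumption: "autoheal" is used here only for illustration.
# The default is "ignore", which leaves a partitioned cluster split
# until someone intervenes manually.
cluster_partition_handling = autoheal
```

    With `autoheal`, once connectivity returns RabbitMQ picks a winning partition and restarts the nodes outside it, so messages that existed only on a losing side may be lost.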

    To enable us to respond more quickly, we will add additional monitoring and alerting to the title. In this particular case the most telling symptom was the failure of players to join the game. This is a metric that we already track but do not alert on, so we are now in the process of adding alerts to it.
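
    As an illustration of alerting on a metric like this, the sketch below raises an alert when the share of failed join attempts inside a sliding time window crosses a threshold. It is not Rare's implementation: the class name, window length, threshold and minimum sample size are all hypothetical values chosen for the example.

```python
from collections import deque


class JoinFailureAlert:
    """Signals when the join failure rate over a sliding window is too high.

    Every name and default here is hypothetical, chosen for illustration.
    """

    def __init__(self, window_seconds=300, max_failure_rate=0.2, min_attempts=50):
        self.window_seconds = window_seconds
        self.max_failure_rate = max_failure_rate
        self.min_attempts = min_attempts
        self._events = deque()  # (timestamp, succeeded) pairs, oldest first

    def record(self, timestamp, succeeded):
        """Record one join attempt and drop attempts older than the window."""
        self._events.append((timestamp, succeeded))
        cutoff = timestamp - self.window_seconds
        while self._events and self._events[0][0] < cutoff:
            self._events.popleft()

    def failure_rate(self):
        """Fraction of attempts in the current window that failed."""
        if not self._events:
            return 0.0
        failures = sum(1 for _, ok in self._events if not ok)
        return failures / len(self._events)

    def should_alert(self):
        # Require a minimum sample size so a handful of failures during
        # a quiet period does not trigger an alert on its own.
        return (len(self._events) >= self.min_attempts
                and self.failure_rate() > self.max_failure_rate)
```

    Timestamps are passed in by the caller rather than read from the clock, which keeps the check easy to replay against recorded metrics.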

    RabbitMQ itself was indicating very clearly that it was having problems. However, the status of RabbitMQ is not as readily visible to our Ops and Out Of Hours teams as it should be. We will therefore be making additional information available to the Operations teams to allow them to diagnose issues like this much more quickly.
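
    One way this kind of status could be surfaced, sketched here under assumptions: the RabbitMQ management plugin exposes `GET /api/nodes`, and each node object in the response carries a `partitions` list that is empty while the cluster is healthy. The function below only parses such a response; the node names and the trimmed payload are made up for the example, and fetching the data or routing the result to a dashboard is left out.

```python
import json


def find_partitioned_nodes(nodes_json):
    """Return {node_name: [peers it is partitioned from]}.

    `nodes_json` is the JSON body of the management API's GET /api/nodes
    response. Nodes with an empty or missing `partitions` list are skipped.
    """
    partitioned = {}
    for node in json.loads(nodes_json):
        peers = node.get("partitions") or []
        if peers:
            partitioned[node["name"]] = list(peers)
    return partitioned


# Hypothetical payload: names are invented and only the fields used are shown.
sample = json.dumps([
    {"name": "rabbit@mq-1", "running": True, "partitions": ["rabbit@mq-3"]},
    {"name": "rabbit@mq-2", "running": True, "partitions": []},
    {"name": "rabbit@mq-3", "running": True, "partitions": ["rabbit@mq-1"]},
])

for name, peers in find_partitioned_nodes(sample).items():
    print(f"PARTITION: {name} cannot see {', '.join(peers)}")
```

    The same information is available on the command line through `rabbitmqctl cluster_status`.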

    Unfortunately, due to the nature of the incident there was a range of ways in which players could be impacted, from not being able to get into the game to emblem progress not being recorded correctly. As a result, we have decided to award anyone who played during the window some in-game compensation when they next play the game.

  • 33
    Posts
    27.1k
    Views
  • @bobbles31 Thanks Rob greatly appreciate the info and update!

  • It's really cool of you guys to elaborate a bit, and to compensate players. Keep up the great work!

  • @bobbles31 Very cool. I'm currently working at a place that is migrating from WebSphere to AWS, and I got zero Java EE experience in college. I'm hoping to get into the game industry in the future and didn't realize how much my experiences here could translate to that industry :)

  • @bobbles31 Thank you once again for being so open and honest about the issues you had. Fortunately it didn't have any impact on myself, but fair play to Rare for realising it may have to many others, and you are now in control of it. Many thanks Rob, keep up the good work ;)

  • @bobbles31 Thanks for the transparency! Always great to feel like we are kept in the loop!

  • @bobbles31 Thanks for the update. Also an interesting little insight!

    I gather this is why some gold and doubloons dropped in to my account when I logged in just now?

  • @luciansanchez82 No that drop was from me - I heard you needed a quick fix loan... remember 39.9% APR (variable) - I'll be in contact with T's & C's shortly ;)

  • While that all went way over my head aha! What’s that about rabbits? Pirate rabbits maybe? God knows.

    Either way thanks for keeping us informed :D

  • Last thing I want is to be in @j4dio's pocket!

    You won't see penny one from me!

  • @knifelife Yes, RabbitsMQ (also known as: Rabbits Myxomatosis Qualified) is confirmed as pets incoming..... and they bring Cinnamon for a Yuletide warm drink... ;)

  • @knifelife https://media.tenor.com/images/b8cc3152a343ac5ea5723a8edb0c5f45/tenor.gif

  • @knifelife Rabbits confirmed! We will need to feed them carrots on voyages

  • endless carrots for cottontail voyages..lol

  • @IceMan-0007 @J4dio @DuMy2008

    Year of the pirate rabbit confirmed!

    https://goo.gl/images/sbLJ6E

  • @j4dio said in Some information about the issues on Saturday 8th December 2018:

    @knifelife Yes, RabbitsMQ (also known as: Rabbits Myxomatosis Qualified) is confirmed as pets incoming..... and they bring Cinnamon for a Yuletide warm drink... ;)

    Naaah I'm sure it's the code name for fishing! ;)

  • Interesting insight into how the matchmaking process works. Thank you for keeping us updated.

  • @bobbles31 Could you define what the in-game compensation is? Is it a flat Gold payment, BR Doubloons, or a combination?

    Also, what happened to all the message requests that didn't get passed through the cluster?

    Did they even get logged?

    or

    Were they just dropped? And if they were logged, did they get deleted once the clusters were recalibrated?

    Also, would it not be possible to set up a log of all requests managed by the MQ server?

  • @enf0rcer said in Some information about the issues on Saturday 8th December 2018:

    @bobbles31 Could you define what the in-game compensation is? Is it a flat Gold payment, BR Doubloons, or a combination?

    Also, what happened to all the message requests that didn't get passed through the cluster?

    Did they even get logged?

    or

    Were they just dropped? And if they were logged, did they get deleted once the clusters were recalibrated?

    Also, would it not be possible to set up a log of all requests managed by the MQ server?

    Also thanks for the insight. As a former network professional it's nice to get a clear answer as to what happened. I do appreciate the transparency.

  • That's where the rewards came from. Couldn't figure it out last week.
    I logged in to receive:

    +1000 gold
    +5000 gold
    +10 doubloons

    Thanks :D

  • I want to make a suggestion. Get a big crab. It can lift a boat. Running a distance on the water after lifting the boat. We need to hit its huge pliers. Only then can it lay down the boat. When he puts it down, he spits out the blindness from his mouth against the splint. Such a huge crab. It's fun.

  • @yushao945 At sea. Lift up a ship. Then run at a high speed. Giant crabs.

  • As this thread was 1 month old, and revived today, it will now be locked. Please feel free to start a new discussion on this topic!

    A general reminder to all: please avoid reviving threads more than 30 days old, as it is considered a necro and is against our forum rules.

    Bumping Threads
    Bumping threads with content that is not providing additional information to the original post is not permitted. Resurrecting very old threads is also not permitted. A warning will be issued and the thread locked. Ignoring the warning will result in a temporary ban from the Forums and a final warning. If the action continues, a permanent ban from the Forums will be issued.

  • @lady-aijou I have a question: when the update comes out on February 6, are those of us who have spent hours on our characters going to lose everything and have to start all over? Thank you.

  • @oldies-23 No. All pirate info is saved in the cloud.

  • TURN DOWN THE F-ING KRAKEN!!! My buddy and I are a pretty competent two person crew. We've probably put in 600 hours between us. But it is literally impossible to even escape the Kraken, let alone defeat it. I am game for a challenging battle, but it's f-ing impossible and it's stolen our treasure for the last time. I can't play this game until you turn that garbage down. I don't mind dying to players, but when it's just instant defeat to an NPC... Nope. I'm done. Goodbye.

  • @ableduchess8336 pretty easy to escape or kill Kraken on sloop, tentacles that grab ship let go after 2 shots, you can repair, pump water super easily, you might need more hours played (:

  • @AbleDuchess8336 bro, the kraken is super easy to beat. I literally cannot believe you are actually saying that lol, I would have expected someone to say make the kraken harder. Me and my friends don't run into the kraken as much as I wish, but I don't think it has ever sunk me unless I wanted it to.

  • @nintenkid9000 I'm referring to us having to redownload Sea of Thieves on February 6. Will we load the game up and find all our in-game money at 0 and all our items gone? Thank you.

  • @bobbles31 Time to switch to Kafka :)

  • @oldies-23 said in Some information about the issues on Saturday 8th December 2018:

    @nintenkid9000 I'm referring to us having to redownload Sea of Thieves on February 6. Will we load the game up and find all our in-game money at 0 and all our items gone? Thank you.

    I know. I gave you the answer.

  • Ahoy Everyone,

    As the topic has been answered and this thread is old I will drop anchor on this. If you wish to discuss further on this topic please create a new thread.

    Thanks!
