RoundGames is a real-time quiz game platform. Many high-reach streamers on Twitch use it to play with their viewers. It has hosted games with more than 35k simultaneous players, each answering every question within 5 seconds. The game logic is simple, so new players pick it up very quickly. This makes RoundGames an ideal platform for hosting a live contest.
I co-founded RoundGames and am its CEO. I'm responsible for the product, and my partner manages the commercial side of the company. With the product and the commercial pipeline both in a stable state, RoundGames requires very little time to maintain while producing recurring income from existing and new customers.
The main goals I set when I started working on RoundGames were:
- Scalable: The platform should host at least 70k simultaneous users while being highly optimized
- Stable: I should be confident enough in the platform to not have to monitor it manually for each big event
- Fast to iterate: Because RoundGames is a side-project, my available time is limited and the development process should be optimized for release speed
For these reasons, I built RoundGames with:
- PostgreSQL: Reliable, stable, adapted to the data shape and size expected
- NodeJS: Well suited to real-time workloads and scale, and I have a lot of existing experience with it
- Postgraphile: Enables very fast development of the API with GraphQL, secure, fast
- uWebSockets.js: Highly optimized hot path for real-time interactions with users while using NodeJS
- Redis: Source of truth for user answers, with optimized Lua scripts used to update the scoreboard
- Cloudflare: DDoS protection, plus caching of the scoreboard and of images that are accessed in bursts by all players
- React: Well suited to the dynamic nature of the app, and I have a lot of experience with it
- Apollo: Interface with the GraphQL API
RoundGames is hosted on a single $70/month server in Europe. There is no need for a cloud provider except for S3, which stores user-uploaded assets. I chose the simplicity of a single server over the theoretical security of a multi-server deployment because of the project's time budget. A single-server deployment also makes monitoring simpler and overall predictability much higher. The small risk of a data-center service interruption is acceptable for this use case. Not using a cloud service also greatly reduces hosting fees.
Challenge: Serving scoreboards
Each player should be able to check their rank in both the global and the last-round scoreboards. They should also be able to scroll through the scores to find their friends or other people of interest. Two main strategies were explored for this:
- Lazy-loading the scoreboard: After each round, the server sends each player their new rank, and the client is responsible for requesting the part of the scoreboard it needs.
  - Lowest bandwidth: only what is displayed is sent
  - High CPU usage: a scoreboard slice must be generated for every player
  - High complexity: the scoreboard must be lazy-loaded on scroll, and searching for a player is complex
- Send the full scoreboard: After each round, the server sends the full scoreboard to every player, and the client code uses this data to compute the player's rank.
  - Highest bandwidth: the full scoreboard must be sent to every player
  - Low CPU usage: the same message is broadcast to every player in the game
  - Low complexity: exploring and searching the scoreboard client-side is easy to implement
The decision was made to send the full scoreboard, but with a trick: the scoreboard is not sent through the websocket but fetched with a separate HTTP request. This request is cached by Cloudflare, which allows a very high burst of bandwidth with minimal impact on the server. After each round, a new UUID is generated and used to name a new binary file containing the scoreboard. This file name is then broadcast to every player through the websocket.
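As a rough illustration of the publishing step described above, here is a minimal sketch in NodeJS. It is not the actual RoundGames code: the helper name `prepareScoreboardRelease`, the `/scoreboards/` path, the `.bin` extension, the cache headers, and the shape of the websocket message are all assumptions made for the example.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: prepares everything needed to publish one round's
// scoreboard. `scoreboardBytes` is the already-encoded binary scoreboard.
function prepareScoreboardRelease(scoreboardBytes) {
  // A fresh UUID per round means a URL is never reused, so a cached copy
  // can never be stale and no cache invalidation is needed.
  const fileName = `${crypto.randomUUID()}.bin`;

  return {
    fileName,
    // Assumed path, served over plain HTTP so the CDN can cache it.
    path: `/scoreboards/${fileName}`,
    // Assumed response headers for the file.
    headers: {
      "Content-Type": "application/octet-stream",
      "Content-Length": String(scoreboardBytes.length),
      // The content behind this URL never changes, so it can be cached freely.
      "Cache-Control": "public, max-age=31536000, immutable",
    },
    // Small message broadcast to every player over the websocket (assumed shape).
    wsMessage: JSON.stringify({ type: "scoreboard", file: fileName }),
  };
}
```

The websocket only carries a few dozen bytes per player, while the heavy payload is fetched over HTTP, where one cached response can serve many players.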
This scales well to ~100k players in the same game. For each player, the scoreboard file contains:
- the player UUID: 16 bytes
- game score: 2 bytes
- last question time-to-response: 2 bytes
- last round score: 2 bytes

This results in 22 bytes per player. For 100k players, this means 2.2 MB raw. The scoreboard is also compressed, and compression is improved by carefully ordering the fields in the resulting file: UUIDs are random and therefore not compressible, but scores and time-to-response values compress much better.
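The 22-byte record and the field ordering can be sketched as follows. This is one possible layout, not necessarily the one used in production. It assumes a column-by-column file (all UUIDs first, then all game scores, and so on), so that the incompressible UUIDs are kept apart from the more repetitive numeric fields. It also assumes players are already sorted by rank, that every numeric field fits an unsigned 16-bit integer, and that time-to-response is stored in milliseconds. The function names are hypothetical.

```javascript
const zlib = require("node:zlib");

const UUID_BYTES = 16;
const BYTES_PER_PLAYER = UUID_BYTES + 2 + 2 + 2; // 22 bytes per player

// Start offsets of the three 2-byte columns for a file holding n players.
function columnOffsets(n) {
  const score = n * UUID_BYTES;
  const time = score + n * 2;
  const round = time + n * 2;
  return { score, time, round };
}

// players: [{ uuid, gameScore, lastResponseTime, lastRoundScore }],
// assumed to be sorted by rank already.
function encodeScoreboard(players) {
  const n = players.length;
  const buf = Buffer.alloc(n * BYTES_PER_PLAYER);
  const off = columnOffsets(n);
  players.forEach((p, i) => {
    // UUID as 16 raw bytes instead of its 36-character text form.
    Buffer.from(p.uuid.replace(/-/g, ""), "hex").copy(buf, i * UUID_BYTES);
    buf.writeUInt16LE(p.gameScore, off.score + i * 2);
    buf.writeUInt16LE(p.lastResponseTime, off.time + i * 2);
    buf.writeUInt16LE(p.lastRoundScore, off.round + i * 2);
  });
  return buf;
}

// Client-side counterpart: linear scan for a player's 1-based rank and stats.
// Shown with Node's Buffer for brevity; a browser would use DataView instead.
function findPlayer(buf, uuid) {
  const n = buf.length / BYTES_PER_PLAYER;
  const off = columnOffsets(n);
  const needle = Buffer.from(uuid.replace(/-/g, ""), "hex");
  for (let i = 0; i < n; i++) {
    const start = i * UUID_BYTES;
    if (buf.compare(needle, 0, UUID_BYTES, start, start + UUID_BYTES) === 0) {
      return {
        rank: i + 1,
        gameScore: buf.readUInt16LE(off.score + i * 2),
        lastResponseTime: buf.readUInt16LE(off.time + i * 2),
        lastRoundScore: buf.readUInt16LE(off.round + i * 2),
      };
    }
  }
  return null; // player not in this scoreboard
}

// Compression step; gzip is an assumption, any HTTP content encoding would do.
function compressScoreboard(buf) {
  return zlib.gzipSync(buf);
}
```

With this layout, `encodeScoreboard` produces exactly `22 * players.length` bytes before compression, which matches the 2.2 MB figure for 100k players.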
This strategy works very well in production and keeps both code complexity and server load low.