This talk is about:
- Why we made a crowd-testing app
- How we did it
- How it is working for us
Who am I?
Björgvin Reynisson
QA Engineer at CCP Games
Background in mobile phones
Company
- Founded in 1997
- Privately held
- Approx. 350 employees worldwide
- Headquarters: Reykjavik, Iceland
Locations
- Reykjavik, Atlanta, Shanghai and Newcastle
EVE Online
- Massively multiplayer online game (MMO)
- Launched May 2003
- Players in over 190 countries
- Biggest markets: US, Russia, Germany, UK and Canada
Why Crowd-Test your Game Engine?
- Testing changes before hitting production
- Catching bugs with this app instead of seeing them on the live server
- Increasing test coverage
- Performance testing/comparison
Has anyone experienced this?
"We cannot reproduce this..."
Why is reproduction hard?
EVE players use a LOT of different configurations
- Windows
- Mac
- Linux (although we don't support it)
- 855 different graphics cards (GPUs)
- Different GPU driver versions
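With this many combinations, a defect may only appear on one GPU and driver pairing. The following is a minimal Python sketch, assuming each uploaded run carries hypothetical `gpu`, `driver` and `crashed` fields. It shows how grouping crowd-tested results by that pair makes a problem configuration stand out:

```python
from collections import Counter


def crash_rate_by_config(runs: list[dict]) -> dict[tuple[str, str], tuple[int, int, float]]:
    """Return {(gpu, driver): (crashes, total_runs, crash_rate)}.

    Field names are hypothetical; each run is one scene playback on one machine.
    """
    totals: Counter = Counter()
    crashes: Counter = Counter()
    for r in runs:
        key = (r["gpu"], r["driver"])
        totals[key] += 1
        if r["crashed"]:
            crashes[key] += 1
    return {k: (crashes[k], n, crashes[k] / n) for k, n in totals.items()}


def worst_configs(rates: dict, min_runs: int = 2) -> list:
    """Configurations with at least `min_runs` runs, highest crash rate first."""
    eligible = [(k, v) for k, v in rates.items() if v[1] >= min_runs]
    return sorted(eligible, key=lambda kv: kv[1][2], reverse=True)
```

The top entries of `worst_configs` are the machine setups worth trying to reproduce on first. The `min_runs` cutoff, which keeps a single unlucky run from topping the list, is an arbitrary choice.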
EVE Probe
Works on your machine!
EVE Probe
Repeatable test scenes that you play back on your own computer
Crowd-testing!
Increased coverage
What metrics are collected?
- System Info
- Settings
- EVE Probe version
- Logs
- Performance Metrics
- Crash info
Metrics
- Data is not personally identifiable
- OS, GPU, Processor, RAM
- Monitor Resolution
- Log info, warnings, errors
- Up to 150 different metrics captured per scene
- Primarily monitor frame time and memory usage for performance
- Collect stack trace if a crash occurs
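The following minimal Python sketch makes the list concrete. It shows what one uploaded scene result might look like and how the raw per-frame samples could be reduced to headline numbers. The `SceneRun` fields and the `summarize` helper are illustrative assumptions, not EVE Probe's actual schema. Like the real data, the record holds nothing personally identifiable:

```python
from dataclasses import dataclass, field
from statistics import mean, quantiles
from typing import Optional


@dataclass
class SceneRun:
    """One playback of one test scene on one machine (hypothetical schema)."""
    scene: str
    probe_version: str
    os: str
    gpu: str
    cpu: str
    ram_mb: int
    resolution: tuple[int, int]
    frame_times_ms: list[float] = field(default_factory=list)
    memory_mb: list[float] = field(default_factory=list)
    crash_stack: Optional[str] = None  # set only if the scene crashed


def summarize(run: SceneRun) -> dict:
    """Reduce raw per-frame samples to a few headline numbers.

    Assumes at least one frame-time sample and one memory sample were recorded.
    """
    ft = run.frame_times_ms
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = quantiles(ft, n=20)[-1] if len(ft) >= 2 else ft[0]
    return {
        "scene": run.scene,
        "gpu": run.gpu,
        "mean_frame_ms": mean(ft),
        "p95_frame_ms": p95,
        "peak_memory_mb": max(run.memory_mb),
        "crashed": run.crash_stack is not None,
    }
```

The mean shows overall cost. The 95th percentile shows occasional hitches, which a mean tends to hide.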
Data Pipeline
Initially used R and IPython notebooks for cleaning/processing/visualizing data
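The notebook code itself is not shown here. As a stand-in, this stdlib-only Python sketch illustrates the kind of cleaning and aggregation step such a notebook might perform. It discards runs that crashed or recorded too few frames, then compares median frame time per GPU. The row keys and the 10-frame cutoff are assumptions:

```python
from collections import defaultdict
from statistics import median

MIN_FRAMES = 10  # arbitrary cutoff for "enough samples to be meaningful"


def clean(rows: list[dict]) -> list[dict]:
    """Keep only runs usable for performance comparison."""
    kept = []
    for r in rows:
        if r.get("crashed"):
            continue  # crashes are analysed separately, not as performance samples
        if len(r.get("frame_times_ms") or []) < MIN_FRAMES:
            continue  # aborted or truncated run
        kept.append(r)
    return kept


def median_frame_time_by_gpu(rows: list[dict]) -> dict[str, float]:
    """Median of per-run median frame times, grouped by GPU model.

    Taking each run's median first stops one long run from outweighing many short ones.
    """
    per_gpu = defaultdict(list)
    for r in rows:
        per_gpu[r["gpu"]].append(median(r["frame_times_ms"]))
    return {gpu: median(values) for gpu, values in per_gpu.items()}
```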
IPython Notebook Example
EVE Probe Dashboard
Reports
Example:
Performance Metrics
Deployment Process
Builds from each branch
Distributing the app
- Soft launch
- Released via a forum post
- Distribution method remains the same
Forum Channel
Delivery
Resources are delivered on demand.
Binaries are delivered on demand.
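The delivery mechanism is not described in detail here. The general idea of on-demand delivery can be sketched as a cache that downloads a file the first time it is requested, checks it against a hash from a manifest, and serves it from local disk afterwards. This is a generic Python sketch. The manifest format, the `OnDemandCache` class and the injected `fetch` callable (a stand-in for a real CDN download) are all hypothetical:

```python
import hashlib
from pathlib import Path
from typing import Callable


class OnDemandCache:
    """Fetch a resource the first time it is requested, then serve it from disk.

    `fetch` stands in for whatever actually downloads the bytes (for example an
    HTTP GET against a CDN). It is injected so the sketch stays self-contained.
    """

    def __init__(self, root: Path, manifest: dict[str, str],
                 fetch: Callable[[str], bytes]):
        self.root = root
        self.manifest = manifest  # resource name -> expected SHA-256 hex digest
        self.fetch = fetch

    def get(self, name: str) -> bytes:
        expected = self.manifest[name]
        path = self.root / expected  # content-addressed: file is named by its hash
        if path.exists():
            return path.read_bytes()  # already downloaded earlier
        data = self.fetch(name)  # first request: download now
        if hashlib.sha256(data).hexdigest() != expected:
            raise ValueError(f"checksum mismatch for {name}")
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
        return data
```

Only what a scene actually touches is ever downloaded, which keeps the initial install small.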
Delivery
Download-on-Demand is now used for EVE Online
- Lowered the initial install package from 7 GB to 300 MB
- Saved CDN usage
- Very smooth roll-out
Developer Tool
- A test-framework
- Repeatable scenes
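One common way to make a test scene repeatable is to drive it from a fixed script rather than from player input. The script combines a keyframed camera path, a fixed simulation timestep and a seeded random number generator. This generic Python sketch illustrates that idea and is not EVE Probe's scene format. `CameraKey`, `camera_at` and `play_scene` are hypothetical names:

```python
import random
from dataclasses import dataclass


@dataclass
class CameraKey:
    """A keyframe on the scripted camera path (hypothetical format)."""
    t: float  # seconds into the scene; keys must be sorted with strictly increasing t
    x: float
    y: float
    z: float


def camera_at(keys: list[CameraKey], t: float) -> tuple[float, float, float]:
    """Linearly interpolate the camera position at time t, clamping at both ends."""
    if t <= keys[0].t:
        return (keys[0].x, keys[0].y, keys[0].z)
    for a, b in zip(keys, keys[1:]):
        if a.t <= t <= b.t:
            u = (t - a.t) / (b.t - a.t)
            return (a.x + u * (b.x - a.x),
                    a.y + u * (b.y - a.y),
                    a.z + u * (b.z - a.z))
    last = keys[-1]
    return (last.x, last.y, last.z)


def play_scene(keys: list[CameraKey], duration_s: float, seed: int,
               dt: float = 1 / 60) -> list[tuple[float, tuple[float, float, float], float]]:
    """Step the scene with a fixed timestep and a seeded RNG.

    Simulation time advances by dt per step regardless of how long a frame takes
    to render, so every run produces the same sequence of scene states.
    """
    rng = random.Random(seed)  # private RNG, unaffected by other users of `random`
    states = []
    for i in range(round(duration_s / dt)):
        t = i * dt
        jitter = rng.uniform(-1.0, 1.0)  # e.g. randomised effect placement
        states.append((t, camera_at(keys, t), jitter))
    return states
```

Only the measured frame time differs between machines. The work each frame asks of the hardware stays the same, which makes results from different computers and different builds comparable.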
Benchmarking Tool
- Set up on recommended-spec hardware
- Run the Mainline, Staging and Live branches
- Monitor performance across branches
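Running several branches on the same machine pays off when the numbers are compared automatically. The following minimal Python sketch assumes each branch produces a list of frame-time samples for the same scene on the same hardware. It flags any branch whose median frame time is more than a chosen tolerance slower than a baseline branch. Using Live as the baseline and a 5% tolerance are assumptions, not CCP's process:

```python
from statistics import median


def compare_to_baseline(baseline_ms: list[float], candidate_ms: list[float],
                        tolerance: float = 0.05) -> dict:
    """Compare median frame times; a positive `change` means the candidate is slower."""
    base = median(baseline_ms)
    cand = median(candidate_ms)
    change = (cand - base) / base
    return {
        "baseline_ms": base,
        "candidate_ms": cand,
        "change": change,
        "regression": change > tolerance,
    }


def check_branches(results: dict[str, list[float]], baseline: str = "Live",
                   tolerance: float = 0.05) -> dict[str, dict]:
    """Compare every other branch against the baseline branch."""
    return {
        branch: compare_to_baseline(results[baseline], samples, tolerance)
        for branch, samples in results.items()
        if branch != baseline
    }
```

The median is used rather than the mean so that a few hitches do not dominate the comparison. In practice, several runs per branch would help separate noise from genuine regressions.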
Has it been worth the effort?
Success?
- Download-on-Demand
- Has proven its worth for hard-to-reproduce defects
- Great development tool
- Big potential for benchmarking use
Adoption
Improvements?
- Low adoption
- Should we market the app better?
- Share the app with hardware vendors
- Make benchmarking part of the release process