How do Chromatic Orbs work?

Let's start first with some facts about chromatic orbs:

This isn't a lot to work with. We don't, for instance, have confirmation that other factors like item level or item type don't play a role in the process. But let's assume for now that stat requirements are all that matter. If this is true, how exactly does a stat requirement influence the color of an item?

Stat requirements and rolling colors

One clue is how other orbs are rolled. During closed beta, Chris confirmed that for jewelers, the probability of rolling a specific number of sockets can be represented by an integer weight. So if 6 sockets has a weight of 1 and all other socket counts combined have a weight of 305, the probability of rolling 6 sockets is 1 out of 306.
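As a quick worked example of that weight arithmetic (the 1 and 305 are the illustrative numbers from the paragraph above, not confirmed game values, and the function name is made up for this sketch):

```python
from fractions import Fraction

def outcome_probability(weights, outcome):
    """Probability of one outcome when every outcome has an integer weight."""
    total = sum(weights.values())
    return Fraction(weights[outcome], total)

# Illustrative weights from the text: 6 sockets -> 1, everything else -> 305.
weights = {"six_sockets": 1, "other": 305}
p_six = outcome_probability(weights, "six_sockets")
print(p_six, float(p_six))
```

The chance of an outcome is just its weight divided by the sum of all weights, which is the same idea the chromatic model below borrows.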

Applying this logic to chromatic orbs, I propose the following: every time you use a chromatic orb, the color of each socket is rerolled independently of the other sockets. There's an integer weight for each color, one each for red, green, and blue. That weight is the matching stat requirement plus some number (we'll call it "X"). We need the variable X because even though a pure strength item has zero DEX requirement, it can still roll green sockets; they're just less likely than red ones.

This means that the probability of getting a specific color socket on an item is a simple formula, where STAT is the requirement that matches the color (STR for red, DEX for green, INT for blue):

(STAT + X) / (STR + DEX + INT + 3X)

This takes care of the relationship between an item's stat requirements and the probability of rolling specific colors.
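Here's the formula as a minimal Python sketch. The function name and the example item (a pure strength item requiring 100 STR) are made up for illustration, and X = 12 is simply the value used in the fit checks further down the page.

```python
def color_probabilities(str_req, dex_req, int_req, x):
    """Per-socket color chances under the proposed weight model.

    Each color's weight is its matching stat requirement plus X.
    """
    weights = {"R": str_req + x, "G": dex_req + x, "B": int_req + x}
    total = sum(weights.values())  # STR + DEX + INT + 3X
    return {color: w / total for color, w in weights.items()}

# Hypothetical pure-strength item with a 100 STR requirement, using X = 12.
probs = color_probabilities(100, 0, 0, 12)
print(probs)
```

For this item red gets a weight of 112 out of 136, while green and blue get 12 out of 136 each, so off-color sockets are possible but uncommon.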

Checking for duplicate rolls

But what about the rule that a chromatic orb never leaves an item with the same colors it started with? To take care of this, a check is performed after all the sockets have been rolled. If we ended up with the exact same item, we just repeat the process. This type of rejection sampling is inefficient, but should provide us with the same end result as whatever (hopefully more efficient) process GGG is actually using.
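A minimal sketch of that roll-then-check loop, assuming an item is written as a string of socket colors and that "the exact same item" means the same colors in the same socket positions (the function name and the example probabilities are illustrative, not taken from the game):

```python
import random

def roll_chromatic(current, probs, rng=random):
    """Reroll every socket independently; repeat if nothing changed."""
    colors = list(probs)
    weights = [probs[c] for c in colors]
    while True:
        rolled = "".join(rng.choices(colors, weights=weights, k=len(current)))
        if rolled != current:  # reject a result identical to the old item
            return rolled

rng = random.Random(0)
# Hypothetical 4-socket pure-strength item (100 STR requirement, X = 12).
probs = {"R": 112 / 136, "G": 12 / 136, "B": 12 / 136}
item = "RRRR"
results = [roll_chromatic(item, probs, rng) for _ in range(1000)]
print(results[:5])
```

Because all-red is by far the most likely roll for this item, most attempts are thrown away, which is exactly the inefficiency mentioned above.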

Determining the Value of X

OK, so how do we figure out what X is? First we need to collect some data. For that purpose, I opened up a Community Log where people can enter their results. Right now we're sitting at around 1600 chromatic orbs used! That might not seem like a lot, especially when you think about how 1600 orbs isn't necessarily enough to get a good 6 off-color Shavs. But because our estimation task is simple (there's only one variable, X, to learn) and because the value of X plays a role in the process multiple times per roll, this turns out to be more than enough to get a rough estimate. More data, however, is of course always better.

If you're interested in the details of how I estimated X, I suggest you take a look at my forum post. Basically, I used a Metropolis-Hastings algorithm, which, although inefficient, gives me a probability distribution over X rather than a point value (also I know how to code them up easily...).
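The actual sampler is described in the forum post, not here, so what follows is only a generic sketch of what a Metropolis-Hastings estimate of X could look like, not the code behind this page. The uniform prior on 0 < X < 200, the Gaussian random-walk proposal, the step size, and the synthetic observations are all assumptions made for illustration.

```python
import math
import random

def color_probs(reqs, x):
    """Per-socket color chances: (STAT + X) / (STR + DEX + INT + 3X)."""
    total = sum(reqs) + 3 * x
    return {c: (r + x) / total for c, r in zip("RGB", reqs)}

def log_likelihood(x, observations):
    """Log-probability of the observed rolls for a given X.

    Each observation is (reqs, before, after). Dividing by 1 - P(before)
    accounts for results identical to the old item being rerolled.
    """
    total = 0.0
    for reqs, before, after in observations:
        p = color_probs(reqs, x)
        log_after = sum(math.log(p[c]) for c in after)
        p_before = math.prod(p[c] for c in before)
        total += log_after - math.log(1.0 - p_before)
    return total

def metropolis_hastings(observations, n_steps=5000, start=20.0,
                        step=3.0, x_max=200.0, seed=0):
    """Random-walk sampler over X with an assumed uniform prior on (0, x_max)."""
    rng = random.Random(seed)
    x = start
    ll = log_likelihood(x, observations)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        if 0.0 < proposal < x_max:  # outside the prior's support: reject
            ll_new = log_likelihood(proposal, observations)
            # Symmetric proposal, so acceptance depends on the likelihood ratio.
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                x, ll = proposal, ll_new
        samples.append(x)
    return samples

# Synthetic demo data (NOT the community log): 300 rolls on a hypothetical
# 4-socket item requiring 150 DEX, generated with a "true" X of 12.
rng = random.Random(1)
reqs, true_x = (0, 150, 0), 12
p = color_probs(reqs, true_x)
observations, current = [], "GGGG"
for _ in range(300):
    while True:
        rolled = "".join(rng.choices("RGB", weights=[p[c] for c in "RGB"], k=4))
        if rolled != current:
            break
    observations.append((reqs, current, rolled))
    current = rolled

samples = metropolis_hastings(observations)
kept = sorted(samples[1000:])  # drop the first 1000 steps as burn-in
median_x = kept[len(kept) // 2]
print("posterior median of X:", median_x)
```

The list of kept samples is the "probability distribution over X": you can read off a median, a credible interval, or plot a histogram, instead of getting a single best-guess number.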

So how do we know this fits the data?

The model is extremely simple, but it's not of any use if it doesn't actually explain the data that we've collected so far.

The way I've chosen to address this issue is with a statistical test called Pearson's chi-squared test. We'll use it to measure the goodness of fit between the model's predictions and the actual empirical data. Essentially, if the test is significant, the model's predictions don't match the data, and if it isn't significant then we can't conclude the model is wrong. Below are three model predictions that I've tested.

Total number of sockets of each color

Here we measure how many red, green, and blue sockets occur in the data we've collected. We then see how many we would have expected if the statistical model were true. Because there's a lot of randomness involved in rolling chromatics, we run the model 5000 times (for X=12) and then average the results. We find that the expected results are not significantly different from the observed results (p=0.9168, chi2=0.1736, df=2).

Total # of Sockets
            Red      Green    Blue
Observed    1099     4488     2477
Expected    1091.9   4478.7   2493.4
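To make the test concrete, here is a small sketch that recomputes the statistic from the table above. It uses the rounded expected counts as printed, so it lands close to, but not exactly on, the reported chi2 and p values. The function name is just for this example.

```python
import math

def pearson_chi2(observed, expected):
    """Pearson's statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [1099, 4488, 2477]        # red, green, blue
expected = [1091.9, 4478.7, 2493.4]  # simulated averages from the table

chi2 = pearson_chi2(observed, expected)
# For 2 degrees of freedom the chi-squared tail probability is exp(-chi2 / 2).
p_value = math.exp(-chi2 / 2)
print(round(chi2, 4), round(p_value, 4))
```

Three color counts that must add up to a fixed total leave 2 degrees of freedom, which is why the simple exponential form works here.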

Number of different colors per item

Another prediction of the model is how many items have only a single color, versus two different colors or three different colors (e.g. how many items are pure red, versus only red and blue, versus having all three colors at once). We simulate the data in the same manner as before and tally up the results. Again, we find no significant difference, so we've again failed to rule out the model (p=0.3426, chi2=2.1422, df=2).

Items with N unique colors
            One      Two      Three
Observed    421      1120     306
Expected    420.2    1142.7   284.1

Number of sockets of each color per item

Here we measure, for every item, how many red, green, and blue sockets it had. From that we can tally how likely it is that an item in our dataset would roll 4 blue sockets, for instance. Again, this is simulated as before, and again comes out non-significant (p=0.7783, chi2=4.8038, df=8). Note that the chi2 test is not robust when there are multiple rare outcomes. Since items with 6 sockets of any one color are very rare in our corpus, they would throw off the results. Therefore, we add together all cells that have 4 or more of a specific color.

Number of sockets for each color
          Observed                  Expected
          Red     Green   Blue      Red      Green    Blue
Zero      1027    262     673       1037.3   259.9    685.9
One       611     255     491       595.5    256.9    489.3
Two       152     333     294       158.7    344.6    269.2
Three     44      528     215       44.3     525.7    219.1
Four+     13      469     174       11.1     459.9    183.4
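The pooling step and the df=8 test can be sketched like this. The pooled counts are the ones from the table above; the unpooled split used to demonstrate pooling (9, 3, and 1 items with four, five, and six red sockets) is hypothetical, since the raw counts aren't listed here. Results differ slightly from the reported values because the expected counts are rounded.

```python
import math

def pool_tail(counts, cutoff=4):
    """Merge the counts for `cutoff` or more sockets into one final cell."""
    return counts[:cutoff] + [sum(counts[cutoff:])]

def chi2_tail_even_df(chi2, df):
    """Chi-squared tail probability; this closed form holds for even df only."""
    half = chi2 / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))

# Hypothetical raw red counts for 0..6 sockets, pooled into Zero..Four+.
print(pool_tail([1027, 611, 152, 44, 9, 3, 1]))

# Pooled cells from the table: rows Zero..Four+ for red, then green, then blue.
observed = [1027, 611, 152, 44, 13,
            262, 255, 333, 528, 469,
            673, 491, 294, 215, 174]
expected = [1037.3, 595.5, 158.7, 44.3, 11.1,
            259.9, 256.9, 344.6, 525.7, 459.9,
            685.9, 489.3, 269.2, 219.1, 183.4]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# 5 rows and 3 colors give (5 - 1) * (3 - 1) = 8 degrees of freedom.
p_value = chi2_tail_even_df(chi2, df=8)
print(round(chi2, 3), round(p_value, 3))
```

Pooling keeps every cell's expected count comfortably above a handful of items, which is the usual rule of thumb for trusting a chi-squared result.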

So, not perfect but pretty good. Certainly, the model even with just a single parameter is able to account for the data we've collected so far without any real trouble. Of course, we can always use more data, so feel free to submit your own using the link in the footer!

Other Notes

All calculations are made as exactly as possible. Mean calculations are made based on an absorbing Markov chain (Thanks to MantisPrayingMantis for pointing this out). The median and '% after NChr' calculations are made exactly so long as the result is less than 5000 chromatic orbs. At some point, it makes sense to stop calculating each chromatic orb exactly, and just start estimating. Note that the estimates tend to be slightly more optimistic than they should be. So, for instance, if the median displayed is 9000 orbs, in reality it's probably a little higher, maybe 9050 (yes, it's over 9000). Likewise for the '% after NChr' calculations, if you ask for the probability after 9000 chromatics, the percentage listed is too optimistic, so if it says 50%, it's probably slightly lower, maybe 49.8%. Even though the estimated probability is off by only a small amount, when you start multiplying that error a few hundred/thousand times it does add up.
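For the curious, here's a rough sketch of how a mean could be computed from an absorbing Markov chain. It is a brute-force version that enumerates every color arrangement, so it's only practical for a few sockets, and it isn't necessarily how the calculator is implemented. The example item (3 sockets, 150 DEX requirement, X = 12) and all names are assumptions for illustration.

```python
import math
from itertools import product

def expected_orbs(probs, n_sockets, is_target, start):
    """Mean number of orbs until is_target(state) holds, starting from `start`.

    States are color strings. A roll moves from s to any t != s with
    probability P(t) / (1 - P(s)), matching the no-repeat rule.
    """
    states = ["".join(s) for s in product("RGB", repeat=n_sockets)]
    p_state = {s: math.prod(probs[c] for c in s) for s in states}
    transient = [s for s in states if not is_target(s)]
    if start not in transient:
        return 0.0
    index = {s: i for i, s in enumerate(transient)}
    n = len(transient)
    # Augmented system (I - Q) t = 1, where Q holds transient-to-transient moves.
    rows = []
    for s in transient:
        row = [0.0] * (n + 1)
        for t in transient:
            if t != s:
                row[index[t]] = -p_state[t] / (1.0 - p_state[s])
        row[index[s]] = 1.0
        row[n] = 1.0
        rows.append(row)
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [v / scale for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0.0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return rows[index[start]][n]

# Hypothetical item: 3 sockets, 150 DEX requirement, X = 12.
total = 150 + 3 * 12
probs = {"R": 12 / total, "G": 162 / total, "B": 12 / total}
mean_orbs = expected_orbs(probs, 3, lambda s: s.count("R") >= 2, "GGG")
print(round(mean_orbs, 1))
```

Every arrangement that meets the goal is an absorbing state, and solving the linear system gives the exact mean without simulating a single roll.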