Using Optimization Modeling for Expected Threat

By Sertalp B. Cay | March 24, 2021

Even if you know only a little about football analytics, I can safely assume that you have heard of expected goals (xG). For those who don’t know, xG is a metric showing the probability of a shot being successful given the information we have, such as position, body part, and defender positions.1 Another such metric is expected assists (xA), which measures the probability that a pass leads to a goal.

A while ago, Karun Singh wrote a blog post introducing expected threat (xT), which measures the offensive contribution of any non-shot action (pass, dribble, cross).2 The main idea revolves around rewarding individual player actions based on how much they contribute offensively, using probabilistic outcomes. If a player sends a cross into the box, for example, how much does it increase the chance of a goal? Such measures play a critical role in evaluating player performances even if the outcome of the attack is not successful.

More recently, Abhishek Sharma implemented the $\mathrm{xT}$ idea in Julia and shared a Jupyter notebook and data for the analysis. For a while I had been looking for a way to bring an optimization lens into the mix, and I finally noticed a nice opportunity.

Before I go into the optimization part, let me start by reviewing the basic idea.

xT Idea and Solution Approach

Suppose we have created a 16 by 12 grid representing the field as follows:3

[Interactive viz: sample actions from the 2019/20 data, with Name, Action, From, and To for each. Click/hover on an action to see it on the pitch.]

I will use the letter $G$ to represent the set of all grid positions $(x,y)$ in this representation.

Calculating the expected threat consists of two parts:

$$ \mathrm{xT}_{x,y} = \underbrace{s_{x,y} \cdot g_{x,y}}_{\text{shot threat}} + \underbrace{m_{x,y} \cdot \sum_{(z,w) \in G} T_{(x,y)\to (z,w)} \cdot \mathrm{xT}_{z,w} }_{\text{move threat}} $$

This equation defines the expected threat at grid $(x,y)$ as the sum of the contributions of all possible actions to the scoring probability. Because of this nested structure,4 we are calculating the threat at steady state.5

Let me go over each term in here:

  • $\mathrm{xT}_{x,y}$ is the expected threat of having the ball in grid position $x,y$ (possession value), where $x$ and $y$ are x-axis and y-axis grid numbers, respectively.
  • $s_{x,y}$ is the historical percentage of taking a shot at grid position $x,y$.
  • $g_{x,y}$ is the probability of scoring if a shot is taken at grid location $x,y$. We are not differentiating what kind of shot it is (details can be added based on available data); think of this as an oversimplified xG.
  • $m_{x,y}$ is the historical percentage of moving the ball at grid position $x,y$. This is the counterpart of $s_{x,y}$, so we assume $s_{x,y} + m_{x,y} = 1$.
  • Finally, we have the transition matrix $T$ that represents the historical percentage of moving the ball from grid $x,y$ to $z,w$ successfully, denoted by $T_{(x,y) \to (z,w)}$.

Karun originally proposed solving these equations using an iterative approach: set $\mathrm{xT}$ to 0 initially, calculate all corresponding values, and repeat until convergence. Since we have $16 \times 12 = 192$ variables and $192$ equalities, it is actually a linear system of equations of the form $$ Ax = b $$ where $b$ is the vector of constants coming from the shot threat and the matrix $A$ comes from the move threat and $\mathrm{xT}$ multipliers. Assuming $A$ is a full-rank matrix, solving this system is as easy as calling A\b in MATLAB.
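For reference, here is a rough numpy sketch of both routes. The variable names and array layout are my assumptions, not the notebook’s code: s, g, and m are 16x12 arrays holding the shot share, scoring probability, and move share per grid, and T is a 192x192 transition matrix (rows: origin grid, columns: destination grid, both flattened in the same order).

import numpy as np

w, l = 16, 12

# Karun's iterative (fixed-point) approach: start from zero and repeat until convergence
xT = np.zeros(w * l)
for _ in range(100):
    xT = (s * g).flatten() + m.flatten() * (T @ xT)

# Equivalent direct solve of the linear system (numpy's counterpart of MATLAB's A\b)
A = np.eye(w * l) - np.diag(m.flatten()) @ T
b = (s * g).flatten()
xT_direct = np.linalg.solve(A, b)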

Base Model for xT

Even though it is overkill to use optimization modeling to solve the base problem, I would like to do it before moving on to the next step. Essentially, the whole thing can be written as

\begin{align} \textrm{minimize: } \; & 0 \\ \text{subject to: } \; & \mathrm{xT}_{x,y} = s_{x,y} \cdot g_{x,y} + m_{x,y} \cdot \sum_{(z,w) \in G} T_{(x,y)\to (z,w)} \cdot \mathrm{xT}_{z,w} \quad \forall (x,y) \in G \end{align}

As you see, we don’t really need an objective function as this is a feasibility problem. There is a unique solution satisfying all these constraints at the same time, and we reach it once we solve the problem. The model can be written using sasoptpy in Python and solved with the CBC solver as follows:

import os
import sasoptpy as so

# shots, scores, moves, T (pandas objects) come from the data preparation steps
# Base model: a feasibility problem reproducing the xT linear system
model = so.Model(name='xThreatModel')
indices = [(x,y) for x in range(w) for y in range(l)]  # w=16, l=12 grid dimensions
xT = model.add_variables(indices, name='xT')
# xT of each grid equals its shot threat plus its move threat
model.add_constraints(
    (xT[x,y] == shots.loc[x,y] * scores.loc[x,y] + moves.loc[x,y] * so.expr_sum(T.loc[(x,y),(z,w)] * xT[z,w] for (z,w) in indices) for (x,y) in indices), name='relation')
model.set_objective(0, name='zero', sense='N')  # dummy objective, we only need feasibility
# Export as MPS and solve with CBC
model.export_mps(filename='export.mps')
command = 'cbc export.mps solve solu solution.txt'
os.system(command)

I used the data Abhishek shared (entire 2019-2020 EPL data) and solved the problem to get expected threat values. Hover/click over the pitch below to display expected threat values and where the threat is generated from. Remember, $\mathrm{xT}$ consists of move and shot threat values.

[Interactive pitch: click/hover on a grid to see its xT, Move Threat, Shot Threat, and Move/Shot Ratio values.]

Since we have expected threat ratings calculated for each grid, we can measure any event in data in terms of its contribution to offense. For example, Lindelöf’s pass from grid [4,0] to grid [14,4] changes goal scoring probability from 0.33% to 9.98% (+9.65% increase). Or, Gündogan’s cross from [14,10] to [14,4] changes it from 2.07% to 9.98% (+7.91% increase).
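Computing such a contribution is just a lookup and a subtraction. Here is a small illustrative helper; the naming is my own, and it assumes the solved values are stored in a structure indexed by grid coordinates (e.g. a numpy array or a dict keyed by (x, y)):

def xt_contribution(xT_values, start, end):
    # Change in expected threat when the ball moves from `start` grid to `end` grid
    (x1, y1), (x2, y2) = start, end
    return xT_values[x2, y2] - xT_values[x1, y1]

# e.g. Lindelöf's pass from grid [4, 0] to grid [14, 4]
delta = xt_contribution(xT_values, (4, 0), (14, 4))  # about +0.0965 with the base values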

The part of Karun’s original blog post I was most interested in was the difference between the left and right sides of the field. It somehow makes sense for a specific team to use one side of the field more effectively, but other than that, I think there is little explanation for the difference between the $\mathrm{xT}$ of grid [14,7] (13.18%) and [14,4] (9.98%) besides noise. If we had more samples, the numbers would probably get closer and become almost symmetrical. So, this is where optimization could help: assuming we have a noisy sample, we can force expected threat levels to be symmetric or balanced and minimize the resulting errors.

Handling Noise with Error Minimization

Part 1: Symmetricity

Our first change in the model will be making sure that the $\mathrm{xT}$ levels of any two symmetric grids are equal to each other. However, adding these equations to the linear system of equations will make the system infeasible. Therefore, we will use optimization to minimize the errors we will have to introduce. This is similar to how linear regression works: we minimize the total error. For this purpose, we will add an error term to each grid position.

There are a few ways to define what total error means (e.g. sum of squared errors), but let us keep the problem simple and assume we are trying to minimize the sum of all error terms. In order to measure the magnitude of the total, we need to minimize the absolute value of each error term.

Our error variable for grid $(x,y)$ will be $e_{x,y}$. Now, we can write the objective function as $$ \sum_{(x,y) \in G} |e_{x,y}| $$ The absolute value function $|e_{x,y}|$ is not linear, but it is the maximum of two linear expressions: $e_{x,y}$ and $-e_{x,y}$. We can introduce a new variable $a_{x,y}$ to represent the absolute value and write the problem as a linear optimization model.

\begin{align} \textrm{minimize: } \; & \sum_{(x,y) \in G} \textcolor{darkblue}{a_{x,y}} \\ \text{subject to: } \; & \mathrm{xT}_{x,y} + \textcolor{darkblue}{e_{x,y}} = s_{x,y} \cdot g_{x,y} + m_{x,y} \cdot \sum_{(z,w) \in G} T_{(x,y)\to (z,w)} \cdot \mathrm{xT}_{z,w} \quad & \forall (x,y) \in G \\ & a_{x,y} \geq e_{x,y} & \quad \forall (x,y) \in G \\ & a_{x,y} \geq -e_{x,y} & \quad \forall (x,y) \in G \\ & \mathrm{xT}_{x,y} = \mathrm{xT}_{x,11-y} \quad & \forall (x,y) \in G \\ & \sum_{(x,y) \in G} e_{x,y} = 0 \end{align}
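Note that since we minimize the sum of the $a_{x,y}$ variables, each one settles at the larger of its two lower bounds at an optimal solution, which is exactly the absolute value:

$$ a_{x,y} = \max(e_{x,y},\, -e_{x,y}) = |e_{x,y}| $$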

We can write this model using sasoptpy as follows:

model = so.Model(name='xThreatModel_sym')
indices = [(x,y) for x in range(w) for y in range(l)]
xT = model.add_variables(indices, name='xT')
err = model.add_variables(indices, name='error')                # error term e
err_abs = model.add_variables(indices, name='error_abs', lb=0)  # absolute value a
model.add_constraints(
    (xT[x,y] + err[x,y] == shots.loc[x,y] * scores.loc[x,y] + moves.loc[x,y] * so.expr_sum(T.loc[(x,y),(z,w)] * xT[z,w] for (z,w) in indices) for (x,y) in indices), name='relation')
# Linearization of the absolute value: error_abs >= max(error, -error)
model.add_constraints(
    (err_abs[x,y] >= err[x,y] for (x,y) in indices), name='abs_values1')
model.add_constraints(
    (err_abs[x,y] >= -err[x,y] for (x,y) in indices), name='abs_values2')
# Symmetry between vertically mirrored grids
model.add_constraints(
    (xT[x,y] == xT[x, l-y-1] for (x,y) in indices), name='symm_con')
# Error terms should cancel out in total
model.add_constraint(so.expr_sum(err[x,y] for (x,y) in indices) == 0, name='zero_error_total')
sum_err_abs = so.expr_sum(err_abs[x,y] for (x,y) in indices)
model.set_objective(sum_err_abs, name='total_error', sense='N')
model.export_mps(filename='export.mps')
command = 'cbc export.mps solve solu solution.txt'
os.system(command)

Solving this model gives an objective of 0.146 in 0.06 seconds. So a total of 14.6% error is accumulated when we force symmetricity into the model. Hover/click on the grid locations to see what the $\mathrm{xT}$ values look like:

[Interactive pitch: click/hover on a grid to see its xT, Error (Adj.) Rate, Move Threat, Shot Threat, and Move/Shot Ratio values.]

You will notice that the expected threat values of grids [14,7] and [14,4] are now equal: 9.84%. We have assigned a 2.97% error rate to grid [14,7], meaning we reduced the final expected threat level of this grid by that amount.

Note that, instead of forcing hard symmetricity, we could have allowed a range on how far apart two symmetric grids can be. For example, we could write $$ \mathrm{xT}_{x,11-y} - r \leq \mathrm{xT}_{x,y} \leq \mathrm{xT}_{x,11-y} + r $$ where $r$ is a relatively small percentage, say 1%. Slightly asymmetrical $\mathrm{xT}$ values might be expected due to the higher number of right-footed players. Regardless, such a correction could be especially handy when working with a limited amount of data.
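In sasoptpy, such a relaxation could be written as a pair of one-sided constraints in place of the hard symmetry constraint. A minimal sketch, assuming the same model objects as above (the tolerance r and the constraint names are my own):

r = 0.01  # hypothetical tolerance: symmetric grids may differ by at most 1 percentage point
model.add_constraints(
    (xT[x,y] <= xT[x, l-y-1] + r for (x,y) in indices), name='symm_upper')
model.add_constraints(
    (xT[x,y] >= xT[x, l-y-1] - r for (x,y) in indices), name='symm_lower')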

The final issue I will mention here is the odd behavior around grids [15,9] and [15,8]. Notice that our $\mathrm{xT}$ level drops from 2.88% to 2.75% even though we get closer to the goal. A similar issue exists in the base model between grids [13,10] (2.30%) and [14,10] (2.07%). I think this is mainly due to the lack of instances that originate at these locations. So, our next step will be enforcing a non-decreasing pattern to get consistent values.

Part 2: Consistency

On top of the symmetrical model, our final model will add consistency of $\textrm{xT}$ values: as we get closer to the goal, we can make sure that $\textrm{xT}$ values follow a non-decreasing pattern.

One way to satisfy this consistency is to force the expected threat of grid $(x,y)$ to be at least that of grid $(z,y)$ (notice that it is the same row) if $(x,y)$ is closer to the goal. Similarly, we can force the expected threat of grid $(x,y)$ to be at least that of grid $(x,w)$ if $(x,y)$ is closer to the goal.

We add two new constraints and get the following model:

\begin{align} \textrm{minimize: } \; & \sum_{(x,y) \in G} a_{x,y} \\ \text{subject to: } \; & ... \\ & \mathrm{xT}_{x,y} \geq \mathrm{xT}_{z,w} \qquad \forall (x,y) \in G, \forall (z,w) \in G: d(x,y) \lt d(z,w) \text{ and } x=z \\ & \mathrm{xT}_{x,y} \geq \mathrm{xT}_{z,w} \qquad \forall (x,y) \in G, \forall (z,w) \in G: d(x,y) \lt d(z,w) \text{ and } y=w \end{align}

where $d(x,y)$ is the distance of grid $(x,y)$ to the goal.
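The implementation below relies on a dist helper returning this distance. The actual helper lives in the notebook; a plausible stand-in (an assumption: the goal sits at the right edge of the 16 by 12 grid, at its vertical midpoint) could look like this:

import math

def dist(x, y):
    # Euclidean distance from grid (x, y) to the goal (assumed at x = w - 1, y = (l - 1) / 2)
    return math.hypot((w - 1) - x, (l - 1) / 2 - y)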

We can write this model using sasoptpy as follows:

model = so.Model(name='xThreatModel_sym_inc')
indices = [(x,y) for x in range(w) for y in range(l)]
xT = model.add_variables(indices, name='xT')
err = model.add_variables(indices, name='error')
err_abs = model.add_variables(indices, name='error_abs', lb=0)
model.add_constraints(
    (xT[x,y] + err[x,y] == shots.loc[x,y] * scores.loc[x,y] + moves.loc[x,y] * so.expr_sum(T.loc[(x,y),(z,w)] * xT[z,w] for (z,w) in indices) for (x,y) in indices), name='relation')
model.add_constraints(
    (err_abs[x,y] >= err[x,y] for (x,y) in indices), name='abs_values1')
model.add_constraints(
    (err_abs[x,y] >= -err[x,y] for (x,y) in indices), name='abs_values2')
model.add_constraints(
    (xT[x,y] == xT[x, l-y-1] for (x,y) in indices), name='symm_con')
model.add_constraint(so.expr_sum(err[x,y] for (x,y) in indices) == 0, name='zero_error_total')
# Consistency: xT cannot decrease as the grid gets closer to the goal (same row / same column)
model.add_constraints(
    (xT[x,y] >= xT[z, w] for (x,y) in indices for (z,w) in indices if dist(x,y) < dist(z,w) and x==z), name='same_row')
model.add_constraints(
    (xT[x,y] >= xT[z, w] for (x,y) in indices for (z,w) in indices if dist(x,y) < dist(z,w) and y==w), name='same_col')
sum_err_abs = so.expr_sum(err_abs[x,y] for (x,y) in indices)
model.set_objective(sum_err_abs, name='total_error', sense='N')
model.export_mps(filename='export.mps')
command = 'cbc export.mps presolve off solve solu solution.txt'
os.system(command)

This problem takes less than a second to solve with an objective of 15.4% error in total.

Finally, we get our final $\mathrm{xT}$ values:

[Interactive pitch: click/hover on a grid to see its xT, Error (Adj.) Rate, Move Threat, Shot Threat, and Move/Shot Ratio values.]

Now the expected threat value for

  • [15,9] is 3.02% (2.68% in base model, 3.02% in symmetric model)
  • [15,8] is 4.02% (2.91% in base model, 3.94% in symmetric model)

and for their symmetric counterparts:

  • [15,2] is 3.02% (3.10% in base model, 3.05% in symmetric model)
  • [15,3] is 4.02% (3.96% in base model, 3.94% in symmetric model)

Comparison and Recap

You can find all three versions of the $\mathrm{xT}$ values below. Hover/click on a grid to see the corresponding levels.

[Interactive pitch: click/hover on a grid to see its xT (Base), xT (Symm), and xT (ND-Symm) values.]

Now, we have a perfectly symmetrical and non-decreasing model on our hands. Our final model manages to fix both issues we tackled:

  • Possession values ($\mathrm{xT}$) of grids [14,4] and [14,7] are equal to each other
  • Passing from grid [13,10] to [14,10] does not decrease threat level

So, to recap, expected threat is a method for assessing the value of ball possession for the attacking team at different pitch locations. The level of detail in the data dictates how accurate the model can get. When working with limited data, it is important to reduce the noise. We can use optimization modeling to ensure the final model is valid in terms of defined performance measures, like symmetricity. This is a simplified take on how to include optimization in this type of analysis. I hope it is inspiring in terms of tackling similar issues.

If you were wondering, here are the contributions of the moves shown at the beginning of the post:

[Interactive table: Name, Action, From, To, xT Contr. (Base), xT Contr. (ND-Symm).]

Compared to the base model, we identified that Mané’s dribble might have a higher importance (0.61% to 1.80%). The contribution of Traoré’s pass, on the other hand, decreases from 2.93% to 2.19%.

Future Work

One thing I mentioned but didn’t cover in detail is how the total error is defined. When using the sum of squared errors, the problem becomes a Quadratic Optimization instance and requires a different solver. We could take a further step and generalize the problem by trying to minimize the p-norm: $$ \text{minimize: } \Vert e \Vert_{p} := \left ( \sum_{(x,y) \in G} |e_{x,y}|^{p} \right )^{1/p} $$ In this post I used $p=1$, which gives the sum of absolute values; similarly, we could have tried to minimize the maximum error using $p=\infty$.
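As a rough sketch of the $p=\infty$ case, assuming the same sasoptpy objects as in the earlier models (the variable and constraint names here are my own), the per-grid absolute-value variables can be replaced by a single maximum-error variable:

max_err = model.add_variable(name='max_error', lb=0)  # upper bound on every |error|
model.add_constraints((max_err >= err[x,y] for (x,y) in indices), name='max_abs1')
model.add_constraints((max_err >= -err[x,y] for (x,y) in indices), name='max_abs2')
model.set_objective(max_err, name='minimax_error', sense='N')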

A totally different direction for involving optimization is generating $\mathrm{xT}$ values per team and analyzing/finding the best n-step plays. This obviously requires more data, but assuming some grid connections are blocked by opposing defenders, optimization can find which play maximizes $\mathrm{xT}$ accumulation over the next n moves.

If you are interested in working on any of these problems or have another idea, please reach out to me. You can find my Jupyter Notebook (Python) on GitHub, which includes the data steps and all the models I have generated here.


  1. FBref has a page on xG if you would like to learn more about it. ↩︎

  2. Note that similar (possession value) models are available in the market, and were recently discussed in StatsBomb’s 360 reveal. ↩︎

  3. Field drawing is based on this Blocks page ↩︎

  4. Here $\mathrm{xT}_{x,y}$ depends on $\mathrm{xT}_{z,w}$ and vice versa. Therefore, we have a nested structure. ↩︎

  5. We can think of this as a Markov chain and $\mathrm{xT}$ as the steady-state probabilities. ↩︎
