“Life, like poker, has an element of risk. It shouldn’t be avoided. It should be faced.”

– Edward Norton

Poker is a card game involving math, skill, and luck. Poker players regularly use artificial intelligence to practice and improve their gameplay. In this post, we will build a poker bot that mirrors the decisions of one online player (IlxxxlI). We can use this analysis to predict that player's tendencies and potentially exploit his or her weaknesses.

This analysis leverages linear regression, multilayer perceptron (MLP) regression, and seaborn visualizations. The full code can be found on my GitHub.

Based on my analysis, IlxxxlI mainly plays 5-6-handed tables. IlxxxlI plays more conservatively with more players at the table and more aggressively with high cards or pocket pairs. Additionally, a neural network predicts the player's decisions more accurately than a standard linear regression.

### Data Sources

This player published over 50,000 hands of Texas Hold 'Em in plain-text (.txt) format. For the record, it is never a good idea to publish your hand history. After an Extract, Transform, Load (ETL) process, the database of hands was used to predict the player's preflop decisions. I identified 10 features to evaluate:

- Value 1 - The value of the player's first card (e.g. 2 = 2, Ace = 14).

- Value 2 - The value of the player's second card.

- Player Suits - If the player's cards are the same suit, the value is 2. Otherwise, the value is 1.

- Amount to Call - The size of the bet the player is facing.

- Pot Size - The amount of money the player can win.

- Position - The location of the player relative to others (e.g. small blind is 0, dealer is 5).

- Number of Players - The total number of players sitting at the table.

- Active - The number of times players have voluntarily invested money in the hand.

- Remaining Pre Action - The amount of money the player has remaining (stack size) before acting.

- Big Blind - Proxy for the monetary "size" of the game.
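The ETL code itself isn't shown in this post. As a minimal sketch, assuming the hole cards have already been parsed into two-character strings such as `"Ah"` (rank, then suit, with `T` for ten) and that the other fields are already numeric, the 10 features above could be assembled like this. `PreflopSpot`, `to_features`, and `RANK_VALUES` are hypothetical names, not the author's code.

```python
from dataclasses import dataclass

# Hypothetical rank-to-value mapping matching the list above (2 = 2, ..., Ace = 14)
RANK_VALUES = {str(n): n for n in range(2, 10)}
RANK_VALUES.update({"T": 10, "J": 11, "Q": 12, "K": 13, "A": 14})


@dataclass
class PreflopSpot:
    """Hypothetical container for one already-parsed preflop decision point."""
    hole_cards: tuple[str, str]   # e.g. ("Ah", "Kd"): rank character, then suit character
    amount_to_call: float
    pot_size: float
    position: int                 # 0 = small blind, counting up to the dealer
    num_players: int
    active: int                   # voluntary investments so far in the hand
    remaining_pre_action: float   # stack before acting
    big_blind: float


def to_features(spot: PreflopSpot) -> list[float]:
    """Encode one spot as the 10-feature vector, in the order listed above."""
    first, second = spot.hole_cards
    value_1 = RANK_VALUES[first[0]]
    value_2 = RANK_VALUES[second[0]]
    player_suits = 2 if first[1] == second[1] else 1  # 2 = suited, 1 = offsuit
    return [
        value_1, value_2, player_suits,
        spot.amount_to_call, spot.pot_size, spot.position,
        spot.num_players, spot.active, spot.remaining_pre_action, spot.big_blind,
    ]
```

For example, `to_features(PreflopSpot(("Ah", "Kh"), 2.0, 3.0, 5, 6, 1, 200.0, 2.0))` returns `[14, 13, 2, 2.0, 3.0, 5, 6, 1, 200.0, 2.0]`: Ace-King suited on the button at a 6-handed table.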

There are four distinct classes (labels) in this model, one for each decision available to a poker player:

1) Raise – Betting to increase the wager

2) Call – Agreeing to match another player’s wager

3) Check – Declining to bet when there is no wager to match

4) Fold – Discontinuing the current hand
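Because both models in this post are regressors, the four decisions need numeric labels. The post doesn't say how they were encoded; this sketch assumes an ordinal encoding from least to most aggressive (fold = 0 through raise = 3), and `encode_actions` and `decode_labels` are hypothetical helpers.

```python
# Hypothetical ordinal encoding, least to most aggressive (not stated in the post)
ACTIONS = ["fold", "check", "call", "raise"]
ACTION_TO_LABEL = {action: label for label, action in enumerate(ACTIONS)}


def encode_actions(actions):
    """Map action names (case-insensitive) to integer labels 0-3."""
    return [ACTION_TO_LABEL[action.lower()] for action in actions]


def decode_labels(labels):
    """Map integer labels (or integral floats) back to action names."""
    return [ACTIONS[int(label)] for label in labels]
```

An ordinal encoding lets a regressor's output be read as an aggression level, but it also imposes an ordering (call sits "between" check and raise) that a true multiclass classifier would not need.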

### Analysis

For initial data exploration, I developed two bubble charts.

The first bubble chart provides an overview of how a player reacts in various game sizes and positions.

- X-axis - The number of players in the game

- Y-axis - The location of the player relative to others (e.g. small blind is 0, dealer is 5)

- Bubble size - The number of hand instances in the dataset

- Color - Player actions (yellow is likely to raise, dark purple is likely to fold)

From this chart, I found three key insights:

1) IlxxxlI mainly plays 5-6 player games, as evidenced by bubble size.

2) When IlxxxlI is in position 2, he or she is most likely to fold: the darkest dots are in position 2 for 4-6 player games.

3) IlxxxlI raises more when there are fewer players at the table. The bubble colors become lighter when there are fewer players.
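The plotting code isn't included in the post. A minimal sketch of the aggregation behind a chart like this, assuming a pandas DataFrame with one row per hand, hypothetical columns `num_players`, `position`, and `action`, and an assumed ordinal aggression score (fold = 0 through raise = 3) for the color:

```python
import pandas as pd

# Assumed aggression score for coloring; higher means more aggressive
AGGRESSION = {"fold": 0, "check": 1, "call": 2, "raise": 3}


def bubble_table(hands: pd.DataFrame, x: str, y: str) -> pd.DataFrame:
    """One row per (x, y) cell: hand count (bubble size) and mean aggression (color)."""
    scored = hands.assign(aggression=hands["action"].map(AGGRESSION))
    return (
        scored.groupby([x, y])
        .agg(n_hands=("aggression", "count"), mean_aggression=("aggression", "mean"))
        .reset_index()
    )


def plot_bubbles(table: pd.DataFrame, x: str, y: str):
    """Draw the bubble chart with seaborn; viridis maps low scores to dark purple, high to yellow."""
    import seaborn as sns  # imported lazily so bubble_table works without seaborn installed

    return sns.scatterplot(
        data=table, x=x, y=y,
        size="n_hands", hue="mean_aggression",
        palette="viridis", sizes=(20, 800),
    )
```

`plot_bubbles(bubble_table(hands, "num_players", "position"), "num_players", "position")` would draw a chart like the first one; passing the card-value columns instead gives a card-combination chart like the one described below. The author's exact palette and scoring are not stated, so treat the color mapping as an approximation of the key above.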

The second bubble chart provides an overview of how a player reacts given various card combinations.

- X-axis - The value of the first card

- Y-axis - The value of the second card

- Bubble size - The number of hand instances in the dataset

- Color - Player actions (yellow is likely to raise, dark purple is likely to fold)

From the second bubble chart, I found two additional insights:

1) IlxxxlI raises more when he or she has high cards or pocket pairs.

2) IlxxxlI likes to slow-play Aces. When IlxxxlI holds two Aces, the bubble is darker (less aggressive) than for two Kings or Ace-King.
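The slow-play observation can also be checked numerically instead of by eye. A minimal sketch, assuming a per-hand DataFrame with hypothetical `value_1`, `value_2`, and `action` columns; `raise_rate` is an illustrative helper that treats both card orders (e.g. A-K and K-A) as the same holding.

```python
import pandas as pd


def raise_rate(hands: pd.DataFrame, v1: int, v2: int) -> float:
    """Share of hands holding card values v1 and v2 (in either order) that the player raised."""
    in_order = (hands["value_1"] == v1) & (hands["value_2"] == v2)
    swapped = (hands["value_1"] == v2) & (hands["value_2"] == v1)
    matching = hands.loc[in_order | swapped, "action"]
    return float((matching == "raise").mean())  # NaN if the holding never appears
```

Comparing `raise_rate(hands, 14, 14)` with `raise_rate(hands, 13, 13)` and `raise_rate(hands, 14, 13)` would quantify how much less often Aces are raised than Kings or Ace-King.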

Now that we understand the types and ranges of the available features, we turn to predicting the player's decisions. We compare the effectiveness of a neural network against a simple linear regression model. Neural networks can capture more complex, non-linear relationships between variables, so we expect them to be more accurate than linear regression.

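```python
from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

# train_x/test_x hold the 10 features; train_y is assumed to hold integer-encoded actions
# A simple linear regression model serves as our base case
lir = LinearRegression()
lir.fit(train_x, train_y)
lir_predictions = lir.predict(test_x)

# We can improve upon the linear regression with a neural network:
# two hidden layers of 100 neurons each, with logistic (sigmoid) activations
mlp = MLPRegressor(hidden_layer_sizes=(100, 100), activation='logistic', max_iter=2000)
mlp.fit(train_x, train_y)
mlp_predictions = mlp.predict(test_x)
```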
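Regressors output continuous values, so their predictions have to be mapped back to the four labels before a confusion matrix can be built. The post doesn't show this step; a minimal sketch, assuming integer labels 0-3 and nearest-integer rounding (both assumptions), with hypothetical helpers `to_classes` and `evaluate`. scikit-learn's older `jaccard_similarity_score` was documented as equivalent to `accuracy_score` for single-label multiclass problems, so this sketch reports plain accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix


def to_classes(raw_predictions, n_classes=4):
    """Round continuous regressor outputs to the nearest valid integer label."""
    return np.clip(np.rint(raw_predictions), 0, n_classes - 1).astype(int)


def evaluate(test_y, raw_predictions, n_classes=4):
    """Return overall accuracy and the confusion matrix (rows = actual, columns = predicted)."""
    predicted = to_classes(raw_predictions, n_classes)
    labels = list(range(n_classes))
    return accuracy_score(test_y, predicted), confusion_matrix(test_y, predicted, labels=labels)
```

For example, `evaluate(test_y, lir.predict(test_x))` and `evaluate(test_y, mlp.predict(test_x))` would produce one accuracy and one matrix per model.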

Our test set contains over 10,000 samples. To analyze the results of each model, I generated confusion matrices. In the following two tables, correct predictions are highlighted in green.

The linear regression model had an overall accuracy of 81.3%, as measured by the Jaccard similarity index. Although the test set contained over 700 calls and checks, the linear regression model never predicted either action.

At 91.3%, the neural network's accuracy was 10 percentage points higher than that of the linear regression model. When the player folds in the test set, the model's recall for that action is 7,913/8,132 = 97.3%. Recall is worst for calls, at 45.2%. From the neural network's confusion matrix, we can see that the bot raises and folds more often than the actual player does. It seems that the real player is more deceptive and plays less straightforwardly than this neural network-based bot.
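Recall for a class is the number of correct predictions for that action divided by the number of times the player actually took it, i.e. the confusion matrix's diagonal divided by its row sums (assuming rows are actual actions and columns are predicted ones, as in scikit-learn's `confusion_matrix`). A small sketch, reproducing the fold figure quoted above; `per_class_recall` is a hypothetical helper.

```python
import numpy as np


def per_class_recall(matrix):
    """Recall per class: diagonal (correct) divided by row sums (actual instances)."""
    matrix = np.asarray(matrix, dtype=float)
    return np.diag(matrix) / matrix.sum(axis=1)


# Fold recall from the quoted figures: 7,913 of 8,132 actual folds were predicted as folds
fold_recall = 7913 / 8132
print(f"{fold_recall:.1%}")  # 97.3%
```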

### Conclusions and Potential Next Steps

The neural network outperformed simple linear regression in prediction accuracy for this poker bot. The neural network bot correctly predicts the player's preflop decision 91.3% of the time.

Additional research ideas:

- Optimizing the number of neurons in the neural network; two hidden layers of 100 neurons each, (100, 100), may be more than this problem needs

- Playing the bot against itself and recording the results

- Analyzing the profit impact of the player's decisions

- Expanding the existing model to postflop play
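The first idea, optimizing the number of neurons, could start with a cross-validated grid search. A minimal sketch using scikit-learn's `GridSearchCV`, assuming integer labels 0-3 and rounding regressor outputs to the nearest label for scoring; the candidate layouts in `PARAM_GRID` and the helpers `rounded_accuracy` and `tune_hidden_layers` are hypothetical choices, not results from this analysis.

```python
import numpy as np
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

# Hypothetical candidate layouts, from a single small layer up to the (100, 100) used above
PARAM_GRID = {"hidden_layer_sizes": [(10,), (25,), (50,), (25, 25), (100, 100)]}


def rounded_accuracy(y_true, y_pred):
    """Accuracy after rounding regressor outputs to the nearest label in 0-3 (assumed encoding)."""
    return accuracy_score(y_true, np.clip(np.rint(y_pred), 0, 3).astype(int))


def tune_hidden_layers(train_x, train_y, param_grid=PARAM_GRID, cv=5):
    """Cross-validate each candidate layout and return the fitted GridSearchCV object."""
    search = GridSearchCV(
        MLPRegressor(activation="logistic", max_iter=2000, random_state=0),
        param_grid,
        scoring=make_scorer(rounded_accuracy),
        cv=cv,
    )
    search.fit(train_x, train_y)
    return search
```

After fitting, `search.best_params_` holds the best-scoring layout and `search.cv_results_` the per-candidate scores, which would show whether a smaller network matches the (100, 100) model's accuracy.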