Improving Playing Style Model

The Model

  1. Bottom-up Model: Identifies playing style similarity between the player’s current club and potential new clubs using the k-nearest neighbors (KNN) algorithm.
  2. Improvements: Dimensionality reduction with PCA, integration of scheme (formation) features, and inclusion of player data.
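
As an illustration of the bottom-up idea, the sketch below reduces club style features with PCA and then looks up the nearest clubs to the player's current club. It is a minimal sketch, not the model's actual implementation: the club names, the four style features, their values, the choice of two components, and the helper names `pca_reduce` and `nearest_clubs` are all assumptions made for the example.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def nearest_clubs(reduced, names, current_club, k):
    """Return the k clubs closest (Euclidean distance) to current_club."""
    idx = names.index(current_club)
    dists = np.linalg.norm(reduced - reduced[idx], axis=1)
    order = [i for i in np.argsort(dists) if i != idx]  # drop the club itself
    return [names[i] for i in order[:k]]

# Hypothetical clubs and made-up per-match style features:
# possession %, passes, long-ball share, pressing actions.
names = ["Club A", "Club B", "Club C", "Club D", "Club E"]
features = np.array([
    [62.0, 610.0, 0.08, 180.0],
    [60.0, 590.0, 0.09, 175.0],
    [45.0, 380.0, 0.21, 120.0],
    [47.0, 400.0, 0.19, 125.0],
    [55.0, 500.0, 0.14, 150.0],
])

# Standardize first so that no feature dominates because of its scale.
scaled = (features - features.mean(axis=0)) / features.std(axis=0)
reduced = pca_reduce(scaled, n_components=2)
similar = nearest_clubs(reduced, names, "Club A", k=2)
print(similar)
```

Standardizing before PCA is a design choice of this sketch; without it, the feature with the largest raw scale (here, passes) would dominate both the components and the distances.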

Process Steps

  1. Dimensionality Reduction with PCA: Reducing the feature set to make the similarity calculation cheaper.
  2. Integrating Scheme Features: Analyzing and clustering clubs based on formations (4-4-2, 4-3-3, etc.).
  3. Adding Player Data: Incorporating player statistics and ratios, normalizing the data, and matching the player’s style to clubs.
  4. Combining Results: Weighted combination of clubs based on similarity to the player’s current club and to the player’s playing style.
  5. Level Filter: Adjusting for differences in club levels based on the player’s level.
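
Steps 4 and 5 can be sketched as follows. This is only an illustration under stated assumptions: the source does not give the weights or the filter rule, so the equal weighting, the "maximum level gap" rule, the scores, and the names `combine_scores` and `level_filter` are all hypothetical.

```python
def combine_scores(club_similarity, player_fit, weight_club=0.5):
    """Weighted average of two similarity scores per club (both assumed in [0, 1])."""
    shared = club_similarity.keys() & player_fit.keys()
    return {
        club: weight_club * club_similarity[club]
        + (1 - weight_club) * player_fit[club]
        for club in shared
    }

def level_filter(scores, club_levels, player_level, max_gap):
    """Keep only clubs whose level is within max_gap of the player's level."""
    return {
        club: score
        for club, score in scores.items()
        if abs(club_levels[club] - player_level) <= max_gap
    }

# Hypothetical scores: similarity to the current club and fit with the player's style.
club_similarity = {"Club B": 0.9, "Club C": 0.4, "Club E": 0.7}
player_fit = {"Club B": 0.6, "Club C": 0.8, "Club E": 0.9}
club_levels = {"Club B": 85.0, "Club C": 70.0, "Club E": 72.0}

combined = combine_scores(club_similarity, player_fit, weight_club=0.5)
shortlist = level_filter(combined, club_levels, player_level=71.0, max_gap=5.0)
ranked = sorted(shortlist, key=shortlist.get, reverse=True)
print(ranked)
```

In this made-up example Club B has the highest similarity to the current club but is removed by the level filter, which is the kind of correction step 5 is meant to provide.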

Complications Addressed

  1. Players Not Currently Playing: Handling cases where a player is inactive or not getting playing time.
  2. Insufficient Player or Club Data: Handling cases where there is not enough data for a player or a club.
  3. Unavailable Player Level Information: Adjusting the model for leagues where player levels aren’t available.

Testing the Model

Improving Playing Level Model

Case Study: Optimizing Player Transfer using Top-Down Model

Background: X, a promising young football player, wants to transfer to a new club to further his career. Using the Top-Down Model, an analysis is conducted to predict the ideal average team player level and club level for the player’s transfer.

Steps Taken

1. Player Level and Club Level Assessment

  • Player Level: The Player Level analysis assesses X’s impact on the team’s winning odds during his time on the pitch, corrected for various factors. This objective assessment measures his contribution to the team’s performance.
  • Club Level: Evaluates football teams’ relative playing strengths based on match results in national leagues, cup competitions, UEFA Champions League, and UEFA Europa League.

2. Data Collection and Model Building

  • The model draws on a database of transfers from the past 15 years, which records player movements, each player’s Player Level at the time of transfer, the Club Level of the destination club, and the player’s Player Level after the transfer.
  • The model aims to predict the optimal average team player level a player should transition to. Successful transfers (where a player’s level increased post-transfer) serve as the foundation to train the model.
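
The selection of training examples described above could look roughly like this. It is a sketch under assumptions: the `Transfer` record, its field names, and the sample values are hypothetical, and the field holding the destination's average team player level is assumed to be available alongside the database fields listed above.

```python
from dataclasses import dataclass

@dataclass
class Transfer:
    player_level_before: float                  # Player Level at the time of transfer
    destination_club_level: float               # Club Level of the destination club
    destination_avg_team_player_level: float    # assumed field: the prediction target
    player_level_after: float                   # Player Level after the transfer

def successful_transfers(transfers):
    """Keep transfers where the player's level increased after the move."""
    return [t for t in transfers if t.player_level_after > t.player_level_before]

def to_training_pairs(transfers):
    """Build (features, target) pairs: level before -> destination team level."""
    features = [[t.player_level_before] for t in transfers]
    targets = [t.destination_avg_team_player_level for t in transfers]
    return features, targets

# Made-up sample records.
history = [
    Transfer(70.0, 75.0, 72.0, 73.5),   # level rose: successful
    Transfer(68.0, 82.0, 80.0, 66.0),   # level dropped: excluded
    Transfer(74.0, 78.0, 75.5, 76.0),   # level rose: successful
    Transfer(71.0, 71.0, 70.0, 71.0),   # unchanged: excluded
]

train = successful_transfers(history)
X, y = to_training_pairs(train)
print(len(train), X, y)
```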

3. Algorithm Utilized: Gradient Boosting Regressor

  • Gradient boosting, a machine learning technique, is employed. It builds a robust prediction model by adding weak prediction models one at a time: each new weak model is fit to the residuals (prediction errors) of the models added so far, so the remaining error shrinks as models are added.
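
To make the residual-fitting idea concrete, here is a small from-scratch sketch of gradient boosting for squared error, using one-split decision stumps as the weak models. It is not the project's model: the single input feature, the data, the number of rounds, and the learning rate are assumptions, and a real setup would more likely use a library implementation.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on x that minimizes squared error on residuals.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean

def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * stump_predict(stump, x)
            for p, x in zip(predictions, xs)
        ]
    return base, stumps, learning_rate

def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Made-up data: player level before transfer -> destination average team level.
xs = [62.0, 65.0, 68.0, 70.0, 73.0, 75.0, 78.0, 81.0]
ys = [64.0, 66.5, 70.0, 71.0, 75.5, 76.0, 80.0, 82.5]

baseline = (sum(ys) / len(ys), [], 0.1)   # mean-only model, no stumps
model = fit_gradient_boosting(xs, ys)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

For squared-error loss the residuals are exactly the negative gradient of the loss with respect to the predictions, which is why fitting each weak model to the residuals counts as a gradient step.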

4. Model Limitations and Improvements

  • Identified limitations: the model ignores club level in its predictions and is prone to overfitting.
  • Proposed improvements: integrating club level into the predictions and switching to XGBoost (Extreme Gradient Boosting) for better speed and performance, along with its regularization techniques to limit overfitting.
  • A dataframe was added that ranks the player within each possible future club, to make sure that he would be among the top 9 players in his next team.
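
The top-9 check can be sketched as below. Plain dictionaries stand in for the dataframe here, and the squad levels, the tie-breaking rule (only strictly higher levels rank above the player), and the function names are assumptions for illustration.

```python
def rank_in_squad(player_level, squad_levels):
    """1-based rank the player would have: 1 + number of squad players rated higher."""
    return 1 + sum(level > player_level for level in squad_levels)

def clubs_where_player_is_top_n(player_level, squads, top_n=9):
    """Map each club to the player's would-be rank, keeping ranks within top_n."""
    result = {}
    for club, levels in squads.items():
        rank = rank_in_squad(player_level, levels)
        if rank <= top_n:
            result[club] = rank
    return result

# Made-up Player Levels of the current squads at two candidate clubs.
squads = {
    "Club A": [80, 79, 78, 77, 76, 76, 75, 75, 75, 74.5, 73, 72],
    "Club B": [78, 76, 75, 74, 73, 72, 72, 71, 70, 70, 69, 68],
}

eligible = clubs_where_player_is_top_n(74.0, squads, top_n=9)
print(eligible)
```

With these made-up numbers the player would rank 11th at Club A and 4th at Club B, so only Club B passes the check.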

5. Testing the Model