Predict who will win a soccer match using tigergraph, pandas, and GSQL part 3/3

Sarthak Agrawal
6 min readFeb 21, 2021

This is the final part of the tutorial for predicting the winner of a soccer match. If you missed the other two parts, here are the links:

Part 1: https://sarthavader.medium.com/predict-who-will-win-a-soccer-match-using-tigergraph-pandas-and-gsql-part-1-3-65c2fef665e

Part 2: https://sarthavader.medium.com/predict-who-will-win-a-soccer-match-using-tigergraph-pandas-and-gsql-part-2-3-2196ecd3673a

Anyways, in this tutorial we’ll be designing and writing queries for our graph database to do the predicting part. We’ll be exploring how to use relationships between teams and matches to find indirect wins—team1 beats team2 who beats team3—to find the predicted winner. We’ll start by explaining the methodology of the the GSQL language here.

In TigerGraph, nodes are called vertices, and the relationship or connection between two nodes is called an edge.

Our graph structure is very simple. It simple consists of two vertices connect by a single edge. This can be represented in GSQL by the following:

vertex1 -(edge1) ->vertex2

In our case vertex1 is our team vertex, edge1 is our had_match edge, and vetex2 is our match vertex. Replacing them gives this:

team -(had_match) ->(match)

GSQL is very similar to SQL and its easy-to-read syntax. For example, if we wanted to write a query that returns all the matches in the year 2015 we could run the following query.

CREATE QUERY FilterMatches(int year) FOR GRAPH SoccerMatches { 
Start = {match.*};
Result = SELECT t FROM Start:t WHERE t.year==year;
PRINT Result.size();
}

In this query, we are selecting the matches from all the matches to find the ones where the year is whatever we specified. Then we are printing out the number of matches found that fulfill that requirement. In addition, we could add another parameter to this. For example, we could add the parameter isNeutral to this and filter based on that. See if you can understand the code below.

CREATE QUERY FilterMatches(int year, bool isNeutral) FOR GRAPH SoccerMatches { 
Start = {match.*};
Result = SELECT t FROM Start:t WHERE t.year==year AND t.isNeutral==isNeutral;
PRINT Result.size();
}

We could also just print out Result, and that would print the json string.

In addition, we could also see the data like a graph by clicking the visualize button on the left just above the curly brace button (for the json result).

Now that we understand the basic syntax of GSQL, let’s look at some basic hopping between vertices.

Let’s take a look at the SELECT statement below:

all_teams = {team.*);
Start = SELECT t FROM all_teams:t WHERE t.id == "Argentina";
Result = SELECT m FROM Start:t -(had_match:hm) ->match:m;

Essentially, this code follows the graph structure. It starts at the variable Start (which is the team Argentina) and goes to every all the had_match edges it has. Then it returns all the match vertices that has the had_match relationship with the team vertex.

We can do additional filtering here as well. For example, we can take only the matches that were played in a 2000 by Argentina. Try to understand the code below.

all_teams = {team.*);
Start = SELECT t FROM all_teams:t WHERE t.id == "Argentina";
Result = SELECT m FROM Start:t -(had_match:hm) ->match:m WHERE m.year == 2000;

Now we can select only the matches that Argentina won in the year 2000 with the following query:

all_teams = {team.*};
Start = SELECT t FROM all_teams:t WHERE t.id == "Argentina";
Result = SELECT m FROM Start:t -(had_match:hm) ->match:m WHERE m.year == 2000 AND m.winner=="Argentina";

And now let’s go from these matches to the teams that lost.

teams = SELECT t FROM Result:r -(had_match:hm) ->team:t WHERE t.id != "Argentina";

We’ve just written a query that gets all the teams that Argentina beat in the year 2000. Now let’s put it together so we can run it.

In the Write Queries tab in Graph Studio, create a new query and call it Beat_Teams. Remember, our team id is just the name of the team!

CREATE QUERY Beat_Teams(string team_id, int year) FOR GRAPH SoccerMatches { 
all_teams = {team.*};
Start = SELECT t FROM all_teams:t WHERE t.id == team_id;
Result = SELECT m FROM Start:t -(had_match:hm) ->match:m WHERE m.year == year AND m.winner==team_id;
teams = SELECT t FROM Result:r -(had_match:hm) ->team:t WHERE t.id != team_id;
PRINT teams;
}

Running it with Germany and the year 2000 should return the following the visualization.

Now that we can find the teams that a specific team has defeated in a given year, we should be able to find the teams that were defeated by those teams.

We can look for a specific team that is indirectly defeated. For example, assuming the rest of the querying in the above example has already happened and we have a variable called teams:

matches = SELECT m FROM teams:t -(had_match:hm) ->match:m WHERE (m.year==year AND m.winner==t.id) AND (m.home_team=="Germany" OR m.away_team=="Germany");

In this case, we are finding all the matches where the loser in Germany, assuming that none of the teams we start with are Germany. If we start with Argentina, we can use this to find all the matches where Argentina indirectly defeated Germany in a given year.

Here a query that we can test out.

CREATE QUERY CompareTeams(string id1, string id2, int year) FOR GRAPH SoccerMatches { 
qset = {team.*};
t1 = SELECT tgt FROM qset:tgt WHERE tgt.id==id1;
t2 = SELECT tgt FROM qset:tgt WHERE tgt.id==id2;

//country 1 indirect wins against country 2
Result = SELECT m FROM t1:s -(had_match:hm) ->match:m WHERE m.winner==s.id AND m.year==year;
teams = SELECT te FROM Result:s -(had_match:hm) ->team:te WHERE te.id!=s.winner AND te.id!=id2;
FirResult = SELECT m FROM teams:s -(had_match:hm) ->match:m WHERE (m.winner==s.id AND m.year==year) AND (m.home_team==id2 OR m.away_team==id2);
PRINT FirResult.size();

If you follow along with the code, you’ll notice that it is the same as the code discussed above; however, we can pass in the two countries’ IDs in this code.

As you can see, if you run that query, we get Argentina has 1 indirect win against Brazil in the year 2000. Similarly, we can run the query to count the number of indirect wins Brazil has against Argentina. The result for that query is 0.

For the above example, Brazil has 0 indirect wins and Argentina has 1 indirect win. Because Argentina won more indirect matches against Brazil, we can claim that Argentina has a better chance of winning a match against Brazil that year. So we can predict Brazil.

We can put this together into one query:

CREATE QUERY CompareTeams(string id1, string id2, int year) FOR GRAPH SoccerMatches { 
qset = {team.*};
t1 = SELECT tgt FROM qset:tgt WHERE tgt.id==id1;
t2 = SELECT tgt FROM qset:tgt WHERE tgt.id==id2;

//country 1 indirect wins against country 2
Result = SELECT m FROM t1:s -(had_match:hm) ->match:m WHERE m.winner==s.id AND m.year==year;
teams = SELECT te FROM Result:s -(had_match:hm) ->team:te WHERE te.id!=s.winner AND te.id!=id2;
FirResult = SELECT m FROM teams:s -(had_match:hm) ->match:m WHERE (m.winner==s.id AND m.year==year) AND (m.home_team==id2 OR m.away_team==id2);

//country 2 indirect wins against country 1
Result = SELECT m FROM t2:s -(had_match:hm) ->match:m WHERE m.winner==s.id AND m.year==year;
teams = SELECT te FROM Result:s -(had_match:hm) ->team:te WHERE te.id!=s.winner AND te.id!=id1;
SecResult = SELECT m FROM teams:s -(had_match:hm) ->match:m WHERE (m.winner==s.id AND m.year==year) AND (m.home_team==id1 OR m.away_team==id1);

//Comparing who had more indirect wins
IF FirResult.size() > SecResult.size() THEN PRINT id1; END;
IF FirResult.size() < SecResult.size() THEN PRINT id2; END;
IF FirResult.size() == SecResult.size() THEN PRINT "Tie"; END;
}

And when we run it, we can see which country the algorithm predicts.

What’s next? Well obviously there is a lot of room for improvement. A few things I tried are:

  • Filtering by isNeutral and !isFriendly
  • Multiplying the number of indirect wins by the score_diff for the matches.
  • Going even deeper and using two teams as intermediates. (Team1 beats Team2 beats Team3 beats Team4 so Team1 can beat Team4)

Some ideas to try:

  • Who is the best team in a year?
  • What teams do better in friendly matches?

That’s all! I hope you learned a lot and good luck on future endeavors!

--

--