Task Overview
Input: Panoramic 4K video (full match)
Output: Per-frame game state on 2D pitch minimap (position + role + team + jersey number)
Alignment with SoccerNet-GSR:
- ✅ Same: 2D pitch coordinates, player roles, team assignment, jersey numbers
- ⚠️ Different: Fixed panoramic camera + full match (not broadcast clips)
Input / Output Definition
Input
- `videos/match_XXXXX.mp4`: Full match panoramic video (4K)
- Camera: BePro Cerberus (2 matches) or 3-camera panoramic (8 matches)
Output (per frame)
| Field | Type | Description |
|---|---|---|
| `frame` | int | Frame number |
| `time` | float | Timestamp in video (seconds) |
| `player_id` | int | Unique track ID (persistent throughout match) |
| `x`, `y` | float | 2D pitch coordinates in meters |
| `role` | string | player / goalkeeper / referee / other |
| `team_side` | string | left / right / null |
| `jersey_number` | int/null | 0–99 or null |
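As a minimal sketch of how the per-object fields and their allowed values could be checked in code, the following defines a small container class. The class name `GameStateObject` and the validation rules are illustrative assumptions derived from the table above; they are not part of any official tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the field table; None stands for JSON null.
VALID_ROLES = {"player", "goalkeeper", "referee", "other"}
VALID_SIDES = {"left", "right", None}


@dataclass(frozen=True)
class GameStateObject:
    """One tracked object in one frame (hypothetical helper, not dataset tooling)."""

    player_id: int
    x: float  # pitch coordinate in meters
    y: float  # pitch coordinate in meters
    role: str
    team_side: Optional[str]
    jersey_number: Optional[int]

    def __post_init__(self):
        # Reject values that fall outside the ranges listed in the table.
        if self.role not in VALID_ROLES:
            raise ValueError(f"unknown role: {self.role!r}")
        if self.team_side not in VALID_SIDES:
            raise ValueError(f"unknown team_side: {self.team_side!r}")
        if self.jersey_number is not None and not 0 <= self.jersey_number <= 99:
            raise ValueError(f"jersey_number out of range: {self.jersey_number}")


obj = GameStateObject(
    player_id=1, x=52.5, y=34.0, role="player", team_side="left", jersey_number=10
)
```

Constructing an object with an unknown role or a jersey number outside 0–99 raises `ValueError`, which makes malformed records visible early when parsing annotations.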
File Layout
production/
├── videos/
│   ├── 117093.mp4
│   ├── 117094.mp4
│   └── ...
├── gsr/
│   ├── 117093/
│   │   └── 117093.json
│   ├── 117094/
│   │   └── 117094.json
│   └── ...
├── bas/
├── mot/
└── raw/
    ├── 117093/
    │   ├── 117093_keypoints.json
    │   ├── 117093_mapx.npy
    │   ├── 117093_mapy.npy
    │   └── ...
    └── ...
Each match has one annotation file in its own directory.
Unlike SoccerNet-GSR, which stores extracted image frames under `img1/`, we reference the video directly from `videos/` to save storage space.
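The layout can be turned into paths with a small helper. This is only a sketch: the function name `match_paths` is hypothetical, and it assumes the video and annotation files are named after the match ID exactly as in the tree shown here.

```python
from pathlib import Path


def match_paths(root, match_id):
    """Return the video and GSR annotation paths for one match.

    Hypothetical helper: assumes <root>/videos/<id>.mp4 and
    <root>/gsr/<id>/<id>.json, as in the directory tree.
    """
    root = Path(root)
    return {
        "video": root / "videos" / f"{match_id}.mp4",
        "gsr": root / "gsr" / match_id / f"{match_id}.json",
    }


paths = match_paths("production", "117093")
```

Returning `Path` objects keeps the helper independent of the operating system's path separator.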
Schema & SoccerNet Mapping
Example JSON Structure
{
  "match_id": "117093",
  "fps": 25.0,
  "annotations": [
    {
      "frame": 0,
      "time": 0.0,
      "objects": [
        {
          "player_id": 1,
          "x": 52.5,
          "y": 34.0,
          "role": "player",
          "team_side": "left",
          "jersey_number": 10
        },
        ...
      ]
    },
    ...
  ]
}
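To illustrate how this structure might be consumed, the sketch below parses a tiny in-memory sample and groups positions by `player_id` into per-player trajectories. The second frame in `SAMPLE` is made-up illustration data, and the function name `trajectories` is hypothetical; a real file would be read with `json.load` instead.

```python
import json
from collections import defaultdict

# Tiny sample following the structure above; frame 1 is invented for illustration.
SAMPLE = """
{
  "match_id": "117093",
  "fps": 25.0,
  "annotations": [
    {"frame": 0, "time": 0.0, "objects": [
      {"player_id": 1, "x": 52.5, "y": 34.0, "role": "player",
       "team_side": "left", "jersey_number": 10}
    ]},
    {"frame": 1, "time": 0.04, "objects": [
      {"player_id": 1, "x": 52.6, "y": 34.1, "role": "player",
       "team_side": "left", "jersey_number": 10}
    ]}
  ]
}
"""


def trajectories(game_state):
    """Group (frame, x, y) tuples by player_id, in file order."""
    tracks = defaultdict(list)
    for frame in game_state["annotations"]:
        for obj in frame["objects"]:
            tracks[obj["player_id"]].append((frame["frame"], obj["x"], obj["y"]))
    return dict(tracks)


data = json.loads(SAMPLE)
tracks = trajectories(data)
```

Because `player_id` is persistent throughout the match, each list in `tracks` covers one individual for as long as they appear in the annotations.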
SoccerNet-GSR Correspondence
| SoccerTrack v2 | SoccerNet-GSR | Notes |
|---|---|---|
| `x`, `y` | `x_pos`, `y_pos` | Pitch coordinates (meters) |
| `team_side` | left / right | Team assignment |
| `jersey_number` | `jersey_number` | 0–99 |
| `role` | `role` | player/goalkeeper/referee/other |
| `player_id` | `track_id` | Persistent across full match |
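The correspondence can be applied as a simple key rename. The sketch below uses only the names listed in the table; `FIELD_MAP` and `to_gsr_names` are hypothetical, and `team_side` is left unrenamed because the table gives its values on the SoccerNet-GSR side but not a target key.

```python
# Renames taken from the correspondence table; keys not listed keep their name.
FIELD_MAP = {
    "x": "x_pos",
    "y": "y_pos",
    "player_id": "track_id",
}


def to_gsr_names(obj):
    """Return a copy of one object record with SoccerNet-GSR style keys."""
    return {FIELD_MAP.get(key, key): value for key, value in obj.items()}


converted = to_gsr_names(
    {
        "player_id": 1,
        "x": 52.5,
        "y": 34.0,
        "role": "player",
        "team_side": "left",  # no target key in the table, so left as-is
        "jersey_number": 10,
    }
)
```

This only renames keys; it does not reproduce the full SoccerNet-GSR file format, which should be checked against the official devkit before converting data.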
Evaluation / Metrics
Evaluation uses GS-HOTA (Game State Higher Order Tracking Accuracy), following the SoccerNet-GSR Challenge definition.
Official devkit: Coming soon
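Until the devkit is available, the per-pair similarity behind GS-HOTA can be sketched as follows. This reflects the SoccerNet-GSR definition as understood here: a Gaussian-shaped location similarity on pitch distance, multiplied by an identity check that is 1 only when role, team, and jersey number all match. The tolerance of 5 m and the 0.05 constant are assumptions drawn from that definition and should be verified against the official implementation; the function names are hypothetical.

```python
import math

TAU_M = 5.0  # assumed distance tolerance in meters (verify against the devkit)


def loc_sim(pred_xy, gt_xy, tau=TAU_M):
    """Location similarity: 1.0 at zero distance, 0.05 at a distance of tau."""
    d2 = (pred_xy[0] - gt_xy[0]) ** 2 + (pred_xy[1] - gt_xy[1]) ** 2
    return math.exp(math.log(0.05) * d2 / tau**2)


def id_eq(pred, gt):
    """1.0 only if role, team, and jersey number all match, else 0.0."""
    keys = ("role", "team_side", "jersey_number")
    return 1.0 if all(pred[k] == gt[k] for k in keys) else 0.0


def gs_hota_similarity(pred, gt):
    """Similarity between one prediction and one ground-truth object."""
    return loc_sim((pred["x"], pred["y"]), (gt["x"], gt["y"])) * id_eq(pred, gt)


gt = {"x": 52.5, "y": 34.0, "role": "player", "team_side": "left", "jersey_number": 10}
exact = gs_hota_similarity(dict(gt), gt)
wrong_number = gs_hota_similarity({**gt, "jersey_number": 7}, gt)
```

The full metric then runs the HOTA matching and association steps over these similarities, which this sketch does not cover. Note that a single wrong attribute zeroes the similarity regardless of how accurate the position is.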
Differences vs SoccerNet-GSR
Key Differences:
- Camera view: Panoramic (full-pitch) vs. broadcast (partial view with occlusions)
- Duration: Full match (~90 min) vs. short clips
- Camera setup: Fixed BePro/3-camera system vs. moving broadcast camera
- Level: University amateur matches vs. professional leagues