Parsing NBA Substitutions in Play-by-Play Data

Posted on Tue 29 December 2020 in Data Science

Parsing substitutions in basketball play-by-play data is a problem that has eluded me for a while. It's massively important for lineup-contextual statistics like plus/minus and for parsing rotation data. Below is the approach I came up with to parse this data and get an idea of which lineups were on the court together and for how long. The best way I've found is to use Python classes to store lineup and player data and flip an on/off-court value as players are subbed in and out. I am sure there are more elegant ways to handle this data, but I am not a computer scientist!

First we need the box score to get the player rosters for the game we want to parse. In this example I'm going to use Blazers/Nuggets Game 7 from 2019, not that I'm biased at all in choosing one of the best wins for my Blazers in my lifetime.

import requests
import json
import pandas as pd
import re
import numpy as np
import os
import datetime as dt

box_url = 'https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=10&EndRange=28800&GameID=0041800237&RangeType=0&Season=2018-19&SeasonType=Playoffs&StartPeriod=1&StartRange=0'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)', 'x-nba-stats-origin': 'stats', 'x-nba-stats-token': 'true', 'Host':'stats.nba.com', 'Referer':'https://stats.nba.com/game/0021900306/'}
r= requests.get(box_url, headers=headers, timeout = 5)
data = json.loads(r.text)
box = pd.DataFrame.from_dict(data['resultSets'][0]['rowSet'])
col_names = data['resultSets'][0]['headers']
box.columns = col_names
box.columns = box.columns.str.lower()
box
game_id team_id team_abbreviation team_city player_id player_name start_position comment min fgm fga fg_pct fg3m fg3a fg3_pct ftm fta ft_pct oreb dreb reb ast stl blk to pf pts plus_minus
0 0041800237 1610612757 POR Portland 203090 Maurice Harkless F 16:47 3.0 5.0 0.600 0.0 1.0 0.000 0.0 1.0 0.000 3.0 2.0 5.0 3.0 1.0 1.0 0.0 5.0 6.0 -8.0
1 0041800237 1610612757 POR Portland 202329 Al-Farouq Aminu F 7:08 1.0 4.0 0.250 0.0 2.0 0.000 1.0 2.0 0.500 0.0 3.0 3.0 0.0 0.0 0.0 1.0 1.0 3.0 -7.0
2 0041800237 1610612757 POR Portland 202683 Enes Kanter C 39:39 6.0 13.0 0.462 0.0 1.0 0.000 0.0 0.0 0.000 4.0 8.0 12.0 1.0 0.0 0.0 1.0 3.0 12.0 1.0
3 0041800237 1610612757 POR Portland 203468 CJ McCollum G 45:17 17.0 29.0 0.586 1.0 3.0 0.333 2.0 2.0 1.000 1.0 8.0 9.0 1.0 1.0 1.0 0.0 1.0 37.0 6.0
4 0041800237 1610612757 POR Portland 203081 Damian Lillard G 45:25 3.0 17.0 0.176 2.0 9.0 0.222 5.0 6.0 0.833 0.0 10.0 10.0 8.0 3.0 0.0 1.0 3.0 13.0 8.0
5 0041800237 1610612757 POR Portland 1628380 Zach Collins 23:17 2.0 6.0 0.333 1.0 3.0 0.333 2.0 2.0 1.000 2.0 4.0 6.0 1.0 0.0 4.0 1.0 5.0 7.0 5.0
6 0041800237 1610612757 POR Portland 203918 Rodney Hood 20:11 2.0 6.0 0.333 0.0 3.0 0.000 2.0 2.0 1.000 0.0 3.0 3.0 0.0 0.0 0.0 0.0 1.0 6.0 -2.0
7 0041800237 1610612757 POR Portland 203552 Seth Curry 16:20 0.0 2.0 0.000 0.0 2.0 0.000 0.0 0.0 0.000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 7.0
8 0041800237 1610612757 POR Portland 202323 Evan Turner 19:12 3.0 7.0 0.429 0.0 0.0 0.000 8.0 9.0 0.889 2.0 5.0 7.0 2.0 0.0 1.0 0.0 4.0 14.0 1.0
9 0041800237 1610612757 POR Portland 203086 Meyers Leonard 6:44 1.0 4.0 0.250 0.0 2.0 0.000 0.0 0.0 0.000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 3.0 2.0 9.0
10 0041800237 1610612757 POR Portland 1627746 Skal Labissiere DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
11 0041800237 1610612757 POR Portland 1627774 Jake Layman DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
12 0041800237 1610612757 POR Portland 1629014 Anfernee Simons DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
13 0041800237 1610612743 DEN Denver 1628470 Torrey Craig F 33:01 2.0 5.0 0.400 0.0 2.0 0.000 4.0 5.0 0.800 4.0 4.0 8.0 2.0 0.0 0.0 1.0 2.0 8.0 6.0
14 0041800237 1610612743 DEN Denver 200794 Paul Millsap F 31:55 3.0 13.0 0.231 0.0 2.0 0.000 4.0 6.0 0.667 1.0 6.0 7.0 1.0 0.0 3.0 0.0 6.0 10.0 3.0
15 0041800237 1610612743 DEN Denver 203999 Nikola Jokic C 41:53 11.0 26.0 0.423 2.0 6.0 0.333 5.0 7.0 0.714 4.0 9.0 13.0 2.0 0.0 4.0 2.0 3.0 29.0 -1.0
16 0041800237 1610612743 DEN Denver 203914 Gary Harris G 39:10 7.0 11.0 0.636 0.0 1.0 0.000 1.0 2.0 0.500 0.0 6.0 6.0 3.0 0.0 0.0 1.0 3.0 15.0 -7.0
17 0041800237 1610612743 DEN Denver 1627750 Jamal Murray G 37:53 4.0 18.0 0.222 0.0 4.0 0.000 9.0 9.0 1.000 2.0 4.0 6.0 5.0 0.0 0.0 1.0 1.0 17.0 -2.0
18 0041800237 1610612743 DEN Denver 203486 Mason Plumlee 18:48 1.0 3.0 0.333 0.0 0.0 0.000 2.0 5.0 0.400 1.0 5.0 6.0 0.0 0.0 2.0 0.0 3.0 4.0 -7.0
19 0041800237 1610612743 DEN Denver 203115 Will Barton 19:58 4.0 9.0 0.444 0.0 2.0 0.000 0.0 0.0 0.000 1.0 2.0 3.0 1.0 0.0 0.0 0.0 3.0 8.0 -9.0
20 0041800237 1610612743 DEN Denver 1627736 Malik Beasley 7:15 0.0 1.0 0.000 0.0 1.0 0.000 0.0 0.0 0.000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 -1.0
21 0041800237 1610612743 DEN Denver 1628420 Monte Morris 10:07 1.0 3.0 0.333 0.0 1.0 0.000 3.0 5.0 0.600 0.0 2.0 2.0 1.0 1.0 0.0 0.0 1.0 5.0 -2.0
22 0041800237 1610612743 DEN Denver 1627823 Juancho Hernangomez DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
23 0041800237 1610612743 DEN Denver 1626168 Trey Lyles DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
24 0041800237 1610612743 DEN Denver 202738 Isaiah Thomas DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
25 0041800237 1610612743 DEN Denver 1629020 Jarred Vanderbilt DNP - Coach's Decision None NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

Filtering to get the starters:

starters = box[box['start_position']!= '']
starters = starters[['team_id','team_abbreviation','player_id','player_name','start_position']]
starters
       team_id team_abbreviation  player_id       player_name start_position
0   1610612757               POR     203090  Maurice Harkless              F
1   1610612757               POR     202329   Al-Farouq Aminu              F
2   1610612757               POR     202683       Enes Kanter              C
3   1610612757               POR     203468       CJ McCollum              G
4   1610612757               POR     203081    Damian Lillard              G
13  1610612743               DEN    1628470      Torrey Craig              F
14  1610612743               DEN     200794      Paul Millsap              F
15  1610612743               DEN     203999      Nikola Jokic              C
16  1610612743               DEN     203914       Gary Harris              G
17  1610612743               DEN    1627750      Jamal Murray              G

Now pulling the play-by-play data, plus some helper columns that convert the period clock strings to integer seconds and track elapsed time:

pbp_url = 'https://stats.nba.com/stats/playbyplayv2?EndPeriod=10&EndRange=55800&GameID=0041800237&RangeType=2&Season=2018-19&SeasonType=Playoffs&StartPeriod=1&StartRange=0'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)', 'x-nba-stats-origin': 'stats', 'x-nba-stats-token': 'true', 'Host':'stats.nba.com', 'Referer':'https://stats.nba.com/game/0021900306/'}
r= requests.get(pbp_url, headers=headers, timeout = 5)
data = json.loads(r.text)
pbp = pd.DataFrame.from_dict(data['resultSets'][0]['rowSet'])
col_names = data['resultSets'][0]['headers']
pbp.columns = col_names
pbp.columns = pbp.columns.str.lower()
pbp_times = pbp['pctimestring'].str.split(':', n=2, expand=True)  # n must be passed by keyword in recent pandas
pbp_times[0] = pbp_times[0].astype(str).astype(int)
pbp_times[1] = pbp_times[1].astype(str).astype(int)
pbp['timeinseconds'] = (pbp_times[0]*60) + pbp_times[1]
pbp['play_elapsed_time'] = pbp['timeinseconds'].shift(1)  - pbp['timeinseconds'] 
pbp['play_elapsed_time'] = pbp['play_elapsed_time'].fillna(0)
pbp['play_elapsed_time'] = np.where(pbp['period'] != pbp['period'].shift(1), 0, pbp['play_elapsed_time'])
pbp['total_elapsed_time'] = pbp.groupby(['game_id'])['play_elapsed_time'].cumsum()
pbp['max_time'] = pbp.groupby('game_id')['play_elapsed_time'].transform('sum')
pbp['time_remaining'] = pbp['max_time'] - pbp['total_elapsed_time']
pbp['scoremargin'] = np.where(pbp['scoremargin']=='TIE',0,pbp['scoremargin'])
pbp['scoremargin'] = pbp['scoremargin'].fillna(0).astype(int)
pbp.head()
game_id eventnum eventmsgtype eventmsgactiontype period wctimestring pctimestring homedescription neutraldescription visitordescription score scoremargin person1type player1_id player1_name player1_team_id player1_team_city player1_team_nickname player1_team_abbreviation person2type player2_id player2_name player2_team_id player2_team_city player2_team_nickname player2_team_abbreviation person3type player3_id player3_name player3_team_id player3_team_city player3_team_nickname player3_team_abbreviation video_available_flag timeinseconds play_elapsed_time total_elapsed_time max_time time_remaining
0 0041800237 2 12 0 1 3:41 PM 12:00 None None None None 0 0 0 None NaN None None None 0 0 None NaN None None None 0 0 None NaN None None None 0 720 0.0 0.0 2880.0 2880.0
1 0041800237 4 10 0 1 3:41 PM 12:00 Jump Ball Millsap vs. Kanter: Tip to Harkless None None None 0 4 200794 Paul Millsap 1.610613e+09 Denver Nuggets DEN 5 202683 Enes Kanter 1.610613e+09 Portland Trail Blazers POR 5 203090 Maurice Harkless 1.610613e+09 Portland Trail Blazers POR 1 720 0.0 0.0 2880.0 2880.0
2 0041800237 7 6 26 1 3:41 PM 11:45 None None Aminu Offensive Charge Foul (P1.T1) (J.Goble) None 0 5 202329 Al-Farouq Aminu 1.610613e+09 Portland Trail Blazers POR 4 200794 Paul Millsap 1.610613e+09 Denver Nuggets DEN 1 0 None NaN None None None 1 705 15.0 15.0 2880.0 2865.0
3 0041800237 9 5 37 1 3:41 PM 11:45 None None Aminu Offensive Foul Turnover (P1.T1) None 0 5 202329 Al-Farouq Aminu 1.610613e+09 Portland Trail Blazers POR 0 0 None NaN None None None 1 0 None NaN None None None 1 705 0.0 15.0 2880.0 2865.0
4 0041800237 10 1 6 1 3:42 PM 11:28 Harris 2' Driving Layup (2 PTS) None None 0 - 2 2 4 203914 Gary Harris 1.610613e+09 Denver Nuggets DEN 0 0 None NaN None None None 0 0 None NaN None None None 1 688 17.0 32.0 2880.0 2848.0
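
To make the time helper columns concrete, here is a minimal plain-Python sketch of the same idea, without pandas. The function names and the `toy_events` list are made up for illustration; the clock values mirror the first few rows above. It turns an 'MM:SS' clock into seconds left in the period, then takes the gap between consecutive events, resetting to zero when the period changes.

```python
def clock_to_seconds(clock):
    """Convert an 'MM:SS' period clock string to seconds left in the period."""
    minutes, seconds = clock.split(':')
    return int(minutes) * 60 + int(seconds)


def elapsed_per_event(events):
    """Given (period, clock) pairs in game order, return seconds elapsed before each event."""
    elapsed = []
    prev_period, prev_secs = None, None
    for period, clock in events:
        secs = clock_to_seconds(clock)
        if period != prev_period:
            # First event of a period: the clock has just reset, so nothing has elapsed.
            elapsed.append(0)
        else:
            # The clock counts down, so elapsed time is previous minus current.
            elapsed.append(prev_secs - secs)
        prev_period, prev_secs = period, secs
    return elapsed


# Made-up events: three in period 1, then the first event of period 2.
toy_events = [(1, '12:00'), (1, '11:45'), (1, '11:28'), (2, '12:00')]
print(elapsed_per_event(toy_events))
```

The 15- and 17-second gaps line up with the play_elapsed_time values in the rows above, and the period change contributes nothing, which is what the np.where line is for.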

Now to set up the classes in order to keep track of who is on and off the court. I'm creating a class object for each player, team and lineup (called LineupStats) and then a Game class that parses through the play by play. Creating the team class also runs helper functions to pull the rosters and starters from the box score we pulled earlier:

class Player():
    def __init__(self, playerid, teamid, name):
        self.playerid = playerid
        self.teamid = teamid
        self.name = name
        self.oncourt = 0
        self.court_time = 0

    def to_dict(self):
        return {
            'court_time' : self.court_time,
            'playerid' : self.playerid, 
            'teamid' : self.teamid
        }

class Team():
    def __init__(self, teamid, gameid):
        self.court_time = 0
        self.roster = []
        self.lineup = []
        self.teamid = teamid
        self.gameid = gameid
        self.starters = []
        self.lineups = []

    def getRoster(self, box):
        for index,row in box.iterrows():
            if row['team_id'] == self.teamid:
                x = Player(playerid = row['player_id'],teamid = row['team_id'], name = row['player_name'])
                self.roster.append(x)

    def getStarters(self, box):
        for index,row in box.iterrows():
            if row['team_id'] == self.teamid and row['start_position'] != '':
                for p in self.roster:
                    if p.playerid == row['player_id']:    
                        self.lineup.append(p)
                        self.starters.append(p)
                        p.oncourt = 1  

    def initLineup(self):
        if self.lineup:
            self.lu = LineupStats(self.lineup, self.gameid, self.teamid)

    def Sub(self, sub_in, sub_out, event, time):
        self.resetLineup(event, time)
        for x in self.lineup[:]:  # iterate over a copy so remove() is safe
            if x.playerid == sub_out:
                x.oncourt = 0
                self.lineup.remove(x)
        for x in self.roster:
            if x.playerid == sub_in:
                x.oncourt = 1
                self.lineup.append(x)  

    def quarterSubs(self, lineup):
        for x in self.lineup[:]:
            if x.playerid not in lineup:
                self.lineup.remove(x)
                x.oncourt = 0
        for l in lineup:
            for x in self.roster:
                if x.playerid == l and x.oncourt == 0:
                    self.lineup.append(x)
                    x.oncourt = 1

    def resetLineup(self, event, time):
        self.lineups.append(self.lu.to_dict(time))
        self.lu.pts = 0        
        self.lu.drbd = 0
        self.lu.orbd = 0
        self.lu.stl = 0
        self.lu.blk = 0
        self.lu.ast = 0
        self.lu.fgm = 0
        self.lu.fga = 0
        self.lu.ftm = 0
        self.lu.fta = 0
        self.lu.pf = 0
        self.lu.tov = 0   
        self.lu.lu_time = 0
        self.lu.diff = 0
        self.lu.fg3a = 0
        self.lu.fg3m = 0  
        self.lu.poss = 0
        self.lu.event_start = event
        self.lu.time_on = time

    def to_dict(self):
        return {
            'teamid' : self.teamid,
            'gameid' : self.gameid,
            'court_time' : self.court_time,
            'starters' : [int(x.playerid) for x in self.starters]
        }


class LineupStats():
    def __init__(self, lineup, gameid,teamid):
        self.lineup = lineup
        self.pts = 0
        self.drbd = 0
        self.orbd = 0
        self.stl = 0
        self.blk = 0
        self.ast = 0
        self.fgm = 0
        self.fga = 0
        self.ftm = 0
        self.fta = 0
        self.pf = 0
        self.tov = 0   
        self.lu_time = 0
        self.diff = 0
        self.fg3a = 0
        self.fg3m = 0
        self.event_start = 0
        self.time_on = 0
        self.time_off = 0
        self.event_end = 0
        self.gameid = gameid
        self.teamid = teamid
        self.poss = 0

    def to_dict(self, time_end):
        return {
            'lineup' : [int(x.playerid) for x in self.lineup],
            'pts' : self.pts,
            'drbd' : self.drbd,
            'stl' : self.stl,
            'blk' : self.blk,
            'ast' : self.ast,
            'fgm' : self.fgm,
            'fga' : self.fga,
            'ftm' : self.ftm,
            'fta' : self.fta,
            'orbd' : self.orbd,
            'pf' : self.pf,
            'tov' : self.tov,
            'fg3a' : self.fg3a,
            'fg3m' : self.fg3m,
            'lu_time' : self.lu_time,
            'diff' : self.diff ,
            'event_start' : self.event_start,
            'time_on' : self.time_on,
            'time_off' : time_end,
            'gameid' : self.gameid,
            'teamid' : self.teamid,
            'poss' : self.poss
        }      


class Game():
    def __init__(self, hteam, ateam,  gameid, pbp, box):
        self.hteam = hteam
        self.ateam = ateam
        self.time_elapsed = 0
        self.event = 1
        self.pbp = pbp
        self.box = box
        self.poss = 0

    def initRosters(self):
        self.hteam.getRoster(self.box)
        self.ateam.getRoster(self.box)

    def initStarters(self):
        self.hteam.getStarters(self.box)
        self.ateam.getStarters(self.box)

    def addCourtTime(self, time):
        for x in self.hteam.lineup:
            x.court_time += time
        for x in self.ateam.lineup:
            x.court_time += time

    def getQuarterStarters(self, quarter):
        if quarter == 2:
            start_range = 7201
            end_range = 7493
        elif quarter == 3:
            start_range = 14410
            end_range = 14640
        elif quarter == 4:
            start_range = 21621
            end_range = 21913          
        starters_url = 'https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=14&GameID=0041800237&RangeType=2&Season=2018-19&SeasonType=Playoffs&StartPeriod=1&StartRange=' + str(start_range) + '&EndRange=' + str(end_range)
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)', 'x-nba-stats-origin': 'stats', 'x-nba-stats-token': 'true', 'Host':'stats.nba.com', 'Referer':'https://stats.nba.com/game/0021900306/'}
        r= requests.get(starters_url, headers=headers, timeout = 5)
        data = json.loads(r.text)
        starters = pd.DataFrame.from_dict(data['resultSets'][0]['rowSet'])
        col_names = data['resultSets'][0]['headers']
        starters.columns = col_names
        starters.columns = starters.columns.str.lower()
        hteam_starters = starters[starters['team_id']==self.hteam.teamid]
        ateam_starters = starters[starters['team_id']==self.ateam.teamid]
        hteam_starters = list(hteam_starters['player_id'])
        ateam_starters = list(ateam_starters['player_id'])
        self.hteam.quarterSubs(hteam_starters)
        self.ateam.quarterSubs(ateam_starters)

    def parseGame(self):
        self.initRosters()
        self.initStarters()
        self.hteam.initLineup()
        self.ateam.initLineup()
        prev_row_period = None  # no previous row before the first event
        for index, row in self.pbp.iterrows():
            assert len(self.hteam.lineup)==5, 'home lineup not equal to 5'
            assert len(self.ateam.lineup)==5, 'away lineup not equal to 5'
            if row['pctimestring'] == '12:00' and row['period'] != 1 and row['period'] != prev_row_period:
                self.getQuarterStarters(int(row['period']))
            self.addCourtTime(row['play_elapsed_time'])
            if row['eventmsgtype'] == 1:
                if row['player1_team_id'] == self.hteam.teamid:
                    self.hteam.lu.diff += row['scoremargin']
                    self.ateam.lu.diff -= row['scoremargin']
                    self.hteam.lu.pts += row['scoremargin']
                else:
                    self.ateam.lu.diff += row['scoremargin']
                    self.hteam.lu.diff -= row['scoremargin']
                    self.ateam.lu.pts += row['scoremargin']      
            if row['eventmsgtype'] == 8:
                if row['player1_team_id'] == self.hteam.teamid:
                    self.hteam.Sub(sub_in=row['player2_id'], sub_out=row['player1_id'], event=row['eventnum'], time=row['time_remaining'])
                else:
                    self.ateam.Sub(sub_in=row['player2_id'], sub_out=row['player1_id'], event=row['eventnum'], time=row['time_remaining'])   
            prev_row_period = row['period']

por = Team(gameid='0041800237', teamid=1610612757)
por.getRoster(box)

for x in por.roster:
    print(x.playerid, x.name)
203090 Maurice Harkless
202329 Al-Farouq Aminu
202683 Enes Kanter
203468 CJ McCollum
203081 Damian Lillard
1628380 Zach Collins
203918 Rodney Hood
203552 Seth Curry
202323 Evan Turner
203086 Meyers Leonard
1627746 Skal Labissiere
1627774 Jake Layman
1629014 Anfernee Simons

por.getStarters(box)

for x in por.starters:
    print(x.playerid, x.name)

for x in por.lineup:
    print(x.playerid, x.name)
203090 Maurice Harkless
202329 Al-Farouq Aminu
202683 Enes Kanter
203468 CJ McCollum
203081 Damian Lillard
203090 Maurice Harkless
202329 Al-Farouq Aminu
202683 Enes Kanter
203468 CJ McCollum
203081 Damian Lillard

Notice above that Portland's starters are the same as Portland's lineup, because so far we've only pulled in the roster and the starters from the box score. Trivial, but it's important to note where that data comes from before getting into the play-by-play. Now for the real meat and potatoes of how I'm parsing substitutions: here's the function that runs everything within the Game class:

def parseGame(self):
    self.initRosters()
    self.initStarters()
    self.hteam.initLineup()
    self.ateam.initLineup()
    prev_row_period = None  # no previous row before the first event
    for index, row in self.pbp.iterrows():
        assert len(self.hteam.lineup)==5, 'home lineup not equal to 5'
        assert len(self.ateam.lineup)==5, 'away lineup not equal to 5'
        if row['pctimestring'] == '12:00' and row['period'] != 1 and row['period'] != prev_row_period:
            self.getQuarterStarters(int(row['period']))
        self.addCourtTime(row['play_elapsed_time'])
        if row['eventmsgtype'] == 1:
            if row['player1_team_id'] == self.hteam.teamid:
                self.hteam.lu.diff += row['scoremargin']
                self.ateam.lu.diff -= row['scoremargin']
                self.hteam.lu.pts += row['scoremargin']
            else:
                self.ateam.lu.diff += row['scoremargin']
                self.hteam.lu.diff -= row['scoremargin']
                self.ateam.lu.pts += row['scoremargin']      
        if row['eventmsgtype'] == 8:
            if row['player1_team_id'] == self.hteam.teamid:
                self.hteam.Sub(sub_in=row['player2_id'], sub_out=row['player1_id'], event=row['eventnum'], time=row['time_remaining'])
            else:
                self.ateam.Sub(sub_in=row['player2_id'], sub_out=row['player1_id'], event=row['eventnum'], time=row['time_remaining'])   
        prev_row_period = row['period']

The first four lines of the function just initialize the game. Then I loop through each row of the play-by-play.

if row['pctimestring'] == '12:00' and row['period'] != 1 and row['period'] != prev_row_period:
    self.getQuarterStarters(int(row['period']))

The NBA's PBP has a separate game event for the end and start of each period, so when the clock on the current row reads '12:00' and the period has changed from the previous row, I use the getQuarterStarters helper function to get the starters of that quarter from the NBA's box score query feature, which can be restricted to a time range.

def getQuarterStarters(self, quarter):
    if quarter == 2:
        start_range = 7201
        end_range = 7493
    elif quarter == 3:
        start_range = 14410
        end_range = 14640
    elif quarter == 4:
        start_range = 21621
        end_range = 21913          
    starters_url = 'https://stats.nba.com/stats/boxscoretraditionalv2?EndPeriod=14&GameID=0041800237&RangeType=2&Season=2018-19&SeasonType=Playoffs&StartPeriod=1&StartRange=' + str(start_range) + '&EndRange=' + str(end_range)
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64)', 'x-nba-stats-origin': 'stats', 'x-nba-stats-token': 'true', 'Host':'stats.nba.com', 'Referer':'https://stats.nba.com/game/0021900306/'}
    r= requests.get(starters_url, headers=headers, timeout = 5)
    data = json.loads(r.text)
    starters = pd.DataFrame.from_dict(data['resultSets'][0]['rowSet'])
    col_names = data['resultSets'][0]['headers']
    starters.columns = col_names
    starters.columns = starters.columns.str.lower()
    hteam_starters = starters[starters['team_id']==self.hteam.teamid]
    ateam_starters = starters[starters['team_id']==self.ateam.teamid]
    hteam_starters = list(hteam_starters['player_id'])
    ateam_starters = list(ateam_starters['player_id'])
    self.hteam.quarterSubs(hteam_starters)
    self.ateam.quarterSubs(ateam_starters)
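
The hard-coded ranges above only cover quarters 2 through 4. Judging by the values (7200 would be 12 minutes in tenths of a second), StartRange and EndRange look like elapsed game time in tenths of a second; that reading is an assumption, not something confirmed here. Under that assumption, and assuming 5-minute overtime periods, a hypothetical helper like `period_range` could compute a window for any period. The 30-second window length is arbitrary, and the results will not match the hand-picked numbers above exactly.

```python
def period_start_tenths(period, quarter_minutes=12, overtime_minutes=5):
    """Elapsed game time, in tenths of a second, at the start of a period.

    Assumes 12-minute regulation quarters and 5-minute overtime periods.
    """
    regulation = min(period - 1, 4) * quarter_minutes * 60
    overtime = max(period - 5, 0) * overtime_minutes * 60
    return (regulation + overtime) * 10


def period_range(period, window_seconds=30):
    """Hypothetical (start_range, end_range) window just after a period begins."""
    start = period_start_tenths(period)
    # Start one tenth past the boundary, mirroring the 7201 used for quarter 2.
    return start + 1, start + window_seconds * 10


for p in (2, 3, 4, 5):
    print(p, period_range(p))
```

With something like this, the if/elif chain could be replaced by a single call, and an overtime period would no longer leave start_range undefined.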

I then run the quarterSubs function from the Team class, which swaps out any players who differ between the lineup that ended the previous quarter and the starters of the new one:

def quarterSubs(self, lineup):
    for x in self.lineup[:]:
        if x.playerid not in lineup:
            self.lineup.remove(x)
            x.oncourt = 0
    for l in lineup:
        for x in self.roster:
            if x.playerid == l and x.oncourt == 0:
                self.lineup.append(x)
                x.oncourt = 1

Since no game time elapses between quarters, I don't need to update any lineup or player statistics here.
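
The same reconciliation can be pictured with plain player ids and sets. This is only an illustration of the idea, not a replacement for quarterSubs: the `reconcile` name is made up, and while the ids are Portland players from the box score above, the scenario itself is invented.

```python
def reconcile(current, starters):
    """Return (lineup, subbed_out, subbed_in) after applying a new period's starters."""
    current_set, starters_set = set(current), set(starters)
    subbed_out = current_set - starters_set   # on the floor before, not starting now
    subbed_in = starters_set - current_set    # starting now, not on the floor before
    # Keep players who stay, in their existing order, then add the newcomers.
    lineup = [p for p in current if p in starters_set]
    lineup += [p for p in starters if p in subbed_in]
    return lineup, subbed_out, subbed_in


# Invented scenario: Collins (1628380) ended the quarter, Kanter (202683) starts the next.
ended_with = [203090, 202329, 203468, 203081, 1628380]
next_starters = [203081, 203468, 202683, 203090, 202329]

lineup, went_out, came_in = reconcile(ended_with, next_starters)
print(lineup, went_out, came_in)
```

The two loops in quarterSubs do the same two steps: drop anyone not in the new starters, then add any starter who isn't already on the court.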

    self.addCourtTime(row['play_elapsed_time'])

This just adds the time elapsed since the previous game event to every player currently on the court.

The play-by-play from the NBA's API has an 'eventmsgtype' column that uses a different code for each type of event on the court. For our purposes, 1 = made basket and 8 = substitution. On a made basket, we update the points and the scoring margin for each team's current lineup:

    if row['eventmsgtype'] == 1:
        if row['player1_team_id'] == self.hteam.teamid:
            self.hteam.lu.diff += row['scoremargin']
            self.ateam.lu.diff -= row['scoremargin']
            self.hteam.lu.pts += row['scoremargin']
        else:
            self.ateam.lu.diff += row['scoremargin']
            self.hteam.lu.diff -= row['scoremargin']
            self.ateam.lu.pts += row['scoremargin']
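
One thing to watch: judging by the sample rows earlier (it reads 2 after Harris's opening layup), scoremargin looks like the running margin for the whole game rather than the points scored on that play. If so, adding it on every made basket accumulates game totals instead of what a lineup did. A possible alternative, sketched here with made-up values, is to credit each lineup with the change in margin since the previous scoring row. The `margin_changes` helper is hypothetical, and it assumes the margin is expressed as home minus visitor.

```python
def margin_changes(running_margins):
    """Turn a running home-minus-visitor margin into per-event changes.

    A positive change means the home team scored; negative means the visitor did.
    """
    changes = []
    previous = 0  # the game starts tied
    for margin in running_margins:
        changes.append(margin - previous)
        previous = margin
    return changes


# Made-up running margins after five scoring plays.
running = [2, 0, 3, 1, -2]
changes = margin_changes(running)
print(changes)

# Each change counts for one side and against the other, like the lineup differential.
home_diff = sum(changes)
away_diff = -home_diff
print(home_diff, away_diff)
```

Summed over a lineup's stint, these per-event changes give that lineup's plus/minus, and the home total always ends up equal to the final margin.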

Finally, parsing the actual substitutions.

    if row['eventmsgtype'] == 8:
        if row['player1_team_id'] == self.hteam.teamid:
            self.hteam.Sub(sub_in=row['player2_id'], sub_out=row['player1_id'], event=row['eventnum'], time=row['time_remaining'])
        else:
            self.ateam.Sub(sub_in=row['player2_id'], sub_out=row['player1_id'], event=row['eventnum'], time=row['time_remaining'])

def Sub(self, sub_in, sub_out, event, time):
    self.resetLineup(event, time)
    for x in self.lineup[:]:  # iterate over a copy so remove() is safe
        if x.playerid == sub_out:
            x.oncourt = 0
            self.lineup.remove(x)
    for x in self.roster:
        if x.playerid == sub_in:
            x.oncourt = 1
            self.lineup.append(x)

Each substitution event in the Game class calls the Sub function from our Team class. In the PBP data, each substitution row has columns for the players involved (player1_id, player2_id, etc.): player1_id is the player going out and player2_id is the player coming in. After resetting the team's lineup stats, because this marks the end of that specific lineup's time on the court, we search through the Player objects in our lineup to find the player getting subbed out. We use Python's list remove method to take that player out of the list, and then we append the incoming player's object to the lineup.

Here's how this looks in Python if we sub in Zach Collins for Enes Kanter:

por = Team(gameid='0041800237', teamid=1610612757)
den = Team(gameid='0041800237', teamid=1610612743)

game = Game(den, por, '0041800237',pbp,box)
game.initRosters()
game.initStarters()
game.hteam.initLineup()
game.ateam.initLineup()
for x in game.ateam.lineup:
    print(x.playerid, x.name)
203090 Maurice Harkless
202329 Al-Farouq Aminu
202683 Enes Kanter
203468 CJ McCollum
203081 Damian Lillard

game.ateam.Sub(sub_in=1628380, sub_out=202683, event=1, time=200)

for x in game.ateam.lineup:
    print(x.playerid, x.name)
203090 Maurice Harkless
202329 Al-Farouq Aminu
203468 CJ McCollum
203081 Damian Lillard
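1628380 Zach Collins

Now re-creating the teams, parsing the full game and printing each Portland player's court time in seconds:

por = Team(gameid='0041800237', teamid=1610612757)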
den = Team(gameid='0041800237', teamid=1610612743)
game = Game(den, por, '0041800237',pbp,box)
game.parseGame()
for x in game.ateam.roster:
    print(x.name, x.court_time)
Maurice Harkless 1008.0
Al-Farouq Aminu 428.0
Enes Kanter 2379.0
CJ McCollum 2717.0
Damian Lillard 2725.0
Zach Collins 1397.0
Rodney Hood 1211.0
Seth Curry 979.0
Evan Turner 1152.0
Meyers Leonard 404.0
Skal Labissiere 0
Jake Layman 0
Anfernee Simons 0

A quick check of the box score shows that Damian Lillard played 45 minutes and 25 seconds. Our calculated court time within the Player class shows that he played 2725 seconds, or 45 minutes and 25 seconds!
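
That comparison is easy to script. Here is a small sketch, assuming a made-up `seconds_to_clock` helper, that formats court time the way the box score's min column does and checks a few of the players printed above against their box score minutes.

```python
def seconds_to_clock(seconds):
    """Format a number of seconds as 'M:SS', the style of the box score 'min' column."""
    minutes, secs = divmod(int(seconds), 60)
    return f'{minutes}:{secs:02d}'


# Court times printed above (seconds) paired with the box score minutes shown earlier.
checks = {
    'Damian Lillard': (2725.0, '45:25'),
    'CJ McCollum': (2717.0, '45:17'),
    'Meyers Leonard': (404.0, '6:44'),
}
for name, (court_time, box_min) in checks.items():
    print(name, seconds_to_clock(court_time), seconds_to_clock(court_time) == box_min)
```

Running the same comparison over the whole roster is a quick way to spot a missed substitution, since one dropped sub usually shows up as minutes shifted from one player to another.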

Hopefully you find this methodology useful. I've tinkered with this problem on and off for a while now, and most of the other solutions I tried (e.g. doing this purely in pandas) just weren't very robust and had a lot of problems.