Questions and Answers
What is the purpose of loading the MLBAM data in the analysis?
Which programming language is primarily used for the analysis discussed?
How many rows are available in the MLBAM 2018 data set?
What is the first step when starting the analysis in the Jupyter notebook?
Which column is immediately dropped from the MLBAM data and why?
What data source does the book 'Analyzing Baseball Data with R' primarily use?
What is the purpose of setting the option to display a maximum of 100 columns?
How many columns are included in the MLBAM 2018 data set?
What does the variable start1B represent in the run expectancy analysis?
What is represented by the term NaN in the data set?
How does the startOuts variable function within the run expectancy framework?
In the context of this analysis, what does runsFuture represent?
Which variable would indicate if second base is occupied before the plate appearance?
What coding value is assigned to start1 if start1B is null?
What does the 'stand' variable represent in the dataset?
Which of the following variables indicates whether first base is occupied after the plate appearance?
What does 'start1' represent in this context?
How is the 'start_state' variable created?
What is the purpose of converting the variables to strings using 'astype string'?
What does 'end1' signify after the plate appearance?
What does the value of '0' represent for the variable 'end1'?
What is included in the 'end_state' variable?
Which statement is true regarding the occupancy variables 'start1', 'start2', and 'start3'?
What is the significance of the space added in the concatenation of the state variables?
What event is excluded from the run expectancy analysis due to its unrelatedness to batting performance?
Why is it important to ensure that outsInInning equals 3 in the analysis?
What condition is applied to subset the data in the analysis?
After preparing the data, how many rows are retained for further analysis?
What is the main goal of the changes made to the data set?
What type of event might occur in a game that would require exclusion from expectancy analysis?
What does the run expectancy analysis aim to evaluate in baseball?
What is the effect of excluding observations where start_state equals end_state?
Study Notes
Run Expectancy Analysis
- Run expectancy (RE) variables are created for analysis
- Analyzing Baseball Data with R, by Marchi and Albert (2013) is a useful resource for further reading
- The Jupyter notebook uses MLBAM (Major League Baseball Advanced Media) data
- The general process of analyzing baseball data follows the book's, whether the work is done in R or Python
- The pandas (pd) and numpy (np) packages are loaded
- MLBAM 2018 data is read into a variable (MLBAM18)
- The unnecessary 'Unnamed: 0' column is dropped
- Maximum display columns is set to 100, because the data set has many columns
- Data contains event-by-event data for 2018 baseball season
- Data has 185,771 rows and 62 columns
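The loading steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the real CSV file name is not given in these notes, so a tiny in-memory CSV with made-up rows stands in for it.

```python
import io

import numpy as np
import pandas as pd

# Tiny stand-in for the real MLBAM 2018 CSV (file name and rows are hypothetical).
csv_text = """,batterName,eventType,startOuts
0,Batter A,Single,0
1,Batter B,Strikeout,0
"""

# Show up to 100 columns when a data frame is displayed.
pd.set_option("display.max_columns", 100)

MLBAM18 = pd.read_csv(io.StringIO(csv_text))

# An unlabeled index column in a CSV is read in as "Unnamed: 0"; drop it.
MLBAM18 = MLBAM18.drop(columns=["Unnamed: 0"])

print(MLBAM18.shape)
```

With the real file, `MLBAM18.shape` is where the row and column counts quoted above would come from.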
Run Expectancy Variables
- Variables to keep for analysis are identified (batterName, batterID, eventType, etc.)
- A new data frame 'RE18' is created from the MLBAM18 data by selecting only the specified columns
- Variables 'start1B', 'start2B', 'start3B' represent base occupation before plate appearance
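Selecting the columns to keep might look like the sketch below. The toy frame and the `venue` column are made up for illustration, and the `keep` list shows only a few of the retained variables.

```python
import pandas as pd

# Toy frame standing in for MLBAM18; all values are made up.
MLBAM18 = pd.DataFrame({
    "batterName": ["Batter A", "Batter B"],
    "batterID": [1, 2],
    "eventType": ["Single", "Strikeout"],
    "start1B": [None, "Batter A"],
    "venue": ["Park X", "Park X"],  # hypothetical column that is not kept
})

# Partial list of variables to keep (the full list is longer).
keep = ["batterName", "batterID", "eventType", "start1B"]

# .copy() gives RE18 its own data, so later column assignments
# do not trigger chained-assignment warnings.
RE18 = MLBAM18[keep].copy()
```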
Base State Variables
- These variables track base occupation prior to each plate appearance
- Null values ('NaN') indicate a base is not occupied
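- Variables 'start1', 'start2', and 'start3' are coded indicators of whether first, second, and third base are occupied, respectively; a value of 0 means the corresponding 'start1B', 'start2B', or 'start3B' value is null.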
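One way to derive the occupancy indicators is sketched below. It assumes a code of 1 for an occupied base (the notes only confirm 0 for a null value), and the runner names are placeholders.

```python
import numpy as np
import pandas as pd

# Toy rows: a non-null value means a runner is on that base (names are made up).
RE18 = pd.DataFrame({
    "start1B": ["Runner A", np.nan, np.nan],
    "start2B": [np.nan, "Runner B", np.nan],
    "start3B": [np.nan, np.nan, np.nan],
})

# 0 when the base is empty (null), 1 otherwise (the 1 is an assumption).
for base in ("1", "2", "3"):
    RE18["start" + base] = np.where(RE18["start" + base + "B"].isnull(), 0, 1)
```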
Plate Appearance State
- The 'start_state' variable is created by converting 'start1', 'start2', 'start3', and the number of outs prior to the plate appearance to strings and concatenating them, with a space before the outs
- The new variable gives the full base-out state of the game before a given plate appearance.
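A minimal sketch of the concatenation, using toy values. Converting with `astype(str)` matters because `+` on integer columns would add the numbers rather than join the digits.

```python
import pandas as pd

# Toy rows: occupancy flags plus outs before the plate appearance.
RE18 = pd.DataFrame({
    "start1": [1, 0],
    "start2": [0, 0],
    "start3": [1, 0],
    "startOuts": [2, 0],
})

# Join the three base flags, then a space, then the number of outs.
RE18["start_state"] = (
    RE18["start1"].astype(str)
    + RE18["start2"].astype(str)
    + RE18["start3"].astype(str)
    + " "
    + RE18["startOuts"].astype(str)
)

print(RE18["start_state"].tolist())  # ['101 2', '000 0']
```

The space keeps the base pattern visually separate from the outs, so "101 2" reads as runners on first and third with two outs.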
Plate Appearance State (End of Play)
- End-of-play state variables ('end1', 'end2', 'end3') are created
- A new variable 'end_state' is produced in the same way as 'start_state', by concatenating the end variables and the outs after the play.
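Because the end-of-play state is built the same way, both steps could be wrapped in one small helper. The helper `add_state` and the column names `end1B`, `end2B`, `end3B`, and `endOuts` are assumptions made for this sketch; they do not appear in these notes.

```python
import numpy as np
import pandas as pd

def add_state(df, prefix, outs_col):
    """Add <prefix>1..3 occupancy flags and a '<prefix>_state' string column."""
    for base in ("1", "2", "3"):
        # 0 when the base column is null (empty base), 1 otherwise.
        df[prefix + base] = np.where(df[prefix + base + "B"].isnull(), 0, 1)
    df[prefix + "_state"] = (
        df[prefix + "1"].astype(str)
        + df[prefix + "2"].astype(str)
        + df[prefix + "3"].astype(str)
        + " "
        + df[outs_col].astype(str)
    )
    return df

# Toy rows with hypothetical end-of-play columns.
RE18 = pd.DataFrame({
    "end1B": ["Runner A", np.nan],
    "end2B": [np.nan, np.nan],
    "end3B": [np.nan, "Runner C"],
    "endOuts": [1, 2],
})

RE18 = add_state(RE18, "end", "endOuts")
print(RE18["end_state"].tolist())  # ['100 1', '001 2']
```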
Data Filtering
- Data is filtered, removing:
- Events where start_state equals end_state and no runs score on the play
- Unnecessary events like dropped foul balls
- Data rows from innings where outsInInning isn't equal to 3 (incomplete innings)
- Resulting data set has 184,949 rows and 27 columns
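The filtering can be sketched with boolean masks. This follows the usual run expectancy convention of keeping a play when the base-out state changed or at least one run scored, and keeping only innings that reached three outs. The column name `runsOnPlay` and the event label `"Foul Error"` are placeholders, since the exact names are not given in these notes.

```python
import pandas as pd

# Toy plays; every value is made up for illustration.
RE18 = pd.DataFrame({
    "eventType":    ["Single", "Strikeout", "Foul Error", "Home Run", "Walk"],
    "start_state":  ["000 0", "100 0", "100 1", "000 1", "000 0"],
    "end_state":    ["100 0", "100 1", "100 1", "000 1", "100 0"],
    "runsOnPlay":   [0, 0, 0, 1, 0],   # hypothetical column name
    "outsInInning": [3, 3, 3, 3, 2],
})

# Keep plays that changed the state or scored a run.
changed_or_scored = (RE18["start_state"] != RE18["end_state"]) | (RE18["runsOnPlay"] > 0)
# Drop the event type unrelated to batting ("Foul Error" is a placeholder label).
not_excluded = RE18["eventType"] != "Foul Error"
# Keep only complete innings.
complete_inning = RE18["outsInInning"] == 3

RE18 = RE18[changed_or_scored & not_excluded & complete_inning].reset_index(drop=True)
print(RE18["eventType"].tolist())  # ['Single', 'Strikeout', 'Home Run']
```

The home run stays even though its state is unchanged, because a run scored; the walk is dropped only because its inning never reached three outs.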
Description
Explore run expectancy analysis in baseball using R and Python. This quiz focuses on the analysis of MLBAM data and the creation of run expectancy variables. Familiarity with data manipulation in pandas and numpy is beneficial for this exercise.