Re-posted from Canis Hoopus (original here: http://www.canishoopus.com/2012/6/24/3114820/projecting-the-draft):
Over the last few days I finally got around to investigating the NCAA to NBA transition. With the draft coming, everyone's favorite question is, "What kind of college players become quality NBA players?" I still can't answer that question with any confidence, but I do have some interesting findings that may help get the ball rolling.
My analysis used all college players in the DraftExpress database beginning in 2002/03, along with each player's NBA season at age 25 for those who turned 25 in the past eight years. I used this data to identify how pre-draft information (production and combine measurements) can help us understand where a player will be when he reaches his peak years of production.
I did two different studies with the data: 1) I built predictive models for each box score statistic (pnts, rebs, stls...) independently. 2) I built predictive models for the likelihood that a college player would be "out of the league"/"an NBA starter"/"an NBA star" at age 25. The results expand on past findings, but also turn up some surprises.
Study 1: "Projecting the Box Score"
This analysis used basic college production (points, minutes, shot attempts, shot accuracy, free throw attempts and accuracy, offensive and defensive rebounding, assists, steals, turnovers, blocks, and personal fouls), combine data (height, weight, wingspan, maximum vertical, bench, agility drill, and sprint time), and some controls (age, position, and strength of schedule, using OppO and OppD via KenPom).
I started with all of these variables and simply plugged them into a stepwise AIC program to try to identify which collection of variables offers the "best" prediction of statistic-specific production at age 25. I used this same process to predict points, eFG%, assists, turnovers, offensive and defensive rebounds, blocks, and steals.
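For readers who have not seen stepwise AIC before, here is a rough sketch of the idea in Python. It is not the program used for this post (the post does not say which one that was). It is a forward-only version with ordinary least squares, and the variable names and data are simulated purely for illustration.

```python
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[r][p] / A[r][r] for r in range(p)]
    return sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2
               for i in range(n))


def aic(rss, n, k):
    """Gaussian AIC up to an additive constant; k counts fitted coefficients."""
    return n * math.log(rss / n) + 2 * k


def forward_stepwise_aic(predictors, y):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    n = len(y)
    chosen = []
    best = aic(ols_rss([], y), n, 1)  # intercept-only starting model
    while True:
        trials = []
        for name in predictors:
            if name not in chosen:
                cols = [predictors[v] for v in chosen + [name]]
                trials.append((aic(ols_rss(cols, y), n, len(cols) + 1), name))
        if not trials or min(trials)[0] >= best:
            return chosen, best
        best, pick = min(trials)
        chosen.append(pick)


# Simulated data (assumption): two real signals and one unrelated variable.
rng = random.Random(0)
n_players = 200
college_pts = [rng.gauss(0, 1) for _ in range(n_players)]
ft_pct = [rng.gauss(0, 1) for _ in range(n_players)]
unrelated = [rng.gauss(0, 1) for _ in range(n_players)]
nba_pts = [2.0 * a + 1.0 * b + rng.gauss(0, 1)
           for a, b in zip(college_pts, ft_pct)]

selected, score = forward_stepwise_aic(
    {"college_pts": college_pts, "ft_pct": ft_pct, "unrelated": unrelated},
    nba_pts,
)
print(selected)
```

The 2 * k term is what keeps the procedure from adding every variable: a new predictor only gets in if it reduces the residual error by enough to pay for the extra coefficient.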
Here is what I found.
NBA points per 40 is not easy to predict. College scoring only explains 16% of the variability in NBA scoring. While still leaving a lot to randomness, we can double this predictive power with a few additions. Including free-throw accuracy, weight, and maximum vertical jump while controlling for age, position and shot attempts helps explain 32% of the variability in NBA scoring. In sum, good NBA scorers scored a lot in college without taking too many shots, could hit free-throws in college, and were heavy and springy for their position.
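A quick aside on what "explains 16% of the variability" means. With a single predictor, that figure is the squared correlation (R^2) between the college number and the NBA number. Here is a minimal sketch; the five data points are made up for illustration and are not from the data set.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-predictor regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


# Hypothetical college and NBA values for five players (not real data).
college = [1, 2, 3, 4, 5]
nba = [2, 1, 4, 3, 5]
print(r_squared(college, nba))

# An R^2 of 0.16 corresponds to a correlation of about 0.4.
print(math.sqrt(0.16))
```

So the 16% figure for college scoring corresponds to a correlation of roughly 0.4, and doubling the variance explained to 32% raises the implied correlation to only about 0.57.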
The relationship between collegiate shooting efficiency and NBA shooting efficiency is very weak. I keep saying it, but I don't think I have fully internalized it yet. College shooting efficiency says nothing about NBA shooting efficiency. Look at the plot:
Do not expect to get any information out of a college player's shooting efficiency (I'm looking at you, Derrick Williams' 70 TS%).
While shooting efficiency itself is a terrible predictor, we can get some information if we include additional variables. Players who scored a lot and had a reliable three-point shot have an efficiency advantage in the NBA. In addition, offensive rebounds, assists, and avoidance of personal fouls in college all speak to more efficient shooting in the pros. The physical profile of a player who outperforms his college shooting efficiency in the pros is tall, heavy, fast, and springy, but with short arms. Think of Blake Griffin... or maybe an Allosaurus as the prototype for efficient NBA scoring. Putting all of these factors together and controlling for age, we still can only explain 20% of the variability in NBA shooting efficiency.
Now we get to the skills where NBA production is reasonably predictable. College assists alone explain nearly half of the variation in pro passing, but we can do even better than that. Interestingly after controlling for age and position the additional features that help explain NBA assists are "offensive rebounds" and "weight." Guys who are heavy and good at collecting offensive rebounds for their position tend to outperform their collegiate assist numbers in the NBA.
I can't help but think of this guy:
After adding in these additional variables, we can explain nearly 70% of the variation in pro assist rates.
Collegiate turnovers themselves are a terrible predictor of NBA TO rates (r^2 of 0.07), so stop looking at those. However, some other statistics do step in to help explain NBA ball security. Controlling for position, players who can and do shoot threes improve on their collegiate TO rates in the NBA. Interestingly, offensive rebounding pops up again. Not only do collegiate offensive rebounders collect relatively more assists and shoot more efficiently in the NBA, but they also commit fewer turnovers. After including three-point shooting, position, and offensive rebounding along with collegiate TO rate, we can explain just over 25% of the variability in NBA turnover rates.
Collegiate offensive rebounding explains about 40% of the variation in NBA offensive rebounding. This is a pretty strong predictor; however, the best model of NBA offensive rebounding doesn't even include NCAA rebounding rate! After controlling for position, all you need to explain 62% of the variation in NBA offensive rebounding is collegiate free-throw rate and the number of repetitions on the bench press at the combine. Muscular players who get to the free-throw line in college are good NBA offensive rebounders. After including those two variables, college offensive rebounding rate has almost no impact on the model (p-value of 0.88.) Whatever information is carried in collegiate offensive rebounding is already accounted for by FTA/FGA and bench press. My money is on aggression and strength.
Offensive rebounding is definitely the strangest college statistic. We have seen offensive rebounding pop out as one of the better predictors for success in three other offensive skills, but in the end, collegiate ORB rates don't even help explain NBA ORB rates.
Collegiate defensive rebounding explains about 30% of the variability in pro rebounding. This is OK, but we can more than double it with additional variables. Wingspan, bench reps, and sprint speed help identify future defensive rebounders. In addition, after controlling for age, eFG% is a strong predictor of NBA defensive rebounding.
Collegiate block rate alone explains just over 40% of the variability in NBA block rates. The best model of NBA shot-blocking explains over 65% of the variability by crediting tall players who are good defensive rebounders and debiting heavier players.
The best model to explain NBA steal rate only looks at collegiate steal rate. This model only explains 33% of the variance in NBA steal rates, but it is difficult to improve on. Wingspan and assist rate are both significant predictors of NBA steal rate even after controlling for collegiate steal rate, but they don't improve the predictive power of the model.
So those are the best models for individual box-score statistics. I bet you want to see those models used to build projected box score numbers for players in this draft, don't you? Too bad. That is what I ultimately want to do, but I still have a lot of additional work to do before I can produce useful projected box scores.
Study 2: "Identifying Studs and Duds"
Most statistical draft projections strive to rank players in the draft, or give them a simple composite score (like PER or Hoopus Score) that allows for comparison between drafts. This is a nice way to order prospects, but it ignores two important issues in the draft decision: How can I distinguish the "safe" picks from the duds? And, if I am in the mood for a gamble, where should I put my chips?
I took the same data and variables used in the above study and applied them to predicting what kind of a career players will be having at age 25. Once again, I used stepwise AIC to find the best model to predict whether a player would still be on an NBA roster at 25 (minutes played > 0), whether he would be a legitimate "starter" at age 25 (Minutes played > 1,000 and WS48 > 0.1), and whether he would be a "star" at age 25 (Minutes played > 2,000 and WS48 > 0.15.)
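To make the three outcome definitions concrete, here is a small sketch of how a player's age-25 season could be turned into the three yes/no labels. The thresholds are the ones stated above; the function name and the example stat lines are hypothetical.

```python
def career_flags(minutes, ws48):
    """Label an age-25 season using the three thresholds described above."""
    return {
        "on_roster": minutes > 0,
        "starter": minutes > 1000 and ws48 > 0.10,
        "star": minutes > 2000 and ws48 > 0.15,
    }


# Hypothetical age-25 seasons (not real players).
print(career_flags(minutes=0, ws48=0.0))       # out of the league
print(career_flags(minutes=1500, ws48=0.12))   # starter, not a star
print(career_flags(minutes=2800, ws48=0.20))   # star
```

Note that the labels are nested: every "star" season also counts as a "starter" season, and every "starter" is also on a roster, so the three predicted chances for a player should generally decrease from the first to the third.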
The important variables that AIC pulled out are not terribly surprising. Production-wise, scoring and passing without committing turnovers are very important. In terms of physical features, being big (height and weight) and being capable of jumping high in the combine's maximum vertical drill are important predictors of NBA success. There is more to the story, but I am still trying to get these models ironed out and don't want to report details that are likely to change.
I would rather not report predictions for the current draft class until I have cleaned the models up. However... I don't think I will have more time to play with basketball stats for quite a while and the draft is coming up. So, here are the numbers. Take them with a big grain of salt.
The first column is the chance that the player will still be on an NBA roster when he turns 25. The second column is the chance that the player will play at least 1,000 minutes and record a WS48 of at least 0.1 (the NBA average) during the season he turns 25. The third column is the chance that the player will play at least 2,000 minutes and record a WS48 of at least 0.15 during the season he turns 25:
Note: The three big names on Hoopus are probably Barton, Crowder, and Miller. This model really likes both Crowder and Miller (though in different ways) and is pretty uninspired by Barton, but doesn't hate him.
Retrodictions with out-of-sample draft classes:
(BTW.. I don't know why Durant is in the wrong rookie class...)