Projected # of NCAA Tourney Teams


Postby Hall2012 » Tue Nov 25, 2014 12:49 pm

So since I'm currently studying analytics, I decided to have some fun and create a multiple regression model to project the number of at-large bids a conference will get for the NCAA Tournament.

Based on the model, if non-conference play ended today, the Big East would be projected to get 9 at-large NCAA Bids! (9.319939757 to be exact)

Unfortunately, non-conference play does not end today and that number pretty much hinges on us keeping up a .9394 winning % as a conference. But it still shows just how good we've been as a whole to this point.

For any stat-heads out there, I've got some more info on my results below. Feel free to critique and offer any suggestions to try to improve on the accuracy.
____________________________________________________________________________________________________________________________________

My result was this simple (or not so simple) equation:
Teams in Tourney = 49.83465(win%^2)-62.174(win%)+50.88886(NonConRPI)+.062851(#Teams^2)-1.23931(#Teams)

Win%= conference winning percentage in non-conference play
NonConRPI= Conference RPI excluding conference play (from CBS RPI)
#Teams= # of Teams in the conference
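
For anyone who wants to plug in their own numbers, the equation above can be written as a small function. This is only a sketch in Python (the fit itself was not done in Python); the function and argument names are made up for illustration, and the coefficients are copied from the equation as posted.

```python
def projected_at_large_bids(win_pct, non_con_rpi, num_teams):
    """Evaluate the posted regression equation for one conference.

    win_pct      -- non-conference winning percentage as a fraction (e.g. 0.9394)
    non_con_rpi  -- conference RPI excluding conference play
    num_teams    -- number of teams in the conference
    """
    return (
        49.83465 * win_pct ** 2
        - 62.174 * win_pct
        + 50.88886 * non_con_rpi
        + 0.062851 * num_teams ** 2
        - 1.23931 * num_teams
    )


# Hypothetical inputs purely for illustration, not real conference data.
example = projected_at_large_bids(win_pct=0.75, non_con_rpi=0.55, num_teams=10)
print(round(example))
```

Note that the equation has no constant term, so all five inputs together determine the projection; rounding to the nearest integer matches how the projections are compared to actual bids.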

I used data from every conference with at least 1 at-large bid over the past 5 years, and in only 2 instances did the model's projection for a conference miss by more than 1 bid (rounding to integers).

R-Square: .95151
F value: 12.2907
F significance: 159E-30
MAE: .73466
MSE: .53973
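
For reference, here is a rough sketch of how numbers like these are usually computed from actual and projected bid counts. It is written in Python with made-up data, and it assumes the textbook definitions (errors averaged over the number of observations, R-square measured against the mean of the actual values). Regression tools can define these slightly differently, especially for a model with no constant term, so it may not reproduce the figures above exactly.

```python
def fit_metrics(actual, predicted):
    """Return (r_squared, mae, mse) for paired lists of actual and predicted bids."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n   # mean absolute error
    mse = sum(r * r for r in residuals) / n    # mean squared error
    mean_actual = sum(actual) / n
    ss_res = sum(r * r for r in residuals)     # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    r_squared = 1 - ss_res / ss_tot
    return r_squared, mae, mse


# Made-up numbers for illustration only, not the data behind the posted model.
actual_bids = [7, 5, 3, 2, 1]
model_output = [6.5, 5.5, 3.0, 2.5, 0.5]
r2, mae, mse = fit_metrics(actual_bids, model_output)
print(f"R^2={r2:.3f}  MAE={mae:.3f}  MSE={mse:.3f}")
```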

Some sources of error include likely at-large bid candidates reaching the tournament by way of automatic bids, the change in the number of available at-large bids due to the increase in field size to 68, bubble teams playing their way out of the tournament during conference play, and consideration for the human element in the actual selection process.
Last edited by Hall2012 on Tue Nov 25, 2014 3:56 pm, edited 1 time in total.
Seton Hall Pirates
Big East Tournament Champions: 1991, 1993, 2016
Big East Regular Season Champions: 1992, 1993, 2020
Hall2012
 
Posts: 3462
Joined: Mon Sep 30, 2013 3:04 pm


Re: Projected # of NCAA Tourney Teams

Postby Doge McDermott » Tue Nov 25, 2014 1:36 pm

Congrats, your post convinced me to stop lurking and sign up.

I guarantee you overfit your model. That rsquare is way too high. What kind of regression did you use? And more importantly, what software did you use?

EDIT - Also, why did you exclude conferences that didn't get a bid?
Last edited by Doge McDermott on Tue Nov 25, 2014 2:11 pm, edited 1 time in total.
Doge McDermott
 
Posts: 179
Joined: Tue Nov 25, 2014 1:27 pm

Re: Projected # of NCAA Tourney Teams

Postby redmen9194 » Tue Nov 25, 2014 2:07 pm

I was told there would be no math...
redmen9194
 
Posts: 1449
Joined: Thu Feb 28, 2013 7:46 am

Re: Projected # of NCAA Tourney Teams

Postby HoosierPal » Tue Nov 25, 2014 2:38 pm

redmen9194 wrote:I was told there would be no math...


Best post of the day!! :lol:
HoosierPal
 
Posts: 1171
Joined: Thu Jul 04, 2013 8:42 am

Re: Projected # of NCAA Tourney Teams

Postby Chalmers0 » Tue Nov 25, 2014 2:41 pm

Doge McDermott wrote:Congrats, your post convinced me to stop lurking and sign up.

I guarantee you overfit your model. That rsquare is way too high. What kind of regression did you use? And more importantly, what software did you use?

EDIT - Also, why did you exclude conferences that didn't get a bid?


Was going to say the same thing regarding the overfit and rsquare.

Also very curious to hear the answer to the regression type/software used.
Chalmers0
 
Posts: 350
Joined: Tue Jul 16, 2013 7:38 am

Re: Projected # of NCAA Tourney Teams

Postby Hall2012 » Tue Nov 25, 2014 3:13 pm

Doge McDermott wrote:Congrats, your post convinced me to stop lurking and sign up.

I guarantee you overfit your model. That rsquare is way too high. What kind of regression did you use? And more importantly, what software did you use?

EDIT - Also, why did you exclude conferences that didn't get a bid?



No fancy tools. I just did this quickly using multiple linear regression (admittedly not the best for predictive analytics) in Excel. The only other capable software I have is Minitab 17, which I'm not the biggest fan of; maybe I just need to get more familiar with it. The model is far from perfect: my mean absolute residual is 0.735, so there are a lot of +/-1 misses, but they split pretty evenly, making the average residual -0.00035. Hence the extremely high rsquare I guess?

I'm pretty new at this, but just wanted to play around and see what I got.

As far as excluding the 0 at large bid leagues, it was a mix of time/laziness and thinking that adding 20 zero bid leagues (which are generally the same leagues every year) each year regardless of win%, etc. wouldn't add much to my model (I could definitely be wrong about that). Mainly, one of my factors is conference size and with all of the realignment going on, I didn't feel like counting the number of teams in each conference each year (about an additional 20 conferences a year) when I didn't think it would add much.
Hall2012
 
Posts: 3462
Joined: Mon Sep 30, 2013 3:04 pm

Re: Projected # of NCAA Tourney Teams

Postby FormulaX » Tue Nov 25, 2014 3:17 pm

Image
FormulaX
 
Posts: 200
Joined: Fri Sep 20, 2013 10:54 am

Re: Projected # of NCAA Tourney Teams

Postby Doge McDermott » Tue Nov 25, 2014 4:01 pm

Hall2012 wrote:No fancy tools. I just did this quickly using multiple linear regression (admittedly not the best for predictive analytics) in Excel. The only other capable software I have is Minitab 17, which I'm not the biggest fan of; maybe I just need to get more familiar with it. The model is far from perfect: my mean absolute residual is 0.735, so there are a lot of +/-1 misses, but they split pretty evenly, making the average residual -0.00035. Hence the extremely high rsquare I guess?

I'm pretty new at this, but just wanted to play around and see what I got.

As far as excluding the 0 at large bid leagues, it was a mix of time/laziness and thinking that adding 20 zero bid leagues (which are generally the same leagues every year) each year regardless of win%, etc. wouldn't add much to my model (I could definitely be wrong about that). Mainly, one of my factors is conference size and with all of the realignment going on, I didn't feel like counting the number of teams in each conference each year (about an additional 20 conferences a year) when I didn't think it would add much.


Do yourself a favor and download R. I linked RStudio, which is probably the easiest interface for learning R. R is free, it's fairly easy to learn, and it's what I use in grad school.

I don't want to get too deep into the weeds, but try running a cross-validated model or a lasso regression to fix your overfit.

What would happen if you created flags for the auto-bid teams? I imagine there should be some correlation between number of wins and the auto-bid.

Finally, there might be some value to including the 0-bid conferences. It's better to have all information as opposed to just the stuff that confirms your hypothesis. I get it though. Cleaning data sucks.

Granted, all of this advice is made without looking at any of the data. Sounds like an interesting project.
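
The cross-validation idea mentioned above can be tried without any extra software. What follows is a minimal sketch in plain Python (not R) of leave-one-out cross-validation: refit the regression with one conference-season held out, predict the held-out one, and repeat. The seasons listed are made up for illustration, the helper names are hypothetical, and it uses the same five terms as the posted equation. It does not implement the lasso. If the held-out error comes out much larger than the in-sample error, that points toward an overfit.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(rows, targets):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(beta, row):
    return sum(b * x for b, x in zip(beta, row))


def in_sample_mean_abs_error(rows, targets):
    """Fit on all observations and average the absolute residuals."""
    beta = fit_least_squares(rows, targets)
    return sum(abs(y - predict(beta, r)) for r, y in zip(rows, targets)) / len(rows)


def leave_one_out_mean_abs_error(rows, targets):
    """Refit with each observation held out; average the held-out absolute errors."""
    errors = []
    for i in range(len(rows)):
        beta = fit_least_squares(rows[:i] + rows[i + 1:], targets[:i] + targets[i + 1:])
        errors.append(abs(targets[i] - predict(beta, rows[i])))
    return sum(errors) / len(errors)


def features(win_pct, rpi, teams):
    """Same five terms as the posted equation (no constant term)."""
    return [win_pct ** 2, win_pct, rpi, teams ** 2, teams]


# Made-up seasons: (win%, non-con RPI, # teams, at-large bids). Not real data.
seasons = [
    (0.72, 0.55, 12, 6), (0.68, 0.57, 10, 4), (0.75, 0.58, 15, 7),
    (0.60, 0.53, 14, 2), (0.66, 0.51, 12, 4), (0.70, 0.54, 10, 5),
    (0.58, 0.52, 9, 1), (0.63, 0.56, 16, 3), (0.78, 0.56, 11, 6),
    (0.65, 0.50, 13, 3),
]
rows = [features(w, r, t) for w, r, t, _ in seasons]
bids = [b for _, _, _, b in seasons]

in_sample_mae = in_sample_mean_abs_error(rows, bids)
loo_mae = leave_one_out_mean_abs_error(rows, bids)
print(f"in-sample MAE: {in_sample_mae:.3f}   leave-one-out MAE: {loo_mae:.3f}")
```

With real data, the same loop works on the actual conference-season rows in place of the made-up `seasons` list.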
Doge McDermott
 
Posts: 179
Joined: Tue Nov 25, 2014 1:27 pm

Re: Projected # of NCAA Tourney Teams

Postby DudeAnon » Tue Nov 25, 2014 5:34 pm

I remember having to do a project my senior year where I had to write a program that could be used in a business situation. I waited till the last minute, then decided to analyze international aid based on a bunch of different parameters. I had taken 2 statistics classes (very hard stuff) so I knew the basics. I threw up a bunch of regression tests that were completely made up and none were the wiser.
Xavier

2018 Big East Champs
DudeAnon
 
Posts: 3015
Joined: Thu Mar 07, 2013 12:52 pm

Re: Projected # of NCAA Tourney Teams

Postby Hall2012 » Tue Nov 25, 2014 5:49 pm

Doge McDermott wrote:
Hall2012 wrote:No fancy tools. I just did this quickly using multiple linear regression (admittedly not the best for predictive analytics) in Excel. The only other capable software I have is Minitab 17, which I'm not the biggest fan of; maybe I just need to get more familiar with it. The model is far from perfect: my mean absolute residual is 0.735, so there are a lot of +/-1 misses, but they split pretty evenly, making the average residual -0.00035. Hence the extremely high rsquare I guess?

I'm pretty new at this, but just wanted to play around and see what I got.

As far as excluding the 0 at large bid leagues, it was a mix of time/laziness and thinking that adding 20 zero bid leagues (which are generally the same leagues every year) each year regardless of win%, etc. wouldn't add much to my model (I could definitely be wrong about that). Mainly, one of my factors is conference size and with all of the realignment going on, I didn't feel like counting the number of teams in each conference each year (about an additional 20 conferences a year) when I didn't think it would add much.


Do yourself a favor and download R. I linked RStudio, which is probably the easiest interface for learning R. R is free, it's fairly easy to learn, and it's what I use in grad school.

I don't want to get too deep into the weeds, but try running a cross-validated model or a lasso regression to fix your overfit.

What would happen if you created flags for the auto-bid teams? I imagine there should be some correlation between number of wins and the auto-bid.

Finally, there might be some value to including the 0-bid conferences. It's better to have all information as opposed to just the stuff that confirms your hypothesis. I get it though. Cleaning data sucks.

Granted, all of this advice is made without looking at any of the data. Sounds like an interesting project.


Thanks for the advice. Is this something you do for your career? I'm in my first semester in a Business Analytics grad program, so I'm still pretty inexperienced with this, but I was really playing around with these numbers to see what I could learn about them. Any advice/constructive criticism is very much appreciated!
Hall2012
 
Posts: 3462
Joined: Mon Sep 30, 2013 3:04 pm
