redmen9194 wrote:I was told there would be no math...
Doge McDermott wrote:Congrats, your post convinced me to stop lurking and sign up.
I guarantee you overfit your model. That R-squared is way too high. What kind of regression did you use? And more importantly, what software did you use?
EDIT - Also, why did you exclude conferences that didn't get a bid?
Hall2012 wrote:No fancy tools. I just did this quickly using multiple linear regression (admittedly not the best for predictive analytics) in Excel. The only other capable software I have is Minitab 17, which I'm not the biggest fan of; maybe I just need to get more familiar with it. It's far from perfect: my mean absolute residual is 0.735, so there are a lot of misses of +/-1, but they split pretty evenly, which makes the average residual -0.00035. Hence the extremely high R-squared, I guess?
I'm pretty new at this, but just wanted to play around and see what I got.
As far as excluding the leagues with 0 at-large bids, it was a mix of time/laziness and thinking that adding 20 zero-bid leagues (which are generally the same leagues every year) each year, regardless of win%, etc., wouldn't add much to my model (I could definitely be wrong about that). Mainly, one of my factors is conference size, and with all of the realignment going on, I didn't feel like counting the number of teams in each conference each year (about an additional 20 conferences a year) when I didn't think it would add much.
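On the residual numbers quoted above: for least squares with an intercept, the residuals average out to roughly zero no matter how good or bad the fit is, so an average residual near zero probably isn't what drives the R-squared. A minimal sketch on made-up data (the numbers, the noise levels, and the helper names `fit_line` and `summarize` are all assumptions for illustration, not the actual bid data or model):

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor plus intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def summarize(xs, ys):
    """Fit the line, then report mean residual, mean absolute residual, and R-squared."""
    a, b = fit_line(xs, ys)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    n = len(ys)
    my = sum(ys) / n
    ss_res = sum(r * r for r in resid)          # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)     # total variation around the mean
    return {
        "mean_resid": sum(resid) / n,
        "mean_abs_resid": sum(abs(r) for r in resid) / n,
        "r2": 1 - ss_res / ss_tot,
    }

# Synthetic data: same underlying line, two very different noise levels.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
tight = [1 + 0.5 * x + rng.gauss(0, 0.2) for x in xs]
noisy = [1 + 0.5 * x + rng.gauss(0, 5.0) for x in xs]

tight_stats = summarize(xs, tight)
noisy_stats = summarize(xs, noisy)
print("tight fit:", tight_stats)
print("noisy fit:", noisy_stats)
```

Both fits should report a mean residual that is zero up to rounding, while their R-squared values differ a lot. The R-squared depends on the squared residuals relative to the spread of the outcome, which is why the mean absolute residual is the more informative of the two figures.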
Doge McDermott wrote:Do yourself a favor and download R. I linked RStudio, which is probably the easiest interface for learning R. R is free, it's fairly easy to learn, and it's what I use in grad school.
I don't want to get too deep into the weeds, but try running a cross-validated model or a lasso regression to deal with the overfitting.
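For anyone who hasn't done cross-validation before, here is a rough sketch of the k-fold idea in plain Python: fit on part of the data, predict the rows that were held out, and compare the out-of-fold R-squared with the in-sample one. Everything here is an assumption for illustration: the data is synthetic, there is a single made-up predictor standing in for something like conference win%, and `fit_line`, `r_squared`, and `kfold_predictions` are hypothetical helper names. It does not cover lasso.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one predictor plus intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(ys, preds):
    """1 - SS_res / SS_tot, with SS_tot taken around the overall mean of ys."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def kfold_predictions(xs, ys, k, seed=0):
    """Predict every row using a model fit only on the other k-1 folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    preds = [None] * len(xs)
    for f in range(k):
        test = idx[f::k]                      # every k-th shuffled row is held out
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]
    return preds

# Synthetic stand-in data, not real conference results.
rng = random.Random(1)
xs = [rng.uniform(0.3, 0.9) for _ in range(40)]
ys = [8 * x - 2 + rng.gauss(0, 1) for x in xs]

a, b = fit_line(xs, ys)
in_sample_r2 = r_squared(ys, [a + b * x for x in xs])
cv_preds = kfold_predictions(xs, ys, k=5)
cv_r2 = r_squared(ys, cv_preds)
print(f"in-sample R-squared: {in_sample_r2:.3f}")
print(f"5-fold CV R-squared: {cv_r2:.3f}")
```

A large gap between the two numbers is the usual sign of overfitting. With one predictor the gap will be small; it tends to grow as more factors are added relative to the number of rows.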
What would happen if you created flags for the auto-bid teams? I imagine there should be some correlation between number of wins and the auto-bid.
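A flag here just means a 0/1 column that can go into the regression like any other factor. A minimal sketch of building one and checking how it lines up with wins, using made-up rows (the team names, win totals, and the `auto_bid` field are hypothetical, and `pearson` is a hand-rolled helper, not a library call):

```python
import math

# Made-up rows: (team, wins, auto_bid). Purely illustrative values.
teams = [
    ("Team A", 30, True),
    ("Team B", 27, True),
    ("Team C", 25, False),
    ("Team D", 22, False),
    ("Team E", 20, False),
    ("Team F", 18, False),
]

def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial correlation."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

wins = [w for _, w, _ in teams]
auto_bid_flag = [1 if got_auto_bid else 0 for _, _, got_auto_bid in teams]

r = pearson(wins, auto_bid_flag)
print(f"correlation between wins and auto-bid flag: {r:.3f}")
```

If the flag and wins turn out to be strongly correlated in the real data, adding both to the same model may not buy much, which is worth checking before trusting either coefficient.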
Finally, there might be some value in including the 0-bid conferences. It's better to have all the information as opposed to just the stuff that confirms your hypothesis. I get it though. Cleaning data sucks.
Granted, all of this advice is made without looking at any of the data. Sounds like an interesting project.