What is a drawback to performing data cleansing (imputation, transformations, etc.) on raw data prior to partitioning the data for honest assessment as opposed to performing the data cleansing after partitioning the data?

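One commonly cited concern with cleansing before partitioning is that statistics such as an imputation mean are computed from rows that later land in the validation or test partition, so the held-out rows are no longer independent of the fitting process. A minimal Python sketch of the alternative, assuming simple mean imputation on a made-up numeric column (the data and the helper names split_rows, fit_mean, and impute are hypothetical, not from the source):

```python
import random
from statistics import mean

def split_rows(rows, train_fraction, seed):
    """Shuffle a copy of the rows and cut it into training and validation parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_mean(values):
    """Learn the imputation value from the observed (non-missing) entries only."""
    return mean(v for v in values if v is not None)

def impute(values, fill):
    """Replace missing entries with a previously learned fill value."""
    return [fill if v is None else v for v in values]

# Made-up raw column with missing values (None).
raw = [12.0, None, 15.0, 11.0, None, 14.0, 90.0, 13.0, None, 10.0]

train, valid = split_rows(raw, train_fraction=0.7, seed=0)

# Cleansing before partitioning: the fill value has seen validation rows.
leaky_fill = fit_mean(raw)

# Cleansing after partitioning: learn from training rows, then apply everywhere.
honest_fill = fit_mean(train)
train_clean = impute(train, honest_fill)
valid_clean = impute(valid, honest_fill)

print("fill learned from all rows:     ", leaky_fill)
print("fill learned from training rows:", honest_fill)
```

The same pattern applies to transformations: any parameter estimated from data is learned on the training partition and then applied unchanged to the other partitions.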

What values are not affected by this oversampling?

A confusion matrix is created for data that were oversampled due to a rare target. What values are not affected by this oversampling? Options: Sensitivity and PV+; Specificity and PV-; PV+ and PV-; Sensitivity and Specificity. Correct Answer: Sensitivity …
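The effect of oversampling on a confusion matrix can be checked numerically. The sketch below is only an illustration: the counts are invented, and the oversampling is assumed to keep every event while retaining a fixed fraction of non-events. Under that assumption each row of the matrix (actual events, actual non-events) is scaled by its own constant, so statistics computed within a row keep their value while statistics computed down a column do not.

```python
from fractions import Fraction

def rates(tp, fn, fp, tn):
    """Return the four confusion-matrix statistics as exact fractions."""
    return {
        "sensitivity": Fraction(tp) / (tp + fn),  # within actual events
        "specificity": Fraction(tn) / (tn + fp),  # within actual non-events
        "pv_plus": Fraction(tp) / (tp + fp),      # within predicted events
        "pv_minus": Fraction(tn) / (tn + fn),     # within predicted non-events
    }

# Invented population counts: 100 events and 9,900 non-events (1% event rate).
tp, fn, fp, tn = 80, 20, 990, 8910

# Assumed oversampling scheme: keep all events, keep 1 in 99 non-events,
# which gives a 50-50 event/non-event sample.
keep = Fraction(1, 99)
population = rates(tp, fn, fp, tn)
oversampled = rates(tp, fn, fp * keep, tn * keep)

for name in population:
    print(name, float(population[name]), float(oversampled[name]))
```

Because PV+ and PV- mix counts from both rows, they depend on the event rate in the sample and need adjusting back to the population priors before they are interpreted.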

Which of the two models should be selected and why?

Refer to the exhibit: The plots represent two models, A and B, being fit to the same two data sets, training and validation. Model A is 90.5% accurate at distinguishing blue from red on the training data and 75.5% accurate at doing the same on validation data. Model B is 83% accurate at distinguishing blue from red …

Which statistic, calculated from a validation sample, can help decide which model to use for prediction of a binary target variable?

Which statistic, calculated from a validation sample, can help decide which model to use for prediction of a binary target variable? Options: Adjusted R Square; Mallows' Cp; Chi Square; Average Squared Error. Correct …
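Of the listed options, average squared error is the one that can be computed directly from predicted probabilities on held-out rows. A minimal sketch of that calculation, assuming a 0/1 target and two hypothetical models whose validation predictions are invented for illustration:

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between 0/1 outcomes and predicted probabilities."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)

# Invented validation data: observed binary target and each model's
# predicted probability of the event for the same five rows.
valid_target = [1, 0, 0, 1, 0]
model_a_probs = [0.9, 0.2, 0.4, 0.6, 0.1]
model_b_probs = [0.6, 0.5, 0.5, 0.5, 0.4]

ase_a = average_squared_error(valid_target, model_a_probs)
ase_b = average_squared_error(valid_target, model_b_probs)

# Lower validation ASE indicates predictions closer to the observed outcomes.
preferred = "A" if ase_a < ase_b else "B"
print(f"ASE A = {ase_a:.3f}, ASE B = {ase_b:.3f}, preferred = {preferred}")
```

The comparison is only meaningful when both models are scored on the same validation rows, none of which were used to fit either model.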

What is an acceptable division between training, validation, and testing data?

In order to perform honest assessment on a predictive model, what is an acceptable division between training, validation, and testing data? Options: Training: 50%, Validation: 0%, Testing: 50%; Training: 100%, Validation: 0%, Testing: 0%; Training: 0%, Validation: 100%, Testing: 0%; Training: 50%, Validation: …

What is the best data to use for model assessment?

The total modeling data has been split into training, validation, and test data. What is the best data to use for model assessment? Options: Training data; Total data; Test data; Validation data. Correct Answer: Validation data

Including redundant input variables in a regression model can:

Including redundant input variables in a regression model can: Options: Stabilize parameter estimates and increase the risk of overfitting. Destabilize parameter estimates and increase the risk of overfitting. Stabilize parameter estimates and decrease the risk of overfitting. Destabilize parameter estimates and decrease the risk of …

What problem does this present?

One common approach for predicting rare events in the LOGISTIC procedure is to build a model that disproportionately over-represents those cases with an event occurring (e.g. a 50-50 event/non-event split). What problem does this present? Options: All parameter estimates are biased. Only the intercept estimate is biased. Only the …
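When events and non-events are sampled at different rates, a standard correction shifts the fitted intercept by the log of the ratio of sample odds to population odds of the event. The sketch below illustrates that arithmetic only; it assumes the population event rate is known, and the function names and the 2% rate are hypothetical rather than taken from the source.

```python
import math

def intercept_offset(sample_event_rate, population_event_rate):
    """Log of (sample odds of the event) / (population odds of the event)."""
    rho1, pi1 = sample_event_rate, population_event_rate
    return math.log((rho1 * (1 - pi1)) / ((1 - rho1) * pi1))

def logistic(x):
    """Inverse logit: convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical setting: a 50-50 modeling sample drawn from a population
# where the event occurs 2% of the time.
offset = intercept_offset(sample_event_rate=0.5, population_event_rate=0.02)

# An intercept-only model fit to the 50-50 sample has intercept logit(0.5) = 0.
sample_intercept = 0.0
corrected_intercept = sample_intercept - offset

print("offset:", offset)
print("uncorrected probability:", logistic(sample_intercept))
print("corrected probability:  ", logistic(corrected_intercept))
```

In this intercept-only case the corrected probability returns to the assumed population rate, which is a quick sanity check on the sign of the offset.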

The selection criterion used in the forward selection method in the REG procedure is:

The selection criterion used in the forward selection method in the REG procedure is: Options: Adjusted R-Square; SLE; Mallows’ Cp; AIC. Correct Answer: SLE