Excel at Data Mining – Creating a Classification Matrix
- How To Data Mining Add In For Excel 2016 Mac How To Sort And Filter
- How To Data Mining Add In For Excel 2016 Mac How To Merge Cells
By using a data mining add-in to Excel, provided by Microsoft, you can start planning for future growth. Add to that, a PDF to Excel converter to help you collect all of that data from the various sources and convert the information to a spreadsheet, and you are ready to go.
Today, I’m going to show you how to create and read a Classification Matrix in less than 5 minutes with the Microsoft Excel data mining add-in. The assumption of this post is that you have already installed the data mining add-in for Excel and configured the add-in to be pointed at an instance of SQL Server with Analysis Services to which you have access rights.
For the example, we will be using a tutorial spreadsheet that can be found on Codeplex here. You also need to attach the AdventureworksDW2012 data file to SQL Server. The file can be downloaded here.
- With the Analytic Solver® add-in, created by Frontline Systems, developers of Solver in Microsoft Excel, you can create and solve Monte Carlo simulation and optimization models in your Excel workbook. This add-in can be used alone, but it’s designed to work with Frontline’s Analytic Solver Data Mining add-in, offering powerful forecasting.
- That's why I have added a Data Mining Wizard to QI Macros Add in for Excel. It automates the creation of a PivotTable and all of the charts that can be derived from the data. It will examine your data, summarize it and create control charts and Pareto charts automatically! Here's how the Data Mining Wizard Works Data Mining Wizard.
- . In the field Operator cell select “Greater than or equal to.” Then type 50000 in the “Value” cell. Click “Edit Fields” to add additional fields to the results table.
Our goal is to create a Classification Matrix based on a mining structure with all of its associated models that we have created previously.
Read below if you are unable to view the video walkthrough.
Step 1: Creating the Profit Chart
We go to the DATA MINING tab, the Accuracy and Validation group, and select the Classification Matrix icon. From here, select the Mining Structure that you wish to analyze. In this case we will be analyzing a Naïve Bayes Model and a Logistic Regression Model that we previously created using Adventureworks2012DW. Click Next. On the Specify Column to Predict screen, make sure that BikeBuyer, Show results as percentage, and Show results as count are selected.
Click Next and then Finish. And, that is how easy it is to actually create a Classification Matrix!
Step 2: Reading the Classification Matrix
Before we start looking at the Classification Matrix, let’s take a moment to think about how we can analyze our models using test data that has actual results in it. We’ll take customer A and have our model look at her information. It predicts that she WILL purchase a bike. Next, we can see what she actually did in the dataset. We have two possible outcomes: she either bought a bike and validated our model or she did not buy a bike and our model “missed” on her. The first case we call a true positive, and the second case we call false positive. Thinking about a NO prediction will lead to similar categories of true negative and false negative. An easy way to remember these terms is that first word tells you if you actual and prediction match and the second word tells you the prediction. True negative means that actual matched the prediction, and the prediction was NOT BUYING. False positive means that the actual does not match the prediction, and the prediction was a BUYING. Note that positive and negative depends on the definition of your model. We could have easily tried to predict people who would not buy bikes, and this would reverse all our definitions because NOT BUYING becomes the positive and BUYING becomes the negative.
So, now we realize that we have four categories (true positive, false positives, true negatives, false negatives) that can be summarized as a 2 x 2 matrix. This is exactly what we did when we created the Classification Matrix.
Starting at the top, we see an overview breakdown of how well the models performed against the test data. Keep in mind that Total Correct tells how many the model was right on for both positive and negative results.
Then below that, we see two matrices for the Bayes Model. One of them has percentages and the other has counts. Across the top of our matrix, we have our actuals and along the side we have our predictions. At each intersection of the matrix, we have a number that tells us how many of the test results fell into that bucket. At the intersection of 0 (actual) with 0 on the percentage matrix, we see 61.91%. This tells us that the model was accurately able to identify 61.91% of the people who ended not purchasing a bike (true negatives). Moving down one spot, we see that the remaining 38.09% were false negatives in that we predicted customers would not buy but they ended up buying a bike. The counts matrix shows the actual raw numbers used to calculate the percentages in the percentage matrix.
And that is how you create and read a Classification Matrix in less than five minutes using the Excel Data Mining Add in.
How To Data Mining Add In For Excel 2016 Mac How To Sort And Filter
Happy digging!
Billy Decker
Billy Decker is a consultant at StatSlice Systems. He graduated with a dual degree in Physics and Mathematics from the University of Texas at Austin and received his Masters Degree in Physics from the University of Texas at Arlington. He previously worked for Global Technical Services as a Senior Training Analyst and Bell Hellicopter as an Instructional Designer. His technical experience includes, but is not limited to, SQL, SAP, Business Objects, QlikView, and Sharepoint.
Subscribe
You can subscribe to our RSS feed.
How To Data Mining Add In For Excel 2016 Mac How To Merge Cells
- Notice: JavaScript is required for this content.
- Categories
- Blog (97)
- Big Data (13)
- Big SQL (4)
- Redshift (2)
- Blog Authors (49)
- Bernard Wehbe – Blog (8)
- Billy Decker – Blog (13)
- Brett Neuman – Blog (7)
- James Vogel – Blog (5)
- Jared Decker – Blog (6)
- Justin Taylor – Blog (10)
- Business Analytics (36)
- Business Intelligence (35)
- Dashboarding (11)
- Data Mining (10)
- Data Warehousing (11)
- ETL (6)
- IT Management (8)
- Mobile BI (1)
- OLAP (2)
- Products (24)
- Microsoft (6)
- QlikView (5)
- Roambi (3)
- Tableau (11)
- News (14)
- Jobs (5)
- Partnerships (2)
- Our Work (41)
- Case Studies (6)
- Newsletters (9)
- Publications (14)
- Webinars (7)
- White Papers (7)
- Training (8)
- information (4)
- Jared Decker (4)
- registration (4)
- Uncategorized (1)
- Blog (97)