Mar 17, 2009
How do you debug a program that you can’t see? This is the problem faced by users interacting with the increasing number of applications utilizing machine-learning components to train themselves to a user’s inclinations. These programs consist of an instance of a machine-learning classifier that has been trained on data from a particular user and either resides on this user’s machine, or is designed solely for interaction with this user over a network. Common examples include email filtering software and recommendation systems, such as the one utilized by Amazon.com. The machine-learned program itself is accessed through a more traditional program, such as an e-mail application, which uses the learned program to decide how incoming messages should be categorized.
Our goal is to create a visualization of the learned program that a user may interact with, allowing the user to understand both why the program makes each of its decisions and how the program can be corrected when it makes a faulty decision. Our audience consists of end users who neither have knowledge of formal software debugging techniques nor understand how machine-learning systems operate. These are people who use their computers for work or leisure, and are not interested in spending anything more than cursory time and effort to learn, e.g., how to improve the accuracy of their spam filter.
Conditional Random Fields (CRFs) are among the most powerful machine-learning systems in use today. These systems excel at complex, sequential tasks such as natural language processing, and thus sit at the heart of many machine-learned programs. We used the logic of a CRF as the data set for our visualization. This logic includes a set of features (words, phrases, and other identifiable aspects of the data being run through the learned program) as well as the set of numerical weights that determine each feature’s importance to each available category.
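The feature-and-weight logic described above can be sketched in a few lines of C++. This is a simplified illustration, not the classifier from the study: a real CRF also scores transitions between neighboring labels and normalizes over whole sequences, both of which are left out here. The names `scoreCategories` and `bestCategory` and the shape of the model are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model layout: each feature (a word or phrase) maps to one
// numerical weight per category.
using Model = std::map<std::string, std::vector<double>>;

// Sum the weights of every feature present in the input, per category.
// Features the model has never seen contribute nothing.
std::vector<double> scoreCategories(const Model& model,
                                    const std::vector<std::string>& features,
                                    std::size_t numCategories) {
    std::vector<double> totals(numCategories, 0.0);
    for (const std::string& feature : features) {
        Model::const_iterator it = model.find(feature);
        if (it == model.end()) continue;
        assert(it->second.size() == numCategories);
        for (std::size_t c = 0; c < numCategories; ++c)
            totals[c] += it->second[c];
    }
    return totals;
}

// Index of the highest-scoring category (the first one wins ties).
std::size_t bestCategory(const std::vector<double>& totals) {
    assert(!totals.empty());
    std::size_t best = 0;
    for (std::size_t c = 1; c < totals.size(); ++c)
        if (totals[c] > totals[best]) best = c;
    return best;
}
```

The per-category totals are the kind of numbers an explanation could display: each feature's weight shows how strongly it pushed the decision toward or away from a category.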
The data used to create the visualization is a transcript of a user study. This transcript consists of the words and actions of an end user debugging a spreadsheet in Microsoft Excel. Each sentence of the transcript is assigned to one of four categories (seeking information, information gained, information lost, or none), which makes the transcript easier for researchers to analyze. Our visualization explains the logic the learned program might use to categorize each sentence of the transcript; future work would include allowing the user to adjust this logic when it results in poor classifications. This release is not connected to an actual classifier. The displayed explanations are randomly generated, but they give a useful idea of what a functional implementation would look like.
Transcript Viewer source code [Xcode 3.1 project, requires Mac OS X 10.5 or higher]

A screenshot of our prototype Transcript Viewer
topics: auto-coding, code, machine-learned programs, school, visualizations | posted in Personal, academia
Dec 10, 2008

Screenshot
Background
The purpose of this project is to visually explain a machine learning classifier’s logic to an end user. Additionally, the user must be able to explain back to the classifier when its logic is faulty and how it should be fixed.
Implementation
This project was written in C++ using OpenGL. I used GLUT primitives such as glutSolidCube() to create the basic components, then reshaped and arranged them using OpenGL transformations. In order to create the translucent regions, I needed to enable lighting and shading. Blending was then used to create the illusion that each word’s bar is inside one or more of the translucent boxes. The text was created using glutStrokeCharacter() calls; regular bitmap fonts were not an option because I wanted to rotate the text beneath the word boxes. The resulting words suffered from severe aliasing problems, so I used the OpenGL accumulation buffer to enable 8x anti-aliasing.
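The accumulation-buffer technique can be illustrated without an OpenGL context. In OpenGL the usual pattern is to clear with glClear(GL_ACCUM_BUFFER_BIT), draw the scene once per sub-pixel jitter offset, add each pass with glAccum(GL_ACCUM, 1.0f / passes), and finish with glAccum(GL_RETURN, 1.0f). The sketch below mimics that loop in plain C++ on a tiny grayscale image. The `inside` scene test and the eight jitter offsets are hypothetical choices for the example, not the values used in the project.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

typedef std::pair<double, double> Offset;

// Software stand-in for accumulation-buffer anti-aliasing: "render" the scene
// once per jitter offset and add each pass into the accumulator scaled by
// 1/passes, the role glAccum(GL_ACCUM, 1.0f / passes) plays in OpenGL.
std::vector<double> accumulate(int width, int height,
                               const std::function<bool(double, double)>& inside,
                               const std::vector<Offset>& jitter) {
    assert(width > 0 && height > 0 && !jitter.empty());
    std::vector<double> accum(static_cast<std::size_t>(width) * height, 0.0);
    const double weight = 1.0 / static_cast<double>(jitter.size());
    for (std::size_t pass = 0; pass < jitter.size(); ++pass) {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                // Sample at the pixel center, shifted by the sub-pixel jitter.
                const double sx = x + 0.5 + jitter[pass].first;
                const double sy = y + 0.5 + jitter[pass].second;
                if (inside(sx, sy))
                    accum[static_cast<std::size_t>(y) * width + x] += weight;
            }
        }
    }
    return accum;  // analogous to copying back with glAccum(GL_RETURN, 1.0f)
}

// Eight sub-pixel offsets in (-0.5, 0.5); an arbitrary illustrative pattern.
std::vector<Offset> eightSampleJitter() {
    std::vector<Offset> j;
    j.push_back(Offset(-0.4375, -0.1875));
    j.push_back(Offset(-0.3125, 0.3125));
    j.push_back(Offset(-0.1875, -0.4375));
    j.push_back(Offset(-0.0625, 0.0625));
    j.push_back(Offset(0.0625, -0.3125));
    j.push_back(Offset(0.1875, 0.4375));
    j.push_back(Offset(0.3125, -0.0625));
    j.push_back(Offset(0.4375, 0.1875));
    return j;
}
```

A pixel that an edge cuts through ends up with a fractional value, because only some of the jittered passes cover it. That fractional value is what softens the jagged outlines of the stroke text.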
Usage
While this project is not actually hooked up to a classifier, the word importance can still be manipulated by dragging the bars up or down into the desired regions. There are also controls in a secondary window for rotating the view or panning right and left.
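The drag interaction can be sketched as two small steps: move the bar by the drag distance, then look up which region it now sits in. The region names come from the required, forbidden, and unimportant regions of the visualization. The normalized height range, the thresholds, and the assumption that "required" is at the top are all hypothetical, chosen only to make the example concrete.

```cpp
#include <algorithm>
#include <cassert>

enum class Region { Forbidden, Unimportant, Required };

// Hypothetical layout: bar heights are normalized to [-1, 1], with the
// forbidden region at the bottom and the required region at the top.
struct RegionLayout {
    double forbiddenBelow;
    double requiredAbove;
    RegionLayout() : forbiddenBelow(-0.33), requiredAbove(0.33) {}
};

// Apply a vertical drag to a bar, keeping it inside the drawable range.
double dragBar(double height, double dragDelta) {
    return std::max(-1.0, std::min(1.0, height + dragDelta));
}

// Decide which region a bar of the given height falls into.
Region regionFor(double height, const RegionLayout& layout) {
    assert(layout.forbiddenBelow < layout.requiredAbove);
    if (height < layout.forbiddenBelow) return Region::Forbidden;
    if (height > layout.requiredAbove) return Region::Required;
    return Region::Unimportant;
}
```

If the visualization were connected to a classifier, the region returned here is the value that would be passed back as the user's correction.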
Room for Improvement
Ideally, this explanation would be interacting with a classifier; changes made to word importance would then be instantly viewable in the application when it re-classifies based upon the user’s adjustments. I would also have liked to make it possible to zoom in and out so that the user can fit exactly as many words on-screen as she would like. Finally, the blending of the word boxes with the translucent boxes isn’t entirely right: the word boxes should change color slightly depending on which region they are in.
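The tint a word box should pick up can be worked out by hand. The sketch below assumes standard source-over blending, the result OpenGL produces with glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA) when the translucent region is drawn after the opaque box. The `Color` struct and `blendOver` function are hypothetical names, not code from the project.

```cpp
#include <cassert>

struct Color {
    float r, g, b;
};

// Blend a translucent source color over an opaque destination color:
// result = src * alpha + dst * (1 - alpha).
Color blendOver(const Color& src, float alpha, const Color& dst) {
    assert(alpha >= 0.0f && alpha <= 1.0f);
    const float inv = 1.0f - alpha;
    Color out;
    out.r = src.r * alpha + dst.r * inv;
    out.g = src.g * alpha + dst.g * inv;
    out.b = src.b * alpha + dst.b * inv;
    return out;
}
```

For example, a red region at 25% opacity over a blue box gives (0.25, 0, 0.75): mostly blue, with a slight shift toward the region's color.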
Downloads
Source code (for Visual Studio 2008)
Windows Binary (Requires OpenGL, GLUT and GLUI libraries)
topics: code, end-user debugging, explanations, machine-learned programs, opengl, school, visualizations | posted in Coding, academia
Dec 4, 2008
Yay! Thanks to some help from Will, I’ve finally finished my 550 term project, a whole 18 hours before the due date! Since this was a graphics course, it isn’t actually hooked up to a naive Bayes classifier backend (unlike the 2D version we did over the summer). It is, however, infinitely more enjoyable to use than the previous prototype, mostly because you can rotate the view around and zoom down the list of words.

3D naive Bayes classifier visualization
topics: code, opengl, school, visualizations | posted in Coding, academia
Dec 1, 2008
After another day of coaxing OpenGL to play nice, I’ve managed to make some progress on my final project. The goal is to create a 3D version of the 2D visualization my research group designed for the logic underlying a Bayesian classifier. So far I’ve got it displaying blocks representing the importance of each word to the classifier (only using three words at the moment, but I’d like to scale it up so that you can just zoom down a list of hundreds), and translucent regions denoting whether a word is required, forbidden, or unimportant to the classifier’s decision. The blending isn’t quite right for that last part, but good enough to move on. Now I need to make it interactive, so that the user can adjust the importance of each word.
Also, Ted and Judy dropped off an electric oil heater when they passed through town today. My apartment’s furnace does a great job of heating the family room, but that’s about it. Now maybe my bedroom won’t be freezing each morning! Sadly they were just passing through while I was on campus; I want to take them out to dinner sometime before heading back to Detroit for the holidays.
topics: academia, apartment, end-user debugging, explanations, machine-learned programs, opengl, school, visualizations | posted in Coding, Personal, academia