5.1 Model Review
When the model’s status displays “Ready For Review” under the Model Management page, you can view the summary of the model results and begin the review process. Particularly for scenarios with a lack of labels, reviewing and labeling individual clusters will help estimate the model performance and provide new insights about tuning out false positives.
5.1.1 Summary of Model Results
By clicking View Model Results in the drop-down menu on the right-hand side of the model, you can jump to the Detection Summary page. Here, you can get detection stats such as the number and percentage of detected users, overall precision, overall recall, and more.
The detection stats update dynamically as the score threshold is raised above 0; the example below uses a threshold of 0.8. A higher threshold usually improves precision and lowers recall.
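To make the precision and recall trade-off concrete, here is a minimal Python sketch. It is not the product's implementation: the function name, the sample scores and labels, and the choice to treat a user as detected when the score is greater than or equal to the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple


def precision_recall_at(
    scores: List[float], is_fraud: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (precision, recall), counting users with score >= threshold as detected.

    Whether the real threshold is inclusive is an assumption of this sketch.
    """
    detected = [label for score, label in zip(scores, is_fraud) if score >= threshold]
    true_positives = sum(detected)
    total_fraud = sum(is_fraud)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / total_fraud if total_fraud else 0.0
    return precision, recall


# Made-up scores and ground-truth labels for six users.
scores = [0.95, 0.90, 0.85, 0.60, 0.40, 0.30]
is_fraud = [True, True, False, True, False, False]

low = precision_recall_at(scores, is_fraud, 0.0)
high = precision_recall_at(scores, is_fraud, 0.8)
print("threshold 0.0:", low)
print("threshold 0.8:", high)
```

With this sample data, raising the threshold from 0 to 0.8 drops the lower-scored users, so precision rises while recall falls.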
This page also visualizes the model performance via precision-score and recall-score graphs.
5.1.2 Cluster Review
Under the Model Management page, click on the Review and Label button to go to the cluster review page.
The Cluster Review page displays the total sampled clusters and several review stats. While these clusters are a subset of the overall detected clusters, they are sampled to represent each feature subspace for review purposes. Here, you can sort sample clusters by Score, FP Rate (false positive rate), TP Rate (true positive rate), Similar Users (number of detected users from the same subspace), and Users (cluster size). The Reasons column provides detailed detection reasons for a cluster. The full Reasons list can be opened by clicking on the small arrow on the rightmost side of the page. There is also a generic search bar that searches cluster-level content (e.g. cluster ID, reason string) and returns the matching records.
You can label a cluster as FP, TP, or Uncertain by clicking on the green, red, or yellow button. To view the users belonging to a cluster, click on the blue arrow to go to the Cluster Detail page (details in Section 5.1.4).
To help you achieve higher review efficiency, we have developed the review filter functionality. By clicking on the ADD FILTER button, you may create nested filters to fetch the clusters you want to review the most. We will focus on this part in the next subsection. Above the cluster FILTER, there are three useful links:
- MODEL CONFIGURATION: Model configuration gives you quick access to the final configuration used to run the current model.
- DATA DISTRIBUTION: Data distribution is only available if the smart recommendation has been run.
- COUNTERS: Counters will provide important insights about various feature subspaces.
Counters for both selected features and high priority features are provided in two charts. The higher the count, the more users are detected through that feature. Take the feature “high_fraud_rate_client_id” below as an example. This feature helped detect 168 users, 87 of whom were detected by this feature alone. In other words, removing this feature would miss those 87 detections. Features with the highest counts (especially the highest only counts) may be noisy and deserve more thorough checks. High priority features have similar counters. Each chart can also be viewed in percentage mode by toggling the # to % button on the top right of the window.
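The two kinds of counts can be illustrated with a short Python sketch. It assumes that an "only" count means the number of users detected through that feature and no other; the function name, user IDs, and sample data are hypothetical and do not reflect how the product stores detections.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def feature_counters(user_features: Dict[str, Set[str]]) -> Tuple[Counter, Counter]:
    """Return (total, only) counters per feature.

    total[f] = users detected with feature f among their detection features
    only[f]  = users whose single detection feature is f (assumed meaning)
    """
    total: Counter = Counter()
    only: Counter = Counter()
    for features in user_features.values():
        total.update(features)
        if len(features) == 1:
            only.update(features)
    return total, only


# Hypothetical detections: user ID -> features that contributed to the detection.
detections = {
    "u1": {"high_fraud_rate_client_id"},
    "u2": {"high_fraud_rate_client_id", "high_fraud_rate_ip"},
    "u3": {"high_fraud_rate_ip"},
    "u4": {"high_fraud_rate_client_id"},
}

total, only = feature_counters(detections)
for feature in sorted(total):
    print(feature, "count:", total[feature], "only count:", only[feature])
```

In this toy data, removing "high_fraud_rate_client_id" would lose u1 and u4, while u2 would still be detected through "high_fraud_rate_ip".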
5.1.3 Cluster Review Filter
To achieve higher review efficiency, you may use FILTER CLUSTERS to first review clusters of interest. Click on ADD FILTER and then configure your filter criteria.
Currently, six types of filtering targets are supported:
- Selected features
- High priority features
- Similarity thresholds
- Cluster size
- Cluster score
- Number of selected features
After selecting the target you want to filter on, enter the feature, operator, and corresponding value, and click the + button to add the filter. Click on the FILTER button to apply all the filter criteria, and the cluster list below will be updated accordingly. You can keep adding filters to further restrict the returned clusters. In the example below, we create a filter for a selected feature (“high_fraud_rate_ip”) with the value true. After applying this filter, the number of sampled clusters drops from 2,717 to 707.
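The way stacked filters narrow the cluster list can be sketched in Python as follows. This is an illustration only: the (target, operator, value) tuple, the field names, and the sample clusters are assumptions, and the sketch assumes that all added filters must hold at once, which matches the statement that each new filter further restricts the result.

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Supported comparison operators (an assumed set, for illustration).
OPERATORS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

Filter = Tuple[str, str, Any]  # (target, operator, value)


def matches(cluster: Dict[str, Any], flt: Filter) -> bool:
    """Check one filter; a cluster without the target field never matches."""
    target, op, value = flt
    return target in cluster and OPERATORS[op](cluster[target], value)


def apply_filters(clusters: List[Dict[str, Any]], filters: List[Filter]) -> List[Dict[str, Any]]:
    """Keep clusters that satisfy every filter (filters combined with AND)."""
    return [c for c in clusters if all(matches(c, f) for f in filters)]


# Hypothetical sampled clusters.
clusters = [
    {"cluster_id": "c1", "high_fraud_rate_ip": True, "cluster_size": 12, "cluster_score": 0.91},
    {"cluster_id": "c2", "high_fraud_rate_ip": False, "cluster_size": 40, "cluster_score": 0.75},
    {"cluster_id": "c3", "high_fraud_rate_ip": True, "cluster_size": 3, "cluster_score": 0.55},
]

step1 = apply_filters(clusters, [("high_fraud_rate_ip", "==", True)])
step2 = apply_filters(
    clusters,
    [("high_fraud_rate_ip", "==", True), ("cluster_score", ">=", 0.8)],
)
print([c["cluster_id"] for c in step1])
print([c["cluster_id"] for c in step2])
```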
5.1.4 Cluster Detail Page
You can go to the Cluster Details page by clicking on the blue right arrow shown above. The Cluster Details page shows basic information about the cluster at the top. More importantly, it shows a detailed user table at the bottom that displays feature values for every detected user within the cluster. You can click the Edit Columns button to select features of interest or drag feature columns into a desired order. In addition to labeling the entire cluster, you can label individual users for finer granularity.
You can click on a user to visualize its similarity to other users in the cluster. The shared feature values will be highlighted in yellow.
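The idea behind the highlighting can be sketched in a few lines of Python. This is a hypothetical illustration, not the page's actual logic: the function name and the sample feature values are assumptions, and a value is treated as shared when at least one other user in the cluster has exactly the same value for that feature.

```python
from typing import Any, Dict, List


def shared_values(selected: Dict[str, Any], others: List[Dict[str, Any]]) -> Dict[str, int]:
    """Map each feature of the selected user to the number of other users
    holding the same value; features shared with nobody are left out."""
    shared: Dict[str, int] = {}
    for feature, value in selected.items():
        count = sum(
            1 for other in others if feature in other and other[feature] == value
        )
        if count:
            shared[feature] = count
    return shared


# Hypothetical users from one cluster.
selected_user = {"ip": "10.0.0.1", "device": "A1", "email_domain": "x.com"}
other_users = [
    {"ip": "10.0.0.1", "device": "B2", "email_domain": "x.com"},
    {"ip": "10.0.0.1", "device": "C3", "email_domain": "y.com"},
]

highlighted = shared_values(selected_user, other_users)
print(highlighted)
```

Here the IP address is shared with both other users and the email domain with one of them, so those two cells would be the candidates for highlighting.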
Next to the Cluster Details tab is the Similar Clusters tab. This tab will show you other sampled clusters from the same feature subspace as the current cluster; the Similar Users count is the same for all clusters. Typically, clusters from the same feature subspace share similar performance because the detection reasons are highly similar. This provides you a quick way to label batches of similar clusters at the same time.
5.1.5 User Detail Page
To view user details, click on the desired User ID to open the User Details page.
The User Details page lists all events of a user, as well as all field values of each event. You can choose which columns to display by clicking EDIT COLUMNS.
5.1.6 Download Cluster
You can download multiple clusters by selecting the clusters in the Review and Label Clusters page.
Click Download Cluster. Choose the file type (CSV/JSON) you would like to save the clusters as. Optionally, you may leave a comment describing your Selection Criteria for later review.
You can check your download history by clicking HISTORIC DOWNLOAD.
You can see your historic Selection Criteria, and download the desired clusters again by clicking the Download Link.
5.1.7 Model Postfilter
If you want to postfilter your model to remove selected FP clusters, you can configure postfilter rules by clicking on the RULES WORKSPACE button and then use these rules to re-run your model. Clusters and users that match ANY of your rule conditions will be filtered out.
Currently, we support three major rule types:
- Filter Clusters: Within the Filter Clusters section, there are five subtypes.
  - Cluster Detected Member Count (cluster size): A cluster that is too small or too large may be a false positive.
  - Cluster Score: A common rule is to filter out clusters with scores below a user-defined threshold.
  - Cluster Feature Share: This rule inspects the share of users that have the same value for a selected feature. In the example below, clusters where more than 80% of users share the same value of the feature “last_name” will be filtered out.
  - Cluster Feature Count: A common rule is to filter out clusters with a feature count below a user-defined threshold. A lower feature count indicates a weaker feature subspace and detection reason.
  - FP Rate: A common rule is to filter out clusters with high FP rates.
- Filter Users: In the Filter Users section, you can create rules on the feature level to match users. The matched users will be excluded from the detection result later.
- Adjust Feature Score: In the Adjust Feature Score section, there are two subtypes.
  - Feature Correlation: You can define two correlated features that may not have been specified earlier during model configuration. The system will recompute cluster scores after taking this correlation into consideration. Some clusters may be removed if the recomputed feature space is not strong enough.
  - Feature Ignore: You can define a feature that you want to be ignored entirely (in other words, removed from all feature subspaces). Usually, this is a stronger rule than the feature correlation rule.
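As a rough illustration of how cluster-level rules combine, the Python sketch below applies a Cluster Score rule and a Cluster Feature Share rule and drops any cluster that matches at least one of them. The data layout, function names, and thresholds are assumptions for this example, and the share is computed here as the most common value's count divided by the cluster size, which may differ from the product's exact definition.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Cluster = Dict[str, Any]
Rule = Callable[[Cluster], bool]  # returns True when the cluster should be filtered out


def feature_share(cluster: Cluster, feature: str) -> float:
    """Fraction of the cluster's users holding the most common value of a feature."""
    users = cluster["users"]
    values = [user[feature] for user in users if feature in user]
    if not values:
        return 0.0
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(users)


def postfilter(clusters: List[Cluster], rules: List[Rule]) -> List[Cluster]:
    """Keep only clusters that match none of the rules (ANY match removes)."""
    return [c for c in clusters if not any(rule(c) for rule in rules)]


# Two hypothetical rules with made-up thresholds.
rules: List[Rule] = [
    lambda c: c["score"] < 0.5,                       # Cluster Score rule
    lambda c: feature_share(c, "last_name") > 0.8,    # Cluster Feature Share rule
]

clusters = [
    {"id": "c1", "score": 0.9, "users": [{"last_name": "Smith"}] * 5},
    {"id": "c2", "score": 0.4, "users": [{"last_name": n} for n in ("Lee", "Kim", "Park")]},
    {"id": "c3", "score": 0.7, "users": [{"last_name": n} for n in ("Lee", "Kim", "Lee", "Park")]},
]

kept = postfilter(clusters, rules)
print([c["id"] for c in kept])
```

In this toy data, c1 is removed by the feature share rule and c2 by the score rule, so only c3 remains.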
After determining your postfilter rules, you can choose to either save directly or test the added rules. After clicking the SAVE button, the added rule will show up under the Postfilter Rules section. You can manage a set of created rules by using CREATE RULESET. If you need to test a rule, click the TEST button below to start the test. When the test is over, the results will be presented in the TEST RESULTS section. Here you can see the approximate number of filtered users and group information.
After consolidating your rules, you may click on RE-RUN MODEL, which will trigger a postfilter job in the backend.
You can then track the status of the postfiltered model on the Model Management page.
5.2 Campaign Analysis
The Cluster Review page described in the previous section allows you to review detected clusters in different feature spaces. Because a user may be detected in multiple clusters in different spaces, you may want a holistic, deduplicated view of all detected users and their connections. A campaign analysis produces a report in which clusters from different spaces are linked and merged into “campaigns” based on their similarities. In this report, each detected user corresponds to at most one campaign.
When the model is in the Ready for Review stage, you can run campaign analysis from the drop-down menu on the right-hand side of the model.
After clicking on the button, the Run Campaign Analysis page will pop up, which allows you to fill in the campaign linkage factor. You can check the notes on the page for a more detailed explanation of the factor.
Then, you can click on the Save button and the campaign analysis will start to run, as shown in the orange circle in the following figure.
In general, it takes less than 15 minutes for a campaign report to be fully generated. When you see the downward arrow shown below, you can click on it to download the report. You may rerun campaign analysis with a different linkage factor. The new result will overwrite the previous one.
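The general idea of linking and merging clusters into campaigns can be sketched as below. This is purely illustrative and should not be read as the product's algorithm: the guide does not define how cluster similarity is measured or exactly how the linkage factor is used. The sketch assumes Jaccard similarity between the user sets of two clusters, treats the linkage factor as a minimum similarity for merging, and assigns any user still shared between campaigns to the largest one.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two user sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_into_campaigns(clusters: Dict[str, Set[str]], linkage: float) -> List[Set[str]]:
    """Merge clusters whose similarity is at least `linkage`, then deduplicate users."""
    parent = {cid: cid for cid in clusters}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    ids = list(clusters)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if jaccard(clusters[a], clusters[b]) >= linkage:
                parent[find(a)] = find(b)

    merged: Dict[str, Set[str]] = {}
    for cid, users in clusters.items():
        merged.setdefault(find(cid), set()).update(users)

    # Assumed tie-break: a user left in several campaigns goes to the largest one.
    campaigns: List[Set[str]] = []
    seen: Set[str] = set()
    for users in sorted(merged.values(), key=len, reverse=True):
        fresh = users - seen
        if fresh:
            campaigns.append(fresh)
            seen |= fresh
    return campaigns


# Hypothetical clusters from different feature spaces: cluster ID -> detected users.
clusters = {
    "c1": {"u1", "u2", "u3"},
    "c2": {"u2", "u3", "u4"},
    "c3": {"u4", "u5", "u6", "u7", "u8"},
}
campaigns = merge_into_campaigns(clusters, linkage=0.5)
print(campaigns)
```

With a linkage of 0.5, c1 and c2 are merged because half of their combined users overlap, while c3 stays separate; the deduplication step then ensures every user appears in only one campaign.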
5.3 Auto Tuning
When the model is in the Ready for Review stage, you can run Auto Tune Model from the drop-down menu on the right-hand side of the model.
5.4 Model Comparison
You can compare the detection results between any two models ready for review through the Compare Model Results interface. This interface enables users to quickly find iteration directions to improve model performance by comparing the models’ detection result differences under various thresholds.
You can access the Model Comparison page by clicking on Models in the top menu bar.
After selecting the two models you want to compare, you can adjust the Score Threshold for Model to filter out clusters with scores below the threshold you define.
Then, click on Compare. The interface will display the total clusters, the number of clusters unique to each selected model, and the number of overlapping clusters. You can also view user-level differences.
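A user-level comparison of this kind amounts to set arithmetic, as the following Python sketch suggests. It is an assumption-laden illustration: the function name and sample scores are hypothetical, each model is represented as a mapping from user ID to score, and a user counts as detected when the score is at least that model's threshold.

```python
from typing import Dict, Set


def compare_models(
    model_a: Dict[str, float],
    model_b: Dict[str, float],
    threshold_a: float,
    threshold_b: float,
) -> Dict[str, Set[str]]:
    """Split detected users into those unique to each model and those found by both."""
    kept_a = {user for user, score in model_a.items() if score >= threshold_a}
    kept_b = {user for user, score in model_b.items() if score >= threshold_b}
    return {
        "only_a": kept_a - kept_b,
        "only_b": kept_b - kept_a,
        "overlap": kept_a & kept_b,
    }


# Hypothetical detection results: user ID -> score.
model_a = {"u1": 0.9, "u2": 0.7, "u3": 0.4}
model_b = {"u2": 0.8, "u3": 0.9, "u4": 0.6}

diff = compare_models(model_a, model_b, threshold_a=0.5, threshold_b=0.5)
for name, users in diff.items():
    print(name, sorted(users))
```

Re-running the comparison with different thresholds shows how the unique and overlapping sets shift, which is the kind of difference the comparison page is meant to surface.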
5.5 Model Iteration
To start from an existing model and make more adjustments, you can create a new model and use the import functionality. To do that, click on Feature Engineering under Features in the top menu bar, choose the dataset you are working with, and click on CREATE MODEL.
Enter the required input arguments. Then, at the bottom of the model creation page, choose the name of the model you would like to copy from and click PROCEED TO FEATURE SELECTION. Alternatively, you can import an existing model by clicking on Import Configuration within the Feature Selection page. However, keep in mind that the imported configuration will overwrite the current model configuration.
You can also clone the configuration of an existing model and modify it to achieve better results.
To clone an existing model, click on the Clone Model option in the drop-down menu on the right-hand side of the model.
After you click on the button, a page similar to the Create Model page will appear, and the cloned model will have the same configuration as the original model.
This document is an introduction to the core algorithm knowledge, basic workflow, and all functions involved in dCube modeling. If you have any follow-up questions, please feel free to contact your technical account manager. We are pleased to accompany you through the entire process of dCube modeling. We hope you have an enjoyable and smooth modeling experience! ——— All Members of DataVisor