Features are data attributes or properties that describe individual characteristics or dimensions of a dataset. We can use them to perform analysis (e.g., using machine learning models for prediction) or to drive operations (e.g., using rules to take actions) in order to accomplish tasks such as fraud detection.
Feature engineering refers to using domain knowledge to extract useful features from raw data in order to generate better rules or build more accurate models for your use case.
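To make the idea concrete, here is a minimal sketch of one hand-built feature for a fraud use case. Everything in it is a hypothetical illustration rather than part of any DataVisor product: the sample events, the feature name `txn_count_1h`, the helper names, and the thresholds are all assumptions chosen for the example. It derives a per-user transaction count over the preceding hour from raw events, then applies a simple rule to that feature.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical raw events: (user_id, timestamp, amount).
RAW_EVENTS = [
    ("u1", datetime(2024, 1, 1, 9, 0), 20.0),
    ("u1", datetime(2024, 1, 1, 9, 5), 25.0),
    ("u1", datetime(2024, 1, 1, 9, 7), 900.0),
    ("u2", datetime(2024, 1, 1, 10, 0), 40.0),
]


def txn_count_in_window(events, window=timedelta(hours=1)):
    """Attach to each event the number of events by the same user
    within the preceding window, counting the event itself."""
    seen_by_user = defaultdict(list)
    rows = []
    # Process events in time order so the history only holds earlier events.
    for user_id, ts, amount in sorted(events, key=lambda e: e[1]):
        history = seen_by_user[user_id]
        history.append(ts)
        count = sum(1 for earlier in history if ts - earlier <= window)
        rows.append(
            {
                "user_id": user_id,
                "timestamp": ts,
                "amount": amount,
                "txn_count_1h": count,
            }
        )
    return rows


def is_suspicious(row, max_count=2, max_amount=500.0):
    """Example rule (assumed thresholds): a burst of activity plus a large amount."""
    return row["txn_count_1h"] > max_count and row["amount"] > max_amount


feature_rows = txn_count_in_window(RAW_EVENTS)
flagged = [row for row in feature_rows if is_suspicious(row)]
```

The raw data only records who paid what and when. The domain knowledge is the guess that a burst of transactions followed by a large amount is worth flagging, and the feature is what turns that guess into something a rule or model can use.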
Feature engineering has traditionally been complex and time-consuming. Creating high-quality features can be a tedious, multi-step process like the following:
- Collecting or generating high-quality raw data
- Consulting experts to understand what kinds of information are most important to solve problems for a given use case
- Writing code to generate a high-quality feature
- Writing scripts to test the feature's correctness on sampled data
- Repeating the process for every single new feature you want to create
- Writing production scripts or code so that you can generate all the features from the raw data you have collected in batch or in real time
- Optimizing the performance of your code and scripts so that feature generation doesn’t take too long to finish or use too much memory (which can cause the job to run out of memory and fail)
- Deploying your code and scripts in production
- Monitoring the production job status to make sure everything runs as expected and your features behave correctly for the downstream rules or machine learning models
- Repeating the entire process above, on an ongoing basis, whenever features need to be added or modified
Additionally, analysts and data scientists spend much of their time moving data between different formats and platforms just to get their final results into a usable form. The need for an all-in-one ecosystem for feature engineering has never been greater. DataVisor’s Feature Platform addresses this need and supports all 10 of the steps above.
DataVisor’s Feature Platform is both a platform and an ecosystem that lets users manage their data more efficiently. In the risk and fraud domain, it lets fraud analysts and data scientists focus on their main task of developing detection and prevention strategies instead of spending too much time generating features, which is often the most time-consuming step. Feature Platform supports all of the following:
- The ability to use DataVisor out-of-the-box features that leverage our domain expertise for different risk and fraud use cases
- A platform for analysts and data scientists to build, test, and deploy both simple and complex features in production, either through an intuitive UI or by writing code snippets in that same UI
- User interfaces that allow users to easily generate feature templates and put them into feature packages so that other users can share and reuse those packages, instead of recreating all the features from scratch again
- The ability to generate features in batch on a regular basis, or in real time in a low-latency, high-QPS production environment
- The ability to move features between different production environments for easy maintenance
From top to bottom, Feature Platform is optimized to enable and support the “Centralized Intelligence” that we discussed in this article.