Home     Definition    History    Examples    Medicine    Agriculture    Industrial    Environmental     Regulation     Learning        

Feature engineering


Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered as applied machine learning itself.

A feature is an attribute or property shared by all of the independent units on which analysis or prediction is to be done. Any attribute could be a feature, as long as it is useful to the model.

A feature could be strongly relevant (i.e., the feature has information that doesn't exist in any other feature), relevant, weakly relevant (some information that other features include) or irrelevant. Even if some features are irrelevant, having too many is better than missing those that are important. Feature selection can be used to prevent overfitting.

Automation of feature engineering is a research topic that dates back to at least the late 1990s. The academic literature on the topic can be roughly separated into two strings: First, Multi-relational decision tree learning (MRDTL), which uses a supervised algorithm that is similar to a decision tree. Second, more recent approaches, like Deep Feature Synthesis, which use simpler methods.

which results in many redundant operations. These redundancies can be reduced by using tricks such as tuple id propagation. More recently, it has been demonstrated that the efficiency can be increased further by using incremental updates, which completely eliminates redundancies.

In 2015, researchers at MIT presented the Deep Feature Synthesis algorithm and demonstrated its effectiveness in online data science competitions where it beat 615 of 906 human teams. Deep Feature Synthesis is available as an open source library called Featuretools. That work was followed by other researchers including IBM's OneBM and Berkeley's ExploreKit. The researchers at IBM stated that feature engineering automation "helps data scientists reduce data exploration time allowing them to try and error many ideas in short time. On the other hand, it enables non-experts, who are not familiar with data science, to quickly extract value from their data with a little effort, time, and cost.