Machine learning terms#
- Categorical feature#
Synonyms: discrete feature, qualitative feature
- Classification#
Examples: predict what type of object is in a picture (a dog or a cat?), predict whether or not an email is spam
- Clustering#
Example: find large star clusters in images of space
- Continuous feature#
Synonyms: quantitative feature, numerical feature
Examples: a person’s height, the price of a house
- CSV file#
A comma-separated values (CSV) file is a type of text file. Each line in a CSV file is a record containing fields that are separated by a delimiter. Fields can be in a numerical or a text format. Text usually refers to categorical values. By default, the delimiter is a comma, but, in general, it can be any character. For more details, see.
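The record and delimiter structure described above can be sketched in a few lines of C++. This is a minimal illustration only: the function name `split_record` is hypothetical, and quoted fields or escaped delimiters are deliberately not handled.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one CSV record into fields using the given delimiter.
// Simplification (assumption): quoted fields and escaped delimiters are not handled.
std::vector<std::string> split_record(const std::string& line, char delimiter = ',') {
    std::vector<std::string> fields;
    std::istringstream stream(line);
    std::string field;
    while (std::getline(stream, field, delimiter)) {
        fields.push_back(field);
    }
    // std::getline does not report an empty field after a trailing delimiter,
    // so it is restored explicitly.
    if (!line.empty() && line.back() == delimiter) {
        fields.emplace_back();
    }
    return fields;
}
```

Passing a different character as the second argument covers files that use, for example, a semicolon or a tab as the delimiter.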
- Dataset#
A collection of observations.
- Dimensionality reduction#
A problem of transforming a set of feature vectors from a high-dimensional space into a low-dimensional space while retaining meaningful properties of the original feature vectors.
- Feature#
A particular property or quality of a real object or an event. Has a defined type and domain. In machine learning problems, features are considered input variables that are independent of each other.
Synonyms: attribute, variable, input variable
- Feature vector#
A vector that encodes information about a real object, an event, or a group of objects or events. Contains at least one feature.
Example: A rectangle can be described by two features: its width and height
- Inference set#
- Interval feature#
A continuous feature with values that can be compared, added or subtracted, but cannot be multiplied or divided.
Examples: a time frame scale, a temperature in Celsius or Fahrenheit
- Label#
Example: the spam-detection problem has a binary label indicating whether the email is spam or not
- Model#
Example: in the linear regression algorithm, the model contains a weight value for each input feature and a single bias value
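As a rough sketch of the linear regression example, a model of this kind can be represented as a list of weights plus one bias, and inference reduces to a weighted sum. The type `linear_model` and the function `predict` are hypothetical names used for illustration; they are not part of oneDAL.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical model type for illustration; not a oneDAL class.
struct linear_model {
    std::vector<double> weights; // one weight per input feature
    double bias;                 // single bias value
};

// Computes the response as bias + sum over i of weights[i] * features[i].
double predict(const linear_model& model, const std::vector<double>& features) {
    if (features.size() != model.weights.size()) {
        throw std::invalid_argument("feature count must match weight count");
    }
    double response = model.bias;
    for (std::size_t i = 0; i < features.size(); ++i) {
        response += model.weights[i] * features[i];
    }
    return response;
}
```

The size check reflects the statement above that the model holds exactly one weight per input feature.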
- Nominal feature#
A categorical feature without ordering between values. Only the equality operation is defined for nominal features.
Examples: a person’s gender, the color of a car
- Observation#
Synonyms: instance, sample
- Ordinal feature#
A categorical feature with defined operations of equality and ordering between values.
Example: a student’s grade
- Outlier#
An observation that is significantly different from the other observations.
- Ratio feature#
A continuous feature with defined operations of equality, comparison, addition, subtraction, multiplication, and division. The zero value means the absence of any value.
Example: the height of a tower
- Regression#
Example: predict temperature based on weather conditions
- Response#
A property of a real object or event whose dependency on the feature vector needs to be determined in a supervised learning problem. While a feature is an input in the machine learning problem, the response is one of the outputs that can be produced by the model at the inference stage.
Synonym: dependent variable
- Supervised learning#
- Training set#
- Unsupervised learning#
oneDAL terms#
- Accessor#
A oneDAL concept for an object that provides access to the data of another object in a special data format. It abstracts data access from the interface of an object and provides uniform access to the data stored in objects of different types.
- Batch mode#
The computation mode for an algorithm in oneDAL, where all the data needed for computation is available at the start and fits the memory of the device on which the computations are performed.
- Builder#
A oneDAL concept for an object that encapsulates the creation process of another object and enables its iterative creation.
- Contiguous data#
Data that are stored as one contiguous memory block. One of the characteristics of a data format.
- Data format#
Representation of the internal structure of the data.
Examples: data can be stored in array-of-structures or compressed-sparse-row format
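To make the compressed-sparse-row example more concrete, the sketch below converts a small dense matrix into three CSR arrays. The `csr_matrix` type and `to_csr` function are hypothetical, and zero-based indexing is an assumption made for this illustration; a real library may use a different indexing base.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CSR container for illustration; zero-based indexing is assumed.
struct csr_matrix {
    std::vector<double> values;               // non-zero values, stored row by row
    std::vector<std::int64_t> column_indices; // column of each stored value
    std::vector<std::int64_t> row_offsets;    // row i occupies [row_offsets[i], row_offsets[i + 1])
};

// Converts a dense matrix (a vector of rows) into CSR form.
csr_matrix to_csr(const std::vector<std::vector<double>>& dense) {
    csr_matrix csr;
    csr.row_offsets.push_back(0);
    for (const auto& row : dense) {
        for (std::size_t col = 0; col < row.size(); ++col) {
            if (row[col] != 0.0) {
                csr.values.push_back(row[col]);
                csr.column_indices.push_back(static_cast<std::int64_t>(col));
            }
        }
        // Close the current row: the next row starts where this one ends.
        csr.row_offsets.push_back(static_cast<std::int64_t>(csr.values.size()));
    }
    return csr;
}
```

An empty row shows up as two equal consecutive entries in `row_offsets`, which is why that array always has one more element than the number of rows.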
- Data layout#
Example: row-major format, where elements are stored row by row
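The row-major example can be expressed as an index calculation over a contiguous block. The helper names below are hypothetical, and the column-major variant is included only for contrast; it is not mentioned in the entry above.

```cpp
#include <cstdint>

// Offset of element (row, column) in a contiguous row-major block,
// where elements are stored row by row.
constexpr std::int64_t row_major_offset(std::int64_t row,
                                        std::int64_t column,
                                        std::int64_t column_count) {
    return row * column_count + column;
}

// Column-major counterpart, where elements are stored column by column.
constexpr std::int64_t column_major_offset(std::int64_t row,
                                           std::int64_t column,
                                           std::int64_t row_count) {
    return column * row_count + row;
}
```

For a table with 3 rows and 4 columns, the element at row 1, column 2 sits at offset 6 in row-major order and at offset 7 in column-major order.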
- Data type#
An attribute of data used by a compiler to store and access them. Includes size in bytes, encoding principles, and available operations (in terms of a programming language).
- Flat data#
- Getter#
A method that returns the value of a private member variable.
std::int64_t get_row_count() const;
- Heterogeneous data#
- Homogeneous data#
- Immutability#
An object is immutable if its state cannot be changed after creation.
- Metadata#
Information about the logical and physical structure of an object. All possible combinations of metadata values represent the full set of possible objects of a given type. Metadata do not expose information that is not part of a type definition, e.g. implementation details.
Example: a table object can contain three nominal features with 100 observations (the logical part of metadata). This object can store data as a sparse CSR array and provide direct access to them (the physical part)
- Online mode#
The computation mode for an algorithm in oneDAL, where the data needed for computation becomes available in parts over time.
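The idea behind online mode can be illustrated with a running mean that consumes data block by block and produces the result only at the end. The class `online_mean` and its method names are hypothetical and chosen for this sketch; they do not represent the oneDAL API.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical accumulator for illustration; not a oneDAL API.
class online_mean {
public:
    // Consumes one block of data as it becomes available.
    void partial_compute(const std::vector<double>& block) {
        for (double value : block) {
            sum_ += value;
        }
        count_ += static_cast<std::int64_t>(block.size());
    }

    // Combines everything seen so far into the final result.
    // Returns 0.0 if no data has been provided (a choice made for this sketch).
    double finalize() const {
        return count_ > 0 ? sum_ / static_cast<double>(count_) : 0.0;
    }

private:
    double sum_ = 0.0;
    std::int64_t count_ = 0;
};
```

Only the running sum and count are kept between blocks, so the full dataset never has to fit in memory at once, in contrast to batch mode.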
- Reference-counted object#
A copy-constructible and copy-assignable oneDAL object which stores the number of references to the unique implementation. Both copy operations defined for this object are lightweight, which means that each time a new object is created, only the number of references is increased. An implementation is automatically freed when the number of references becomes equal to zero.
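One common way to obtain this behavior in C++ is to hold the implementation in a `std::shared_ptr`. The sketch below assumes that approach for illustration; the class `table_handle` is hypothetical, and nothing here claims to be how oneDAL implements it.

```cpp
#include <cstdint>
#include <memory>

// Hypothetical class for illustration; not the oneDAL implementation.
class table_handle {
private:
    // The unique implementation shared by all copies of the handle.
    struct impl {
        std::int64_t row_count;
    };
    std::shared_ptr<impl> impl_;

public:
    explicit table_handle(std::int64_t row_count)
        : impl_(std::make_shared<impl>(impl{ row_count })) {}

    std::int64_t get_row_count() const { return impl_->row_count; }

    // Number of handles that currently share the implementation.
    long get_reference_count() const { return impl_.use_count(); }
};
```

Copying a `table_handle` copies only the pointer and increments the counter; the implementation is released when the last handle goes out of scope.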
- Setter#
A method that accepts a single parameter and assigns its value to a private member variable.
void set_row_count(std::int64_t row_count);
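The two signatures shown in this glossary, `get_row_count` and `set_row_count`, can be placed in a minimal class to show how a getter and a setter work together. The class name `table_descriptor` and the member `row_count_` are hypothetical.

```cpp
#include <cstdint>

// Hypothetical class used only to illustrate the getter and setter pattern.
class table_descriptor {
public:
    // Getter: returns the value of the private member variable.
    std::int64_t get_row_count() const {
        return row_count_;
    }

    // Setter: accepts a single parameter and assigns it to the private member variable.
    void set_row_count(std::int64_t row_count) {
        row_count_ = row_count;
    }

private:
    std::int64_t row_count_ = 0;
};
```

The getter is marked `const` because it does not modify the object, which matches the signature given above.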
- Table#
A oneDAL concept for a dataset that contains only numerical data, categorical or continuous. Serves as a means of transferring data between the user’s application and computations inside oneDAL. Hides details of the data format and generalizes access to the data.
- Workload#
A problem of applying a oneDAL algorithm to a dataset.
Common oneAPI terms#
- API#
Application Programming Interface
- DPC++#
Data Parallel C++ (DPC++) is a high-level language designed for data parallel programming productivity. DPC++ is based on SYCL* from the Khronos* Group to support data parallelism and heterogeneous programming.
- Host/Device#
In OpenCL [OpenCLSpec] terminology, the host is the CPU that controls the connected GPU (the device) executing kernels.
- JIT#
Just-in-time compilation: compilation during the execution of a program.
- SPIR-V#
Standard Portable Intermediate Representation - V (SPIR-V) is a language for the intermediate representation of compute kernels.
- SYCL#
SYCL(TM) [SYCLSpec] is a high-level programming model for OpenCL(TM) that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++.