How To Create Advanced Algorithm in Dataset

vividime-Club · 2024-12-06 16:38:36 · Published on *forum*

The product supports creating advanced algorithms on a dataset. Once an advanced algorithm has been created on a dataset, every report that uses the dataset can use it. To use advanced algorithms, first configure Rserver as described in <R Installation and startup> and <R allocation>.

1. Function description

Enter the [Create dataset] page, open the dataset "Coffee China Market Sales Data", enter the metadata page, click the "More" button in the data column, and select [Create advanced algorithm], as shown in the figure below.

In the pop-up "advanced algorithm" window, enter a name, select the advanced algorithm type, and select and set the data columns or attribute values that the algorithm needs, or write a custom R script that returns a new R field (output value). The R fields are generated by the advanced algorithm. Their scope is the current dataset, so every dashboard that uses this dataset can use them.

Four types of advanced algorithm can be created quickly on a dataset: simple regression, K-means clustering, HoltWinters time series forecasting, and Customize.

2. Algorithm description

2.1 Simple regression

Regression analysis is a widely used analysis model that describes the functional relationship between a dependent variable (Y) and an independent variable (X). When the relationship is linear, that is, a first-degree relationship in one variable, plotting the dependent variable (Y) against the independent variable (X) gives a straight line. The simple linear regression function is y = ax + b, where y is the dependent variable, x is the independent variable, and a and b are constants. If the exponent (power) of the independent variable is greater than 1, the relationship between Y and X is non-linear and the graph is a curve.

[Variable] x: select the independent variable field from the drop-down list.

[Dependent variable] y: select the dependent variable field from the drop-down list.

[Polynomial order] The highest power N of the independent variable used to model the dependent variable. The default is 1. A value of 2 fits a quadratic function of one variable.

[Output] [Fitted values] When checked, a fitted value field is obtained. After the regression model is fitted to the training data, it predicts a value for each given sample (x1, x2,..., xn); these predictions are the fitted values, that is, the estimates of (y1, y2,..., yn).

[Output] [Residuals] When checked, a residual field is obtained. The result is the actual value of y minus the fitted value.

[Output] [Confidence bands] When checked, the range of the confidence band is calculated according to Level. The default Level is 95%.
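The three outputs described above can be reproduced in plain R to see what each field contains. This is only a sketch: the profit and sales values are made up for illustration, the variable names are hypothetical, and the source does not say which R functions the product calls internally, so lm and predict are an assumption.

```r
# Hypothetical sample data (not from the product's demo dataset)
profit <- c(10, 12, 15, 18, 20, 24, 27, 30)     # independent variable x
sales  <- c(52, 60, 71, 85, 92, 110, 118, 133)  # dependent variable y

poly_order <- 1  # [Polynomial order]; 2 would fit a quadratic

# Fit y as a polynomial of degree poly_order in x
model <- lm(sales ~ poly(profit, poly_order, raw = TRUE))

fitted_values   <- fitted(model)      # [Fitted values]
residual_values <- residuals(model)   # [Residuals] = actual y - fitted y

# [Confidence bands] at Level = 95%: columns fit, lwr, upr
bands <- predict(model, interval = "confidence", level = 0.95)

print(coef(model))   # intercept b and slope a of y = ax + b
print(head(bands))
```

Setting poly_order to 2 in this sketch fits a curve instead of a straight line, which mirrors the non-linear case described earlier.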

Example: profit and sales. A series of profit and sales values was collected, with profit as the independent variable and sales as the dependent variable. The simple regression advanced algorithm was used to find the mathematical equation of the fitted model, and the fitted sales values were calculated from that equation. Create an advanced algorithm on the dataset:

The output results are as shown below.

2.2 K-means clustering

K-means clustering requires you to specify the number of clusters N. N samples are chosen at random as the initial cluster centers, the distance between each sample and every cluster center is calculated, and each sample is assigned to its nearest center. After all samples have been assigned, the cluster centers are recalculated, and the process is repeated until the cluster centers no longer change. In R, K-means clustering is performed with the kmeans function.

In kmeans(Data, centers=3, nstart=10), the centers parameter sets the number of clusters, and the nstart parameter sets the number of random sets of initial centers, that is, the number of times the kmeans method is run from a different random start, with the best result kept. The product uses nstart=10 by default.
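To make the call above concrete, here is a minimal runnable sketch. The data is synthetic (three well-separated groups of 20 points each, generated only for illustration), and the variable names are hypothetical.

```r
set.seed(42)  # make the synthetic data and the random starts reproducible

# Synthetic two-column sample set: three groups around 0, 3 and 6
Data <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

# centers = number of clusters, nstart = number of random restarts
result <- kmeans(Data, centers = 3, nstart = 10)

print(table(result$cluster))  # number of samples per cluster
print(result$centers)         # final cluster centers
```

result$cluster holds one integer label per sample, which is the kind of value the [Cluster labels] output described in this section refers to.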

[Cluster dimensions] The sample set to cluster. Select the fields to be clustered from the list of available columns on the left and add them to the cluster dimension box.

[Setting K] The number of clusters. You can enter the number of clusters manually, or enter a maximum K value and let the system calculate the best K value based on the silhouette coefficient.

[Output] [Cluster labels] The category to which each sample belongs.

[Output] [Principal components] Runs a principal component analysis on the cluster dimensions and returns the two most important components.

The clustering results are shown in the figure.
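The [Setting K] and [Principal components] options can be approximated in base R as follows. This is a sketch under stated assumptions: the source does not describe how the product computes the silhouette coefficient or the components, so the hand-written mean_silhouette helper, the use of prcomp, and the synthetic three-column data are all illustrative choices.

```r
set.seed(7)

# Synthetic data: three groups of 20 samples in three dimensions
group_centers <- rbind(c(0, 0, 0), c(4, 0, 4), c(0, 4, 4))
Data <- group_centers[rep(1:3, each = 20), ] + matrix(rnorm(180, sd = 0.3), ncol = 3)

# Mean silhouette width: (b - a) / max(a, b) averaged over all samples, where
# a = mean distance to the sample's own cluster, b = mean distance to the
# nearest other cluster. Hypothetical helper, not a product function.
mean_silhouette <- function(data, cluster) {
  d <- as.matrix(dist(data))
  widths <- sapply(seq_len(nrow(data)), function(i) {
    own <- cluster == cluster[i]
    if (sum(own) == 1) return(0)          # convention for singleton clusters
    a <- sum(d[i, own]) / (sum(own) - 1)  # d[i, i] is 0, so exclude self
    others <- setdiff(unique(cluster), cluster[i])
    b <- min(sapply(others, function(k) mean(d[i, cluster == k])))
    (b - a) / max(a, b)
  })
  mean(widths)
}

max_k <- 5  # the "maximum K value" entered by the user
candidates <- 2:max_k
scores <- sapply(candidates, function(k) {
  mean_silhouette(Data, kmeans(Data, centers = k, nstart = 10)$cluster)
})
best_k <- candidates[which.max(scores)]

# Two most important principal components of the cluster dimensions
components <- prcomp(Data, scale. = TRUE)$x[, 1:2]

print(best_k)
print(head(components))
```

The two component columns are convenient for plotting clusters in two dimensions when more than two fields were used for clustering.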

2.3 HoltWinters time series forecasting

HoltWinters time series forecasting takes the level, trend, and seasonality of a series into account: it analyzes data sampled at equal time intervals over a period of time in order to predict the data for a future period. That is, it predicts future data based on known historical data.

[Time column] Select the time field. The time interval is automatically calculated based on the data of the selected time field.

[Data column] Select the data field. When you create an advanced algorithm on a dataset bound to a dashboard component, or use the quick advanced algorithm of a chart, you also need to choose an aggregation function. The data column is then grouped by the time column, and the time series forecast is run on the aggregated data.

[Period] Enter an integer multiple of the time interval. The frequency, that is, the number of observations per unit time, is calculated from the period and the time interval (period / time interval). Depending on the time interval, the system fills in a reasonable period automatically; you can change it manually.

[Prediction period] The time span to forecast into the future, entered as an integer multiple of the time interval. After you select the time column, the system fills in a reasonable value automatically; you can also change it manually.

[Trend (beta)] Whether to consider the longitudinal trend. Checked by default, which means the model is fitted with a longitudinal trend.

[Seasonal (gamma)] Whether to consider the seasonal trend. If unchecked (FALSE), a non-seasonal model is fitted. If checked, a seasonal model is fitted. The seasonal pattern can be an additive effect (additive) or a multiplicative effect (multiplicative). The additive effect is selected by default, meaning the seasonal component is added to the trend. When the multiplicative effect is selected, the seasonal component multiplies the trend. Fitting a seasonal model requires at least two data points per period, that is, a frequency greater than or equal to 2, and a time series that contains at least 2 periods.

[Output] [Predicted values] When checked, a predicted value field is obtained. The predicted values are calculated from the fitted model over the future prediction period.

[Output] [Confidence bands] When checked, the range of the confidence band is calculated according to Level. The default Level is 95%.
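The settings above map fairly naturally onto R's HoltWinters and predict functions. The sketch below assumes that mapping (the source does not show the product's internal call) and uses R's built-in monthly AirPassengers series purely as stand-in data; the variable names are hypothetical.

```r
# Stand-in data: built-in monthly series (assumption, not the product's data)
series <- AirPassengers

interval_months <- 1   # time interval detected from the [Time column]
period_months   <- 12  # [Period]: an integer multiple of the interval
freq <- period_months / interval_months  # observations per period

x <- ts(as.numeric(series), frequency = freq)

# Assumed mapping of the checkboxes:
#   [Trend (beta)] unchecked     -> beta = FALSE
#   [Seasonal (gamma)] unchecked -> gamma = FALSE
#   additive / multiplicative    -> seasonal = "additive" / "multiplicative"
fit <- HoltWinters(x, seasonal = "additive")

# [Prediction period] of 12 months = 12 steps at a monthly interval
steps <- 12 / interval_months
pred <- predict(fit, n.ahead = steps, prediction.interval = TRUE, level = 0.95)

print(pred)  # columns: fit (predicted value), upr and lwr (confidence band)
```

With freq below 2, or with fewer than two full periods of data, the seasonal fit is not possible, which matches the constraint stated for [Seasonal (gamma)].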

For example, suppose the data runs from January 1957 to December 1958 at a time interval of one month. With HoltWinters time series forecasting, the period is set to 6 months and the prediction period to 12 months.

Trend (beta) is set to no and Seasonal (gamma) is left unchecked, as illustrated in the following figure.

The predicted results are shown in the figure.
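A comparable configuration can be tried directly in R. The source does not say which data its example uses, so this sketch assumes the built-in AirPassengers series restricted to the same date range; with both trend and seasonality switched off, HoltWinters reduces to simple exponential smoothing.

```r
# Assumption: built-in data cut to January 1957 .. December 1958 (24 months)
monthly <- window(AirPassengers, start = c(1957, 1), end = c(1958, 12))

# Period 6 months / interval 1 month -> frequency 6
x <- ts(as.numeric(monthly), frequency = 6)

# Trend off (beta = FALSE) and seasonality off (gamma = FALSE)
fit <- HoltWinters(x, beta = FALSE, gamma = FALSE)

# Forecast the next 12 months with a 95% band
pred <- predict(fit, n.ahead = 12, prediction.interval = TRUE, level = 0.95)

print(fit$alpha)  # the only smoothing parameter that is estimated
print(pred)
```

Because neither a trend nor a seasonal component is fitted, the predicted value is the same for every future month and only the band widens.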

2.4 Customize

With Customize, users can define their own advanced algorithm by writing an R script.

[Computational type] In the metadata area of the connected data, detail calculation is selected by default and is grayed out. On a dataset bound to a component, detail calculation is selected by default, and the drop-down box offers both detail calculation and aggregation calculation. The difference between the two is that an R field produced by aggregation calculation is an aggregate field.

[Script] Enter the script content. Use col[["xxx"]] to read the values of a column in the dataset, where xxx is the column name, and param[["xxx"]] to read a parameter value, where xxx is the parameter name. For custom scripts, R returns the result of the last executed line of code as the return value. The product requires the return value to be a list object containing one or more return value columns, such as list(out1=a, out2=b), where out1 and out2 are the names of the return value columns and a and b are the corresponding values, each of which can be a constant or a vector.
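As an illustration of the script contract just described, here is a small sketch. The column names "sales" and "profit" and the parameter name "target_margin" are hypothetical, and the first two assignments only stand in for the col and param objects that the product supplies at run time, so that the script can be run on its own.

```r
# Stand-ins for the objects injected by the product (assumption; omit these
# two lines when pasting the script into the [Script] box)
col   <- list(sales = c(100, 250, 200), profit = c(15, 60, 50))
param <- list(target_margin = 0.2)

# Read dataset columns and a parameter by name
sales  <- col[["sales"]]
profit <- col[["profit"]]
target <- param[["target_margin"]]

margin <- profit / sales   # one value per row
gap    <- margin - target  # distance from the target margin

# The last evaluated expression is the return value: a named list whose
# elements become the output columns
result <- list(out1 = margin, out2 = gap)
result
```

Each list element here is a vector with one value per row; per the description above, a constant is also accepted.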

For other R scripts, please refer to the R official website.

Customized examples are as shown below.
