FORECASTING RETAIL PRICE OF CONSUMER FOOD PRODUCT’S


Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/ Journal of Tianjin University Science and Technology ISSN (Online): 0493-2137 E-Publication: Online Open Access Vol:54 Issue:07:2021 DOI 10.17605/OSF.IO/32DYJ

July 2021 | 629

FORECASTING RETAIL PRICE OF CONSUMER FOOD PRODUCT’S:

IN THE CASE OF OROMIA REGION, ETHIOPIA

S. NAGARAJAN, Faculty, Department of Computer Science, Ambo University, Ethiopia.

TADU FEYISA, Faculty, Department of Computer Science, Ambo University, Ethiopia.

TADELE DEGEFA, Faculty, Department of Computer Science, Ambo University, Ethiopia.

Abstract

The retail and consumer food products industry is hungry to predict the future, yet most predictions are prophetic, not practical. Predicting the retail prices of consumer food products, with their high volatility, is an interesting area of research. Food price prediction is needed to protect people in the Oromia region of Ethiopia from price inflation. The dataset was collected from the Ethiopian Central Statistical Agency (ECSA) for 2009 to 2019. The study selects twenty highly needed and available consumer food products with 10 features. To predict the retail prices of these products, four technical indicators (SMA-3, EMA, MACD and RSI) were calculated and included as features in the dataset. Six features (Town, Month, Year, Item code, Unit price and SMA-3months) were selected for predicting retail prices using principal component analysis (PCA) and used with linear regression, SVM and decision tree machine learning models. Model performance was measured with RMSE. The RMSE values of linear regression, decision tree and SVM are 4.2871, 0.88133 and 4.3686, respectively. Among the three models, the decision tree has the lowest RMSE and achieves the best performance.

Keywords: SMA-3Months, EMA, MACD, RSI, Consumer Food Price, ECSA, Prediction.

1. INTRODUCTION

Ethiopia is one of the most populous countries in the world, with a population of 85 million (World Bank, 2018d). It is also one of the world’s least developed countries, ranked 173rd out of 186 countries in the 2012 UNDP Human Development Index (UNDP, 2013). The number of people living below the poverty line has remained high since 1995. During the period of international food price inflation, Ethiopia was among the countries with the highest growth in inflation rate. Price variations in food items directly affect both consumers and producers. Predicting retail food prices is therefore important for protecting people living below the poverty line and the producers of food items.

Predicting the retail prices of consumer food products in Oromia, with their high volatility, is an interesting area of research. In modern marketing systems, retail prices are regularly predicted with a range of methods by collecting and analyzing market information [1]. In Ethiopia, however, consumer food price prediction is not common practice, and where it is done it relies on traditional tools and manual work, which is slow and prone to human error. Consumer food prices remain a moving target, but with careful planning it should be possible to absorb price rises linked to other food prices and to inflation by adjusting food consumption habits. Several approaches are commonly used to predict food prices: fundamental, technical and quantitative analysis, and machine learning. Machine learning is a field of data analytics that develops mathematical algorithms to predict future values. Many machine learning algorithms are available for building predictive models. Choosing the right algorithm depends on several factors, including data size, quality and diversity, training speed, memory usage, and the questions a business wants the data to answer. Other considerations include accuracy, training time, number of parameters, number of data points and transparency. This work uses a machine learning approach to explore how predictable future consumer food retail prices are in Ethiopia, with the aim of protecting people living in poverty in the Oromia region.

2. LITERATURE REVIEW

Researchers have been applying machine learning models to consumer food retail price prediction; the studies reviewed below guide this work. Machine learning predictions have become popular, even commonplace, in financial and commodity markets [3], [13]. Kim uses an approach similar to the KNN trading system method, replacing KNN with a support vector machine for classification. Twelve technical indicators are generated from the Korea Composite Stock Price Index and used as input variables. Using a radial basis function as the SVM kernel, Kim explores which parameters perform best for the stock data, and compares the SVM classifier with a backpropagation neural network (BPN) and a KNN. The SVM is found to be sensitive to its parameter values, and when suitable parameters were selected it outperformed the BPN and KNN classifiers in experimental tests [6].

Similarly, in 2004 Bowman and Husain evaluated the accuracy of commodity price forecasts based on judgment, historical data and commodity futures. They found that over longer horizons spot prices tended to move toward futures prices, and that exploiting this produced much better forecasts than the other two forecast types. Based on statistical and directional accuracy, they concluded that commodity futures yield the best forecasts over long horizons [7]. Quality monitoring in manufacturing is another field where SVMs have been applied successfully [8]. Feature selection plays an essential role in the data analysis process by extracting relevant, non-redundant features; it refers to selecting a subset of the original input variables. Supervised ML may also benefit from the data already collected in manufacturing for statistical process control, since these data are mostly labeled [9]. A multistage approach to the conventional backpropagation algorithm has been applied to forecast the price of wheat on the China Zhengzhou Grain Wholesale Market [10]. The results indicate that although the conventional backpropagation algorithm may have struggled against the baseline ARIMA model in the in-sample tests, both the conventional and multistage approaches significantly outperformed the benchmark model, with MAPEs 36% and 64% lower, respectively [3].

In 2008, Carter and Mohapatra upheld this conclusion for a non-storable commodity, hog futures. They evaluated hog futures prices from 1994 to 2008 and determined statistically that the hog futures market was an unbiased predictor of cash prices [11].

A model combining an artificial neural network (ANN) and a decision tree was created to improve prediction accuracy in stock price forecasting. Fundamental and technical analyses are used as indicators for this hybrid model, which forecasts stock prices in the electronics industry in Taiwan [12]. The dataset was collected from the TEJ database, and 53 variables were selected. The results showed that the hybrid model outperforms the individual models. In another study, data collected from Yahoo Finance covering January 1, 2001 to January 1, 2010 were used with k-NN and SVM classifiers for prediction. The results showed that the strategy based on the k-NN model outperformed the buy-and-hold strategy for 7 of the 10 stocks, and k-NN also outperformed the SVM model in prediction.

Prasad and Padhy explore two machine learning algorithms, backpropagation and SVM, for predicting futures trade prices in the Indian stock market. The prediction model uses the opening, high and low values of the market price index as independent variables and the next day's closing value as the dependent variable [14]. The results of the k-NN classifier were compared with a logistic regression model, and the k-NN classifier was observed to outperform traditional logistic regression, classifying the future movement of the BSE-SENSEX and NSE-NIFTY more accurately. Selected features represent the characteristics of the original dataset better, and prediction with these features can improve accuracy [15]. Selecting the key market features therefore has a greater impact on predicting the market price, so this work uses feature selection techniques. Supervised machine learning techniques learn from past examples provided by a knowledgeable external supervisor [16]. Empirical studies that examine these effects are lacking, except for the study in [17].

Jeffrey and Bushee tried to develop a trading model that can trade market securities for a profit, beating buy-and-hold. In the study, 12 technical indicators, 54 features and 10 macroeconomic indicators were constructed for classifying daily stock market data [2]. Many machine learning algorithms can be applied to a classification problem given a set of features. Machine learning algorithms have been explored and used for high-frequency trading and market microstructure data [18]. Machine learning models have been employed successfully for classification, regression, clustering and dimensionality reduction on large sets of input data, especially high-dimensional data [19]. Machine learning algorithms aim to optimize the performance of a task by using examples and/or experience [15]. Generally speaking, machine learning can be divided into three main categories: supervised learning, unsupervised learning and reinforcement learning. On the contribution of fruits and vegetables to food security in Ethiopia, average fruit and vegetable prices may be low by international standards, but since on average 10% of the food budget is spent on fruits and vegetables, the prices of fresh fruit and vegetable produce are relatively high for the average African consumer [20]. Another recent study focuses on backpropagation as a means of optimizing the performance of a neural network; backpropagation appears throughout the literature as a valuable technique for forecasting time series data [10]. However, as Mullainathan and Spiess point out, applying ML to economics requires finding relevant tasks [21]. Agricultural and food prices have been rising since the mid-2000s and now move in parallel with the prices of other commodities [3].

In 2017, Malhotra and Maloo applied a machine learning approach to forecasting food inflation in India [3]. Their experiments yielded an R-squared value of 99.1% from 25 instances of data [22]. Machine learning has gained prominence because of the availability of large data sets, especially in microeconomic applications [22]. Identifying the most valuable market features for predicting future market prices in Ethiopia, and determining which of the selected machine learning models best suits the Ethiopian market situation and price prediction, is addressed in [23].

Machine learning methods for predicting food prices are relatively uncommon in the literature, and this research aims to extend the effort in this field. One study used monthly live cattle and wheat prices from 1950 to 1990 to compare an Autoregressive Integrated Moving Average (ARIMA) technique with an artificial neural network [24]. Price prediction forecasts a commodity's price by evaluating factors such as its characteristics, demand, seasonal trends, the prices of other commodities and offers from numerous suppliers. Price prediction may be a feature of consumer-facing travel apps, such as Trainline, used to increase customer loyalty and engagement. Other businesses may also use information about future prices. Business people may need to determine the best time to buy a commodity, to adjust the prices of products or services that require a commodity (Teff, Barley, Onions, Tomatoes, Banana and Mango), or evaluate the trader consumer of secure resource

As the related work above shows, food prices have been rising since the mid-2000s and now move in parallel with the prices of other commodities. Some of these studies address stock and commodity market prediction, but they do not consider predicting Oromia consumer food market prices. According to Jay, increased demand for food caused food commodity prices to skyrocket on international markets; demand is not the only cause of rising market prices worldwide, since natural and political conditions also disrupt food prices [3].

Kim, Prasad and Padhy, and Tsai studied the Korea Composite Stock Price Index and the Indian stock market, using an approach similar to the KNN trading system method with KNN replaced by a support vector machine for classification, and combining an artificial neural network (ANN) with decision trees to improve the prediction accuracy of a stock price forecasting model. This work, in contrast, predicts Oromia consumer food market prices using only supervised machine learning to create a regression model. Nowadays, many researchers use machine learning to analyze price patterns and predict consumer food prices, and a wide range of machine learning approaches is available for designing such a system. Most consumer food traders, however, still depend on traditional trading systems that help them predict prices based on various situations. Because Oromia consumer food market prices are uncertain, no system predicts them perfectly. In the context of Ethiopia, to our knowledge, market data have not been analyzed in an automated manner and no structured conceptual market framework exists [23]. As a result, traders are forced to take huge business risks and are afraid to invest because of business uncertainty. Recently, the Ethiopia Commodity Exchange (ECX) started hosting a commodity market [23]. No system is yet available for predicting the retail prices of consumer food items in Oromia.

3. MATERIALS AND METHOD

The proposed methodology is shown in Figure 1.

3.1 Dataset

The dataset for this work was collected from Ethiopia's Central Statistical Agency (ECSA). Of the 100,214 records obtained from ECSA for the period 2009 to 2020, only the 9,038 records for 2018 from the Oromia region were used, chosen for training by random sampling. The Oromia region data contain the region, zone, woreda, town, month, year, item code, source of information, standard unit of measurement and unit price for the consumer food items Teff-White, Teff-Mixed, Teff-Red, Wheat white, Wheat black, Barley white, Barley mixed, Barley black, Maize, Sorghum yellow, Sorghum white, Sorghum red, Tomatoes, Onions, Garlic, Banana, Orange, Avocado, Mango and Potato. Historical prices of these 20 items are available from 2008 to 2020 and were used for the experiment. The dataset used for the experiment contains 14 columns in addition to the computed attributes SMA, EMA, MACD and RSI.


3.2 Sampling Technique

For this work, records were selected from the ECSA data through simple random sampling, to obtain more precise information about the features and the types of food categories studied. This technique was chosen because it gives every element of the dataset an equal chance of being selected.
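The sampling step can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the procedure used at ECSA: records are assumed to be dictionaries with hypothetical `region` and `year` keys, and `sample_records` is a hypothetical helper. Only the Oromia/2018 filter and the sample size of 9,038 come from the text above.

```python
import random


def sample_records(records, region, year, n, seed=0):
    """Draw a simple random sample of n records (without replacement) from
    one region and year. Every eligible record has the same chance of being
    selected. The field names 'region' and 'year' are hypothetical."""
    eligible = [r for r in records if r["region"] == region and r["year"] == year]
    if n > len(eligible):
        raise ValueError(f"requested {n} records but only {len(eligible)} are eligible")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return rng.sample(eligible, n)


# Synthetic records for illustration only (not ECSA data)
records = [
    {"region": "Oromia" if i % 2 else "Amhara", "year": 2018, "id": i}
    for i in range(100)
]
sample = sample_records(records, "Oromia", 2018, 10)
```

With the real data the call would be `sample_records(records, "Oromia", 2018, 9038)`, provided at least that many eligible records exist.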

Figure 1: Proposed Methodology

[Figure 1 flowchart boxes: Data Collection; Feature Selection; Data Preprocessing; Feature Extraction; Data set; Test Dataset; Training Dataset; Prediction Results; Machine Learning Models (Decision Tree, SVM, KNN)]

3.3. Preprocessing

Data preprocessing is a collection of techniques applied to input datasets to eliminate noisy, missing and inconsistent data, thereby improving the efficiency of the data mining process. Preprocessing techniques include data cleaning, integration, transformation and reduction. Preprocessing proceeds through a series of steps that transform the already summarized data in the dataset into information that can produce useful results. The dataset received from ECSA covers 2008 to 2020, has ten fields, is dated in the Ethiopian calendar and contains many items. Because it covers all regions of Ethiopia, only the Oromia region data were extracted. Dates were converted from the Ethiopian calendar to the European (Gregorian) calendar. Next, 20 items were selected using the filter method; they are shown in Table 1. For prediction, SMA, EMA, RSI and MACD fields were added and their values computed with the formulas below.
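The calendar conversion described above can be sketched in Python through the Julian Day Number (JDN). This is a minimal sketch, not the conversion the authors used. It assumes the Amete Mihret epoch (JDN 1724221 for 1 Meskerem, year 1) and the standard rule that the thirteenth month, Pagume, has 6 days when the Ethiopian year leaves remainder 3 on division by 4, and 5 days otherwise. The function name `ethiopian_to_gregorian` is hypothetical.

```python
from datetime import date

# Julian Day Number of 1 Meskerem, year 1 (Amete Mihret epoch).
ETHIOPIAN_EPOCH_JDN = 1724221
# date.toordinal() counts 0001-01-01 (proleptic Gregorian, JDN 1721426) as day 1.
ORDINAL_OFFSET = 1721425


def ethiopian_to_gregorian(year: int, month: int, day: int) -> date:
    """Convert an Ethiopian calendar date to a Gregorian datetime.date.

    Months 1-12 have 30 days; month 13 (Pagume) has 6 days in an
    Ethiopian leap year (year % 4 == 3) and 5 days otherwise.
    """
    pagume_days = 6 if year % 4 == 3 else 5
    max_day = pagume_days if month == 13 else 30
    if not (1 <= month <= 13 and 1 <= day <= max_day):
        raise ValueError(f"invalid Ethiopian date {year}-{month}-{day}")
    # Days elapsed since the epoch: whole years, one leap day every 4 years,
    # whole 30-day months, then the day within the month.
    jdn = (ETHIOPIAN_EPOCH_JDN + 365 * (year - 1) + year // 4
           + 30 * (month - 1) + day - 1)
    return date.fromordinal(jdn - ORDINAL_OFFSET)


print(ethiopian_to_gregorian(2011, 1, 1))  # 2018-09-11
```

Because the ECSA records are monthly and an Ethiopian month straddles two Gregorian months, mapping each record to the Gregorian date of the first day of its Ethiopian month is one possible convention, and an assumption here.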

Simple Moving Average

Simple Moving Average (SMA) is the average price over the last N available months. This work calculates a 3-month SMA (SMA-3months) from the unit prices in the sample dataset.

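$$\mathrm{SMA}_n = \frac{1}{N}\sum_{i=0}^{N-1} P_{n-i} = \frac{P_n + P_{n-1} + P_{n-2}}{3} \quad (N = 3) \qquad (1)$$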
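Equation 1 can be computed for a price series with a few lines of Python. This sketch assumes the unit prices of one item in one town have already been sorted by month; the function name and the sample prices are hypothetical, not ECSA values.

```python
def simple_moving_average(prices, n=3):
    """Return the n-period simple moving average of a price series.

    Positions without a full window (the first n-1) are None, so the
    output stays aligned with the input months.
    """
    if n < 1:
        raise ValueError("window length n must be at least 1")
    sma = []
    for i in range(len(prices)):
        if i + 1 < n:
            sma.append(None)  # not enough history yet
        else:
            window = prices[i + 1 - n : i + 1]  # current month and the n-1 before it
            sma.append(sum(window) / n)
    return sma


# Hypothetical monthly unit prices (illustrative values only)
prices = [30.0, 33.0, 36.0, 39.0]
print(simple_moving_average(prices))  # [None, None, 33.0, 36.0]
```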

Exponential Moving Average

Exponential Moving Average (EMA) is a moving average similar to the simple moving average, except that more weight is given to the latest data. It is also known as the “exponentially weighted moving average” (Mar 22, 2020).

In Equation 2, P_n is the price in the final month (the month for which the EMA is calculated) and T is the period. To start calculating this indicator, an initial EMA is required; it is taken as the moving average over the T previous periods. In this study, T is taken from the beginning to the end of the price data.
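The EMA recursion described above is commonly written as EMA_n = k * P_n + (1 - k) * EMA_(n-1). The Python sketch below assumes the conventional smoothing factor k = 2 / (T + 1) and seeds the recursion with the simple average of the first T prices, matching the need for an initial EMA. Both choices, and the function name, are assumptions rather than details confirmed by the text.

```python
def exponential_moving_average(prices, period):
    """EMA with smoothing factor k = 2 / (period + 1).

    The first EMA value is seeded with the simple average of the first
    `period` prices; earlier positions are None.
    """
    if not 1 <= period <= len(prices):
        raise ValueError("period must be between 1 and len(prices)")
    k = 2.0 / (period + 1)
    ema = [None] * (period - 1)
    current = sum(prices[:period]) / period  # seed: SMA of the first window
    ema.append(current)
    for price in prices[period:]:
        current = k * price + (1 - k) * current  # newest price gets weight k
        ema.append(current)
    return ema


print(exponential_moving_average([10.0, 11.0, 12.0, 13.0], 3))  # [None, None, 11.0, 12.0]
```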

Moving Average Convergence Divergence

The Moving Average Convergence Divergence (MACD) indicator is a trend-following indicator that tracks the difference between two exponential moving averages, usually called the “MACD line”, together with a shorter time-frame exponential moving average of the MACD line, called the “signal line” (May 19, 2014). Calculating MACD therefore yields four data series: the fast EMA, the slow EMA, the MACD line and the signal line. The calculation is based on the unit price.

MACD Fast: the period of the “fast” EMA used in the MACD line calculation (May 19, 2014).

MACD Slow: the period of the “slow” EMA used in the MACD line calculation (May 19, 2014).
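The four MACD series can be sketched in Python as follows. The default periods (fast 12, slow 26, signal 9) are the values commonly used in trading platforms and are an assumption here; the slow EMA needs at least `slow` monthly observations, so shorter periods may be needed for short series. The helper names `ema` and `macd` are hypothetical, and `ema` seeds each average with the simple mean of its first window.

```python
def ema(values, period):
    """EMA seeded with the simple mean of the first `period` values; None before that."""
    k = 2.0 / (period + 1)
    out = [None] * (period - 1)
    current = sum(values[:period]) / period
    out.append(current)
    for v in values[period:]:
        current = k * v + (1 - k) * current
        out.append(current)
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (fast EMA, slow EMA, MACD line, signal line), each aligned with prices."""
    if not fast < slow <= len(prices):
        raise ValueError("need fast < slow <= len(prices)")
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    # MACD line = fast EMA - slow EMA, defined once the slow EMA exists
    # (the fast EMA always starts earlier because fast < slow).
    macd_line = [f - s if s is not None else None for f, s in zip(fast_ema, slow_ema)]
    defined = [m for m in macd_line if m is not None]
    first = len(macd_line) - len(defined)  # index of the first defined MACD value
    if len(defined) >= signal:
        # Signal line: EMA of the MACD line, padded to stay aligned with prices.
        signal_line = [None] * first + ema(defined, signal)
    else:
        signal_line = [None] * len(prices)  # too little data for a signal line
    return fast_ema, slow_ema, macd_line, signal_line
```

For a steadily rising series the MACD line settles at a constant positive value, since the fast EMA stays a fixed distance above the slow one.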

Relative Strength Index (RSI)

The Relative Strength Index (RSI) is a momentum oscillator that measures the speed and strength of a financial instrument's price movements (May 19, 2014). It was developed by J. Welles Wilder in the 1970s and remains one of the most popular indicators in technical analysis. RSI ranges between 0 and 100, which makes it a convenient indicator for evaluating whether a market is currently overbought or oversold. Although each market is different and overbought/oversold levels are somewhat subjective, an RSI above 70 to 80 is generally considered to indicate an overbought market, and an RSI below 20 to 30 an oversold market. RSI is typically calculated over 14 price bars, the default in most trading platforms (May 19, 2014). However, any positive integer can be used, with higher values generally considered to give stronger but slower signals. RSI consists of a single time series, and its only parameter is the number of periods (for example, months or years) over which the average gain and loss are calculated (May 19, 2014).
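The average gain and loss are combined as RSI = 100 - 100 / (1 + RS), where RS is the ratio of average gain to average loss. The Python sketch below uses Wilder's smoothing for the averages and a default of 14 periods; the smoothing variant, the default and the function name are assumptions, since the text does not state which variant was used.

```python
def rsi(prices, period=14):
    """Relative Strength Index with Wilder's smoothing.

    Returns a list aligned with `prices`; the first `period` positions are
    None because `period` price changes are needed for the first value.
    """
    if period < 1 or len(prices) <= period:
        raise ValueError("need at least period + 1 prices")
    changes = [b - a for a, b in zip(prices, prices[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    avg_gain = sum(gains[:period]) / period  # first averages are simple means
    avg_loss = sum(losses[:period]) / period

    def to_rsi(g, l):
        if l == 0:
            return 100.0  # RSI is 100 when the average loss is zero
        rs = g / l
        return 100.0 - 100.0 / (1.0 + rs)

    out = [None] * period + [to_rsi(avg_gain, avg_loss)]
    for g, l in zip(gains[period:], losses[period:]):
        # Wilder smoothing: keep (period - 1)/period of the previous average
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        out.append(to_rsi(avg_gain, avg_loss))
    return out


print(rsi([10, 11, 10, 11], period=2))  # [None, None, 50.0, 75.0]
```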

3.4. Feature Extraction

Feature extraction increases the accuracy of learned models by extracting features from the input data [27]. It involves reducing the number of resources required to describe a large set of data (September 2020). The extracted features are shown in Table 6: region, zone, woreda, town, month, year, item code, source of information, standard unit of measurement, unit price and SMA-3months.

3.5. Feature Selection

Feature selection reduces the dimensionality of data by selecting only a subset of measured features (predictor variables) to create a model [4]. Feature selection algorithms search for a subset of predictors that optimally models the measured responses, subject to constraints such as required or excluded features and the size of the subset. The main benefits of feature selection are improved prediction performance, faster and more cost-effective predictors, and a better understanding of the data generation process [4]. From the Ethiopian Central Statistical Agency dataset, this work computed attributes that can later be used as features for consumer food price prediction, and selected six features: Town, Month, Year, Item code, Unit price and SMA-3months.

3.6. Machine Learning Algorithms

Many machine learning methods can be applied to a regression problem given a set of features. The methods selected for this work are regression models: they learn from the features and labels of the training dataset and, when new data are given, assign labels to the new observations based on what they learned from the training dataset. Each was selected for its advantages and its past performance in other research. Machine learning algorithms have been explored and used for high-frequency trading and market microstructure data [28]. This work examines whether any of these algorithms are particularly useful for regression on Oromia consumer food market price data, predicting prices from a set of inputs generated by technical analysis indicators of the market price. The algorithms investigated are Support Vector Regression (SVR) [30], linear decision tree and linear regression (see Chapter 4); they are briefly described below. Many machine learning techniques have been explored for predicting stock price direction [28], and ANN and SVR are two widely used algorithms for predicting stock prices and stock market index values [28]. A supervised learning algorithm learns from labeled training data and helps predict outcomes for unseen data (Jul 22, 2020). Supervised learning techniques such as support vector regression, linear regression and linear decision trees can learn to predict consumer food market prices and trends from historical data and provide meaningful analysis of historical prices.

4.2. Descriptive performance analysis

The descriptive performance analysis was carried out as follows.

The preprocessed dataset for 2018, with all six features, was imported into MATLAB using Import Data.

Next, the Regression Learner app was opened and a new session was started from the workspace, specifying the predictor and response variables. SMA-3months was selected as the response variable and the remaining features as predictors, with 10-fold cross-validation.

In the Regression Learner app, the features Town, Month, Year, Item code, Unit price and SMA-3months were selected using the feature selection option.

The linear fine tree, linear regression and linear SVM models were trained one by one, and the results are shown in three ways (response plot, predicted vs. actual plot and residual plot) in Figures 5 to 13.

After training, the performance of the three machine learning models is shown in Figure 14. Of the linear fine tree, linear regression and linear SVM, the linear fine tree achieved the lowest RMSE, 0.8.
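The RMSE comparison and the 10-fold cross-validation described in the steps above can be approximated outside MATLAB with the Python sketch below, where RMSE = sqrt((1/n) * sum((y_i - yhat_i)^2)). It is not the Regression Learner implementation: it pools the out-of-fold predictions of all records and computes a single RMSE over them, which is one common convention. `fit_mean` and `predict_mean` form a hypothetical baseline model standing in for the tree, linear and SVM models.

```python
import math
import random


def rmse(actual, predicted):
    """Root mean squared error: sqrt(mean((actual - predicted) ** 2))."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kfold_indices(n, k=10, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k folds of near-equal size."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validated_rmse(X, y, fit, predict, k=10, seed=0):
    """Fit on k-1 folds, predict the held-out fold, and return the RMSE of all
    out-of-fold predictions (each record is predicted exactly once)."""
    preds = [None] * len(y)
    for fold in kfold_indices(len(y), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(model, X[i])
    return rmse(y, preds)


# Hypothetical baseline model: always predicts the mean of the training targets.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model
```

Any model can be plugged in through `fit` and `predict`; comparing the returned RMSE across models mirrors the selection criterion used in this work.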

After training the regression models in the Regression Learner app, the best model was identified by comparing model statistics, visualizing results in a response plot, plotting actual versus predicted responses, and evaluating the models with residual plots, as described below.

Fig-2 Linear Fine Tree Predicted vs Actual plot

Fig-3 Linear Regression Actual vs Predicted Plot

Fig-4 Linear SVM Actual vs Predicted plot

Fig-5 Linear Fine Tree Residual plot

Fig-6 Linear Regression Residual plot

Fig-7 Linear SVM Residual plot

Fig-8 Linear Fine Tree Response plot

Fig-9 Linear Regression Response plot

Fig-10 Linear SVM Response plot


As Figure 5 shows, the data appear to be normally distributed, closely following the diagonal line in both observed and predicted values. This plot shows the model's predicted response against the actual (true) response. A perfect regression model has a predicted response equal to the true response, so all points would lie on the diagonal line; the vertical distance from the line to any point is the prediction error for that point. Compared with the other models used, this model shows good agreement between predicted and actual responses.

As Figure 6 shows, the data appear to be normally distributed, closely following the diagonal line in both observed and predicted values. A good model usually has residuals scattered roughly symmetrically around 0; clear patterns in the residuals suggest that the model can be improved. The residual plot displays the difference between the predicted and true responses, and its x-axis can show the true response, the predicted response, the record number or one of the predictors. The selected model has small errors, so its predictions are distributed near the line, and it compares well with the other models in residuals and true response. Since RMSE was the model selection criterion and the linear fine tree had the lowest RMSE, this model showed the best residual plot results of the models compared.

As Figure 7 shows, the predicted response of the model is plotted against the actual (true) response. An exact regression model has a predicted response equal to the true response, so all points lie on the diagonal line, and the vertical distance from the line to any point is the prediction error for that point. A good model has small errors, so its predictions are scattered near the line. The residuals of this model were smaller than those of the other models used for consumer food price prediction.

As can be seen from Figure 8, the points follow the diagonal line closely, indicating close agreement between observed and predicted values. A perfect regression model would place every point on the diagonal, and the vertical distance from the line to a point is its prediction error. Compared with the SVM model, this model shows better agreement between predicted and actual responses.

As can be seen from Figure 9, the residuals appear approximately normally distributed and lie close to the reference line. A good model has residuals scattered roughly symmetrically around zero, and a clear pattern in the residuals suggests that the model can be improved. This model has small errors, so its predictions are scattered near the line. Since RMSE was the model selection criterion and the linear fine tree had the lowest RMSE, this model showed better residual behavior than the SVM model.

As Figure 10 shows, the predicted response of the model is plotted against the actual (true) response. A perfect regression model has a predicted response equal to the true response, so all points would lie on the diagonal line; the vertical distance from the line to any point is the prediction error for that point. A good model has small errors, so its predictions are scattered near the line. The response errors of this model were smaller than those of the SVM model.

As can be seen from Figure 11, the points follow the diagonal line closely, indicating close agreement between observed and predicted values. A perfect regression model would place every point on the diagonal, and the vertical distance from the line to a point is its prediction error. Compared with the SVM model, this model shows good agreement between predicted and actual responses.

As can be seen from Figure 12, the residuals appear approximately normally distributed and lie close to the reference line. A good model has residuals scattered roughly symmetrically around zero, and a clear pattern in the residuals suggests that the model can be improved. However, a high RMSE was obtained for this model, so its residual plot indicates weaker performance than the models discussed above.

As Figure 13 shows, the predicted response of the model is plotted against the actual (true) response. A perfect regression model has a predicted response equal to the true response, so all points would lie on the diagonal line, and the vertical distance from the line to any point is the prediction error for that point. A good model has small errors, so its predictions lie near the line. Of the three models used, this model's response plot shows the weakest agreement.

Figure 14 compares the linear fine tree, linear SVM, and linear regression models on the basis of the response plots (actual versus predicted response) and the residual plots described above. A perfect regression model has a predicted response equal to the true response, a good model has small errors, and a good model's residuals are scattered roughly symmetrically around zero. According to these visualizations, the linear fine tree performs better than the other models, so it was selected for predicting consumer food prices.


4.3 Analysis of Models with Performance Metrics

To assess the applicability and performance of the above machine learning techniques, several metrics were used. The effectiveness of regression algorithms may depend on many factors, such as the quality of information the attributes provide, the distribution of the dataset, and the number of instances. Such factors were addressed in the feature selection stage and therefore have less impact on the performance of the machine learning techniques. The following performance metrics, provided by MATLAB, were used to measure the performance of the machine learning techniques.

Mean absolute error (MAE) measures the average magnitude of the errors in a set of predictions, without considering their direction. It expresses the average prediction error in the units of the variable being predicted [34].

Root mean squared error (RMSE) is a quadratic scoring rule that also measures the average magnitude of the error. It is the square root of the average of the squared differences between predicted and actual values.
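The practical difference between the two metrics shows up in a toy example. Below, two sets of hypothetical predictions have the same MAE, but the set that concentrates its error in one large miss has a larger RMSE, because squaring weights large errors more heavily. The numbers are invented for illustration.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average magnitude of the errors, ignoring their sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [10.0, 10.0, 10.0, 10.0]
even_errors = [11.0, 9.0, 11.0, 9.0]      # four errors of size 1
one_big_error = [10.0, 10.0, 10.0, 14.0]  # a single error of size 4

print(mae(actual, even_errors), rmse(actual, even_errors))      # 1.0 1.0
print(mae(actual, one_big_error), rmse(actual, one_big_error))  # 1.0 2.0
```

Both prediction sets are off by 1 on average, yet the RMSE doubles when the error is concentrated in a single observation.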

Relative absolute error (RAE) is measured relative to a simple predictor that always outputs the average of the actual values: the total absolute error of the model is normalized by dividing it by the total absolute error of this simple predictor [35].

In this experiment, linear regression, linear SVR, and linear decision tree algorithms were used to predict the market prices of consumer food items. Once a regression model has been built, the most important question is how good it is, so evaluating the model, which shows how good its predictions are, is the most important task in a data science study. The research question here is which machine learning algorithm gives better predictions, that is, whether all of the algorithms perform equally well in predicting future market prices. Using the features obtained from feature selection, the performance of the selected prediction algorithms was checked. The experiments were conducted in MATLAB R2019a, and the three machine learning models (linear support vector machine, linear fine tree, and linear regression) were used to predict the prices of the selected consumer food items.
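The comparison described above was run in MATLAB R2019a. As a rough analogue only, the Python sketch below trains the three model types with scikit-learn on a held-out test split and reports their RMSE. Several things here are assumptions rather than the authors' setup: the data are synthetic (hypothetical per-item price levels plus a small yearly trend and noise), the features are reduced to item code, month, and year, the "linear SVM" is approximated by `SVR(kernel="linear")` on standardized features, and the "fine tree" by a `DecisionTreeRegressor` with a small minimum leaf size of 4.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def make_synthetic_prices(n=800, n_items=20, seed=0):
    """Hypothetical stand-in for the ECSA data (not the real dataset).

    Each item code gets an arbitrary price level, so price depends
    non-linearly on the item code; a mild yearly trend and noise are added.
    """
    rng = np.random.default_rng(seed)
    item_base = rng.uniform(5, 35, size=n_items)
    item = rng.integers(0, n_items, size=n)
    month = rng.integers(1, 13, size=n)
    year = rng.integers(2009, 2020, size=n)
    price = item_base[item] + 0.3 * (year - 2009) + rng.normal(0, 0.5, size=n)
    X = np.column_stack([item, month, year])
    return X, price


def compare_models(X, y, seed=0):
    """Fit each model on a training split and return test-set RMSE per model."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=seed)
    models = {
        "linear regression": LinearRegression(),
        # Assumed analogue of MATLAB's linear SVM: linear-kernel SVR on scaled inputs
        "linear SVM": make_pipeline(StandardScaler(), SVR(kernel="linear")),
        # Assumed analogue of MATLAB's fine tree: small minimum leaf size
        "fine tree": DecisionTreeRegressor(min_samples_leaf=4, random_state=seed),
    }
    rmse = {}
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        pred = model.predict(X_te)
        rmse[name] = float(np.sqrt(mean_squared_error(y_te, pred)))
    return rmse


if __name__ == "__main__":
    X, y = make_synthetic_prices()
    for name, value in compare_models(X, y).items():
        print(f"{name}: RMSE = {value:.3f}")
```

On data like this, where price levels jump arbitrarily between item codes, a tree can isolate each item while a linear model cannot, which is one plausible reason a tree could outperform linear models on item-coded price data.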


The regression models developed were evaluated using the following formulas.

The root mean square error (RMSE) is the square root of the MSE:

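$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ is the actual value, $\hat{y}_i$ the predicted value, and $n$ the number of observations.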

Mean square error (MSE) is probably the most commonly used error measure. It penalizes larger errors, because squaring a large number has a greater impact than squaring a small one. The MSE is the sum of the squared errors divided by the number of observations:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

The mean absolute error or deviation (MAD) is the sum of the absolute differences between the actual and predicted values divided by the number of observations:

$$\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
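The formulas above, together with the relative absolute error defined earlier, translate directly into code. The sketch below is a plain-Python illustration on a small invented example; the function names are chosen for readability and do not correspond to MATLAB's own functions.

```python
import math


def mse(y, y_hat):
    """MSE: sum of squared errors divided by the number of observations."""
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    """RMSE: square root of the MSE."""
    return math.sqrt(mse(y, y_hat))


def mad(y, y_hat):
    """MAD (MAE): sum of absolute differences divided by the number of observations."""
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / len(y)


def rae(y, y_hat):
    """Relative absolute error: the model's total absolute error divided by the
    total absolute error of a naive predictor that always outputs mean(y)."""
    mean_y = sum(y) / len(y)
    naive_total = sum(abs(a - mean_y) for a in y)
    return sum(abs(a - p) for a, p in zip(y, y_hat)) / naive_total


# Invented example: errors are -1, 0, 1, -2
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [3.0, 4.0, 5.0, 10.0]
print(mse(y, y_hat), rmse(y, y_hat), mad(y, y_hat), rae(y, y_hat))
```

Here MSE = 6/4 = 1.5, RMSE = sqrt(1.5), MAD = 4/4 = 1.0, and RAE = 4/8 = 0.5, since the mean predictor (always 5) has a total absolute error of 3 + 1 + 1 + 3 = 8. An RAE below 1 means the model beats the mean predictor.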


Table 1: Comparison of the machine learning algorithms

Model | RMSE | MSE | MAE | R-squared | Prediction speed (obs/s) | Training time (s)
Linear Decision Tree | 0.88133 | 0.7767 | 0.2555 | 0.99 | 34000 | 77.14
Linear Regression | 4.2871 | 18.379 | 2.7732 | 0.76 | 49000 | 28.035
Linear SVM | 4.3686 | 19.085 | 2.6993 | 0.75 | 23000 | 125.64

From Table 8, the research work computed the performance of the three models across the twenty consumer food items and found RMSE values of 4.3686 for linear SVM, 4.2871 for linear regression, and 0.8813 for the linear fine tree. The linear fine tree recorded good prediction performance for all consumer food items. The best performance was achieved by the fine tree, while the linear SVM model had the worst performance on the consumer food market data; linear regression and linear SVM show comparable performance. The fine decision tree, linear regression, and linear SVM models had MSE values of 0.7767, 18.379, and 19.085 and MAE values of 0.2555, 2.7732, and 2.6993, respectively. In terms of training time, linear regression performed best (28.035 s), ahead of the linear fine tree (77.14 s) and the linear SVM (125.64 s).
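Because RMSE is defined as the square root of MSE, the two columns of the comparison table should agree up to rounding. The short arithmetic check below, added here for illustration and not part of the original analysis, recomputes sqrt(MSE) from the reported values.

```python
import math

# (RMSE, MSE) pairs as reported in the comparison table
reported = {
    "Linear Decision Tree": (0.88133, 0.7767),
    "Linear Regression": (4.2871, 18.379),
    "Linear SVM": (4.3686, 19.085),
}

for model, (rmse, mse) in reported.items():
    # RMSE = sqrt(MSE), so these should match to about three decimal places
    print(f"{model}: sqrt(MSE) = {math.sqrt(mse):.4f}, reported RMSE = {rmse}")
```

All three pairs agree to within about 0.0001, which supports reading the first column as RMSE.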

4.4 Analysis of Predicted Values

Table 9 presents the consolidated values predicted by the linear fine tree model for 2021 to 2023. The items teff, wheat, barley, maize, sorghum, tomatoes, onions, garlic, banana, orange, avocado, mango, and potatoes were analyzed, and for each item the average predicted price from January to December was calculated for every year. The predicted price of teff increases from 21.01 in 2021 to 21.56 in 2022 and ends at 21.06 in 2023; these values are close to the actual values. The predicted price of wheat rises from 11.99 in 2021 to 12.57 in 2022 and falls to 12.34 in 2023. For barley, the predicted price over 2021 to 2023 ranges from 13.39 to 13.63, and the model predicts a gradual decrease relative to the previous year. For maize, the predicted price over 2021 to 2023 ranges from 9.39 to 9.75, lower than in the previous year. For sorghum, the predicted price over 2021 to 2023 decreases progressively from 10.77 to 10.29. The linear fine tree model predicts tomato prices from 11.95 to 17.37 and onion prices from 15.46 to 16.10. The predicted price of garlic ranges from 20.50 to 24.29 for 2021 to 2023, and the model predicts a gradual decrease from the previous year. The predicted price of potatoes ranges from 9.61 to 11.78 for 2021 to 2023. The predicted prices of the fruit items banana, orange, avocado, and mango vary from 11.23 to 35.45. This analysis is shown in Figure 15.
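The yearly figures above are averages of monthly predictions for each item. A minimal sketch of that aggregation step follows, assuming the monthly predictions are available as (item, year, month, price) tuples; that record layout and the sample values are hypothetical, not taken from Table 9.

```python
from collections import defaultdict


def yearly_item_averages(monthly_predictions):
    """Average monthly predicted prices into one value per (item, year).

    monthly_predictions: iterable of (item, year, month, predicted_price) tuples.
    Returns a dict mapping (item, year) to the mean predicted price over the
    months present for that item and year.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item, year, _month, price in monthly_predictions:
        totals[(item, year)] += price
        counts[(item, year)] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Hypothetical monthly predictions for one item over two years (12 months each)
rows = [("Teff", 2021, m, 20.5 + 0.1 * m) for m in range(1, 13)]
rows += [("Teff", 2022, m, 21.0 + 0.1 * m) for m in range(1, 13)]
for (item, year), avg in sorted(yearly_item_averages(rows).items()):
    print(f"{item} {year}: {avg:.2f}")
```

Averaging only over the months present keeps the result meaningful when a year has missing months, though it can then be biased toward the months that were observed.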



[Figure 15: Average predicted price per item for 2021, 2022, and 2023]


5. CONCLUSION

This study focused on predicting retail food item prices using machine learning and raised two research questions. The first examined the features of current Oromia consumer food data to find the most valuable features for predicting consumer food prices. Features for the study were derived from the collected data, and several data analyses were computed and evaluated for the individual predictive ability of each feature. The feature selection on consumer food market prices found that item code, town, month, year, unit price, and SMA-3months were the most significant features in terms of individual predictive ability, and that these same features were the least redundant in the dataset. These results were categorized into two groups and used as input to the machine learning algorithms. The second research question compared machine learning models to find the one that best predicts consumer food prices, using the selected features. Two experiments were conducted: the first compared the models with 10-fold cross-validation, and the second compared them with separate training and test data, both using features with individual predictive ability and low redundancy. The main benefits of feature selection are improved prediction performance, faster and more cost-effective predictors, and a better understanding of the data generation process. Of the models compared (linear regression, linear SVM, and linear fine tree), the linear fine tree and linear regression performed better than the linear SVM. Performance was measured with RMSE: the RMSE values of linear regression, linear SVM, and the linear fine tree are 4.2871, 4.3686, and 0.88133, respectively. Among the three models, the linear fine tree has the lowest error and achieves the best performance.

REFERENCES

[1] Selam Damtew, March 2018, “A Data Analysis and Market Price Prediction of Ethiopian Commodity Market with Machine Learning Algorithms.”

[2] Caley, Jeffrey Allan, 2013, “A Survey of Systems for Predicting Stock Market Movements, Combining Market Indicators and Machine Learning Classifiers.”

[3] Jabez (Jay) Harris, 2017, “A Machine Learning Approach To Forecasting Consumer Food Prices In Canada.”

[4] Guyon, Isabelle, and A. Elisseeff, 2003, “An introduction to variable and feature selection.”

[5] Subha, M. V., and S. Thirupparkadal Nambi, 2012, “Classification of Stock Index movement using k-Nearest Neighbours (k-NN) algorithm,” WSEAS Transactions on Information Science and Applications, no. 9.


[6] Kim, Kyoung-Jae, 2003, “Financial time series forecasting using support vector machines.”

[7] Chakriya Bowman and Aasim M. Husain, 2004, “Forecasting Commodity Prices: Futures versus Judgment.”

[8] Ribeiro, B., 2005, “Support vector machines for quality monitoring in a plastic injection molding process.”

[9] J. A. Harding, M. Shahbaz, Srinivas, and A. Kusiak, 2006, “Data Mining in Manufacturing: A Review.”

[10] Z. H. 40. et al., 2007, “introduced a MAOA with neural network for forecasting the price of food grains at China.”

[11] Colin A. Carter, Aug 2006, “How Reliable are Hog Futures as Forecasts?”

[12] Tsai, C. F., and S. P. Wang, 2009, “Stock price forecasting by hybrid machine learning techniques.”

[13] Andres M. Ticlavilca, Dillon M. Feuz, and Mac McKee, April 19-20, 2010, “Forecasting Agricultural Commodity Prices Using Multivariate Bayesian Machine Learning Regression.”

[14] Haider Khan, Zabir, Tasnim Sharmin Alin, and Akter Hussain, 2011, “Price Prediction of Share Market Using Artificial Neural Network (ANN),” International Journal of Computer Applications.

[15] Adebiyi, A. A., et al., 2012, “Stock price prediction using a neural network with hybridized market indicators.”

[16] Sutton & E. James Kehoe, 2012, “Evaluating the TD model of classical conditioning.”

[17] Martine M. Rutten, 2013, “What economic theory tells us about the impacts of reducing food losses and/or waste: implications for research, policy, and practice.”

[18] Gordon, L., 2013, “Using Classification and Regression Trees in SAS Enterprise Miner For Applications in Public Health.”

[19] Alpaydin, E., 2014, Introduction to Machine Learning.

[20] Deribe, Habtamu, 2019, “Review on Contribution of fruits and vegetables in security in Ethiopia.”

[21] Mullainathan, S., and Spiess, J., 2017, “Machine learning: An applied econometric approach,” Journal of Economic Perspectives.

[22] Athey, S., 2018, “The impact of machine learning on economics.”

[23] Amanuel Getachew Bulti and Abhishek Ray, June 2019, “Commodity Market Price Analysis and Prediction using Machine Learning Framework.”

[24] Hall, Mark A., and Lloyd A. Smith, 1999, “Feature selection for machine learning: comparing a correlation-based filter approach to the wrapper.”