Ensembled Approach to Heterogeneous Data Streams


Lalit Agrawal
Dattatraya Adane


Principal component analysis-based decision tree forest (PDTF) increases the diversity of base classifiers while generating a forest of decision trees, so the trees in the forest have very low correlation with one another. In this research work, an algorithm is proposed that selects the important features of the original data by applying the PDTF algorithm to it; the selected features are then used with long short-term memory (LSTM) networks to improve the classification accuracy of heterogeneous data streams. This feature selection reduces the load on the active classification system and improves the per-record classification time. In addition to thirty-five different datasets, live data feeds from the National Stock Exchange of India are used for experimentation. This real-time feed serves as the base for calculating the values of twenty-five technical indicators, which statistically forecast market movement. However, a stock's movement is not governed solely by its past values, so it cannot be predicted from technical indicators alone. Therefore, heterogeneous data from various domains that might affect market performance is also considered. The approach is evaluated against benchmark methods on a total of thirty-five datasets and the live stock feeds, and the results show that it outperforms previously used approaches.
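The abstract does not name the twenty-five technical indicators computed from the live feed. As a minimal sketch, assuming the feed has already been reduced to a list of closing prices, the code below computes three widely used indicators: a simple moving average (SMA), an exponential moving average (EMA) seeded with the SMA, and Wilder's relative strength index (RSI). The choice of indicators, the function names, and the `latest_features` helper are hypothetical illustrations, not the authors' implementation.

```python
from typing import Dict, List, Optional


def sma(prices: List[float], window: int) -> List[Optional[float]]:
    """Simple moving average; None until `window` prices are available."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, p in enumerate(prices):
        running += p
        if i >= window:
            running -= prices[i - window]  # drop the price leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def ema(prices: List[float], window: int) -> List[Optional[float]]:
    """Exponential moving average with alpha = 2 / (window + 1), seeded by the first SMA."""
    out: List[Optional[float]] = [None] * len(prices)
    if len(prices) < window:
        return out
    alpha = 2.0 / (window + 1)
    value = sum(prices[:window]) / window
    out[window - 1] = value
    for i in range(window, len(prices)):
        value = alpha * prices[i] + (1 - alpha) * value
        out[i] = value
    return out


def _rsi_from(avg_gain: float, avg_loss: float) -> float:
    """Convert average gain/loss into an RSI value in [0, 100]."""
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # assumed convention: a flat window is treated as neutral
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(prices: List[float], window: int = 14) -> List[Optional[float]]:
    """Relative strength index using Wilder's smoothing of gains and losses."""
    out: List[Optional[float]] = [None] * len(prices)
    if len(prices) <= window:
        return out
    changes = [prices[i] - prices[i - 1] for i in range(1, window + 1)]
    avg_gain = sum(max(c, 0.0) for c in changes) / window
    avg_loss = sum(max(-c, 0.0) for c in changes) / window
    out[window] = _rsi_from(avg_gain, avg_loss)
    for i in range(window + 1, len(prices)):
        change = prices[i] - prices[i - 1]
        avg_gain = (avg_gain * (window - 1) + max(change, 0.0)) / window
        avg_loss = (avg_loss * (window - 1) + max(-change, 0.0)) / window
        out[i] = _rsi_from(avg_gain, avg_loss)
    return out


def latest_features(prices: List[float], window: int = 14) -> Dict[str, Optional[float]]:
    """Hypothetical helper: the most recent value of each indicator as one feature row."""
    if not prices:
        raise ValueError("prices must not be empty")
    return {
        f"sma_{window}": sma(prices, window)[-1],
        f"ema_{window}": ema(prices, window)[-1],
        f"rsi_{window}": rsi(prices, window)[-1],
    }
```

For a steadily rising series such as `[10, 11, 12, 13, 14, 15]` with `window=3`, `latest_features` yields `{"sma_3": 14.0, "ema_3": 14.0, "rsi_3": 100.0}`. In a pipeline like the one described above, rows of this kind would be joined with the heterogeneous non-market features before feature selection; how the authors perform that join is not stated in the abstract.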


How to Cite
Agrawal, L., & Adane, D. (2022). Ensembled Approach to Heterogeneous Data Streams. International Journal of Next-Generation Computing, 13(5). https://doi.org/10.47164/ijngc.v13i5.901

