
Analysis of Online News Popularity Data in Machine Learning Repository using R Studio

John Michael D. Aquino
Laguna State Polytechnic University
Sta. Cruz, Laguna, Philippines

Media are tools for conveying information across the globe, and online news consists of articles published on the internet to spread awareness of all kinds of topics and issues. This research analyzed the Online News Popularity data from the UCI Machine Learning Repository using RStudio, in terms of the relationships among variables that identify the popularity of online news articles. Specifically, the researcher sought to: identify the results of Principal Component Analysis on the Online News Popularity data; show the results of Path Analysis on the mined data; reveal the relationship of the variables (publication article, words, and links) to the popularity of online news; display the structural equation model derived from the mined data; and determine the relationship between confirmatory factor analysis and the structural equation model. The data comprise a heterogeneous set of features organized into 61 attributes (58 predictive features, two non-predictive fields, and one goal field) across 39,644 records. The mined data set was obtained from the UCI Machine Learning Repository as a CSV file. The study employed a quantitative method. RStudio, a computational research tool for creating dynamic and reproducible research, was the software used; R was installed on the computer and used to process and analyze the data for all statistical requirements of the study.

Keywords: Principal Component Analysis, Structural Equation Model, Path Analysis, Confirmatory Factor Analysis, Online News Popularity
