Cumulative distribution function of the number of views in log scale.

Sensors 2021, 21, 25 of

Figure 4. Percentage of total views separated by five classes of number of views.

Figure 5. Percentage of total payload separated by five classes of number of views.

In Table 4, we see that the 616 videos with more than 1000 views account for 85% of our dataset's total number of views. These data corroborate that a handful of videos concentrate most of the users' attention. Another significant fact is that, by adding the videos with 83 to 1000 views (1875) and those with more than 1000 views (616), we find that 25% of our dataset is responsible for 93% of the total bytes transmitted. Hence, by predicting which videos will have more than 83 views, we anticipate which videos will consume more than 90% of the streaming service's infrastructure. For this reason, when defining the popularity class in our experiments, we use the value of the third quartile (83 views).

Table 4. Number of videos with the corresponding percentage of total views and total payload.

Number of views     0–3     3–20    20–83   83–1000   >1000
Number of videos    2500    2564    2434    1875      616
Views (%)           0.10    0.60    2.70    10.90     85.…
Payload (%)         0.10    1.10    5.30    20.20     73.…

6.3. Textual Features

To extract textual features, we used Fernandes et al. [10] as a guide and tried to reproduce as many of their features as possible. However, because of the difference in the information provided by the platforms (they used Mashable [55], whereas we use Globoplay), we could collect only 35 of the 58 features presented in [10]. Among them, we collected the number of words in the title and, from the description, the number of words, the rate of unique words, the rate of words that are not stopwords, and the number of named entities.
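As an illustration of the word-based description features listed above, the following Python sketch computes, for one text, the number of words, the rate of unique words, the rate of non-stopwords, the rate of unique non-stopwords, and the average word length. It is a minimal sketch under stated assumptions, not the authors' code: the regex tokenizer stands in for the spaCy/NLTK tokenization they used, the tiny stopword set `STOPWORDS_PT` is hypothetical (the text does not say which stopword list was used), and the denominators of the rates follow our reading of the feature names. Named-entity counting is omitted because it requires a trained spaCy model.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# HYPOTHETICAL, deliberately tiny Portuguese stopword list for illustration
# only; a real pipeline would use a full list (e.g. NLTK's, which needs a
# corpus download).
STOPWORDS_PT = {"a", "o", "e", "de", "do", "da", "em", "um", "uma",
                "que", "os", "as", "no", "na", "com", "para"}

# \w matches Unicode letters, so accented Portuguese words stay whole.
TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lower-cased word tokens (a simplification of spaCy/NLTK tokenizers)."""
    return TOKEN_RE.findall(text.lower())


@dataclass
class DescriptionFeatures:
    n_words: int
    rate_unique: float           # distinct words / all words
    rate_non_stop: float         # non-stopwords / all words
    rate_unique_non_stop: float  # distinct non-stopwords / distinct words (assumed)
    avg_word_length: float       # mean characters per word


def description_features(text: str,
                         stopwords: set[str] = STOPWORDS_PT) -> DescriptionFeatures:
    words = tokenize(text)
    n = len(words)
    if n == 0:
        # Empty text: avoid division by zero and return neutral values.
        return DescriptionFeatures(0, 0.0, 0.0, 0.0, 0.0)
    unique = set(words)
    non_stop = [w for w in words if w not in stopwords]
    return DescriptionFeatures(
        n_words=n,
        rate_unique=len(unique) / n,
        rate_non_stop=len(non_stop) / n,
        rate_unique_non_stop=len(set(non_stop)) / len(unique),
        avg_word_length=sum(len(w) for w in words) / n,
    )


if __name__ == "__main__":
    print(description_features("O gol do time e o gol da final"))
```

Applied to a title and read only through `n_words`, the same function yields the title word-count feature; applied to a description, it yields the word-based description features (features 2 to 6 of Table 5).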
In addition, we extracted the five most relevant topics from the descriptions using the LDA [31] algorithm. The topic-related features are the closeness of each video description to each of these topics. All of these features are extracted with the Scikit-learn [90], spaCy [91], and NLTK [92] libraries. Some of the features relate to subjectivity and sentiment polarity. Fernandes et al. [10] use the Pattern software to gather them. As this software does not support the Portuguese language, we use the Microsoft Azure Cognitive Services API [93] to fetch the sentiment-based features. The polarity associated with a text sample can be 'positive', 'neutral', or 'negative'; to use it with ML algorithms, we applied the following conversion: 1 for positive polarity, -1 for negative polarity, and 0 for neutral. Likewise, the negative subjectivity value is a real number, which we multiplied by -1 before using the classifiers. Using the publication date, it was also possible to obtain the day of the week on which the video was published. We include two Boolean features indicating whether that day is a Saturday or a Sunday. Table 5 shows the set of all 35 textual features.

Table 5. Textual features collected from the title and the description of Globoplay videos.

No.   Feature
1     Number of words in the title
2     Number of words in the description
3     Rate of unique words in the description
4     Rate of non-stop words in the description
5     Rate of unique non-stop words in the description
6     Average word length in the description
7     Number of named entities (NER) in the description
8     LDA topic
9     Closeness to LDA topic 0
10    Closeness to LDA topic 1
11    Closeness to LDA topic 2
12    Closeness to LDA topic 3
13    Closeness to LDA topic 4
14    Weekday is Monday
15    Wee…