When I took a look at the NBA's Instagram account, I was shocked. It had a huge number of followers and posts, all with a professional structure. It seems it's time to understand how the job of social media manager came to life.
The screenshot above alone shows how professionally the NBA Instagram account is managed. They use every feature Instagram offers. The account now has approximately 42 million followers and has shared about 32,000 posts so far.
Whether you visit the NBA Instagram account or just look at the screenshot, you will see how active it is. Several questions came to mind after I browsed the account. How often do they post? Do they follow a schedule, or do they post at random? Is there any period when they stop posting? Can a machine make a reasonable guess about when the NBA is going to post? Which hour of the day has the most posts? I want to explore these questions first. The answer to the first question is nearly 115 minutes: on average, they publish one post every 115 minutes. The graph of the time difference between consecutive posts is below.
The first point in the graph above is clearly an outlier. After removing the outliers with R's boxplot statistics function, the graph looks like the following one.
The initial graph shows that there was a stretch of about a week when the NBA account stopped posting (roughly seven and a half days, judging by the two timestamps below).
> dfDate$nba_by_postcode.PostDate[dfDate$PostDateDıff>15000]
[1] "2012-09-19 13:04:31 UTC"
This is the post that precedes the largest gap between two consecutive posts. The first post shared after it has the following timestamp.
"2012-09-26 23:10:25 UTC"
In the second graph there is an unusual gap between the thin columns. It appears to be the space where the outliers used to be. Those posts are probably off-season posts. Apart from that, the in-season posts show an upward trend. After removing the outliers, the mean in-season time difference between posts is approximately 51 minutes and the median is nearly 25 minutes. The standard deviation of the time difference was 300 before removing the outliers and 60 afterwards.
To analyze the hour of the posts, I created a new column named PostHour and changed its type from integer to factor. I then counted the frequency of each hour and saved the result as a new data frame called hourfreq. The head of that data frame, ordered by frequency, is below.
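The gap statistics above can be reproduced roughly as follows. This is a minimal base R sketch on made-up timestamps, not the original code: the vector `post_times` and its values are hypothetical, and I am assuming the outliers were dropped with `boxplot.stats()`, as the text suggests.

```r
# Hypothetical post timestamps (UTC); the real data has about 32,000 rows.
post_times <- as.POSIXct(c(
  "2012-09-19 12:40:00", "2012-09-19 13:04:31", "2012-09-26 23:10:25",
  "2012-09-26 23:35:00", "2012-09-27 00:05:00", "2012-09-27 00:50:00",
  "2012-09-27 01:10:00"
), tz = "UTC")

# Gap between consecutive posts, in minutes.
gaps <- as.numeric(diff(sort(post_times)), units = "mins")

# Drop the values that boxplot.stats() flags as outliers.
outliers <- boxplot.stats(gaps)$out
clean <- gaps[!gaps %in% outliers]

# Summary statistics before and after cleaning.
summarise_gaps <- function(x) c(mean = mean(x), median = median(x), sd = sd(x))
print(rbind(before = summarise_gaps(gaps), after = summarise_gaps(clean)))
```

With only seven posts the numbers mean little, but the same three steps (difference, flag, summarise) apply to the full data set.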
   PostHour freq
3         2 3890
4         3 3430
2         1 3349
1         0 2648
5         4 2251
24       23 1835
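The hour-frequency table can be built along these lines. This is a hedged base R sketch on invented timestamps; `post_dates` is a hypothetical stand-in for the PostDate column, and the original analysis may have used different functions.

```r
# Hypothetical timestamps standing in for the PostDate column.
post_dates <- c("2019-01-05 02:10:00", "2019-01-05 02:45:00",
                "2019-01-05 03:20:00", "2019-01-06 02:05:00",
                "2019-01-06 23:50:00", "2019-01-07 03:15:00")
parsed <- as.POSIXct(post_dates, tz = "UTC")

# Hour of day (0-23) as a factor, so every hour is a level even if unused.
post_hour <- factor(as.integer(format(parsed, "%H")), levels = 0:23)

# Count posts per hour and order by descending frequency.
hourfreq <- as.data.frame(table(PostHour = post_hour), responseName = "freq")
hourfreq <- hourfreq[order(-hourfreq$freq), ]
print(head(hourfreq))
```

Because the factor has 24 levels starting at hour 0, the row name of each hour is its value plus one, which matches the row names in the output above.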
A bar plot of these frequencies is shown below.
I don't have data on NBA game times, but the busiest posting hours (00:00 to 04:00 UTC) fall in the evening in the United States, when most games are played, so it seems likely that post hours closely follow game times.
My second question came up while I was checking some posts: I realized that the NBA's social media handlers love to use mentions. So who is mentioned most by the NBA? How many times were they mentioned? What do the statistics of the NBA's mention usage look like?
Predictably, the winner is @kingjames with 1121 mentions. The frequency summary is given below.
> head(mention_freq,10)[,c(1,2)]
                           word freq
kingjames             kingjames 1121
stephencurry       stephencurry  820
jharden                 jharden  654
russwest               russwest  555
warriors               warriors  455
easymoneysniper easymoneysniper  420
natlyphoto           natlyphoto  397
giannisan             giannisan  377
kyrieirving         kyrieirving  373
adbphotoinc         adbphotoinc  366
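A frequency table like this one can be produced by pulling every @mention out of the captions and counting them. The sketch below is an assumption about the approach, written in base R: the `captions` vector, its contents, and the name `mention_counts` are all hypothetical.

```r
# Hypothetical captions; the real ones come from the collected posts.
captions <- c(
  "@KingJames throws it down! Photo: @natlyphoto",
  "@stephencurry from deep for the @warriors.",
  "@kingjames and @kyrieirving connect for the @cavs",
  "No mentions here"
)

# Find every @handle (letters, digits, underscores and periods).
handles <- unlist(regmatches(captions, gregexpr("@[A-Za-z0-9_.]+", captions)))

# Normalise: drop the leading "@", drop trailing sentence periods, lowercase.
handles <- tolower(sub("\\.+$", "", sub("^@", "", handles)))

# Count and sort by descending frequency.
counts <- sort(table(handles), decreasing = TRUE)
mention_counts <- data.frame(word = names(counts), freq = as.vector(counts))
print(head(mention_counts, 10))
```

Lowercasing matters here because the same account can be typed with different capitalisation in different captions.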
As the statistics above show, if you are mentioned by the NBA, it will probably be the only time. My advice is to enjoy the moment, unless you can beat @kingjames's record. As the summary shows, he is simply an outlier, in statistical terms. Let's look at a bar plot of the 10 most frequent mentions.
As you can see, @warriors comes right after the NBA players as the most frequently mentioned team.
> findAssocs(mention_freq$dtm, terms = "warriors", corlimit = 0.1)
$warriors
          cavs houstonrockets     okcthunder     laclippers    pelicansnba   stephencurry        raptors     moneygreen
          0.27           0.18           0.14           0.12           0.12           0.11           0.11           0.11
  chicagobulls   klaythompson
          0.10           0.10
The output of the code above shows how strongly every other account is correlated with the given one. @warriors is most associated with @cavs, with a correlation of 0.27, and the first Warriors player on the list is @stephencurry, with 0.11.
Finally, let's sum up these results with a word cloud, which is trendy these days.
For the last question: as everybody knows, Instagram has three post types, which are image, video, and sidecar. Does the number of likes change by post type? Does the like count of each post type change by year? Is it possible to answer all of these questions with one quick graph?
In the animated graph above, it's clear that image posts were consistently liked the most. In recent years, however, sidecar posts have taken first place from images. Video posts were always in last place. Within the video type, some videos were liked far more than the type's mean. I think this is because a few NBA videos take off while the rest get fewer likes.
If you inspect the graph above, you will probably reach this conclusion: the NBA Instagram account has kept growing steadily since 2012.
Finally, I want to try building a machine learning model to predict the like count from the hour of the post. First, I used mutate to parse PostDate and create a new column named PostHour, with the code below.
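As far as I understand it, `findAssocs()` from the tm package reports the correlation between the column of the given term and the column of every other term in the document-term matrix. The base R sketch below illustrates that idea on hypothetical mention lists; it is not the original code, and the `posts` list is invented.

```r
# Hypothetical mentions per post (one character vector per post).
posts <- list(
  c("warriors", "cavs"),
  c("warriors", "stephencurry"),
  c("kingjames", "cavs"),
  c("warriors", "cavs", "kingjames"),
  c("jharden"),
  c("stephencurry")
)
terms <- sort(unique(unlist(posts)))

# Document-term matrix: one row per post, one column per account,
# 1 if the post mentions the account and 0 otherwise.
dtm <- t(sapply(posts, function(p) as.integer(terms %in% p)))
colnames(dtm) <- terms

# Pearson correlation of the "warriors" column with every other column.
others <- colnames(dtm) != "warriors"
assoc <- cor(dtm[, "warriors"], dtm[, others])[1, ]

# Keep associations at or above the limit, strongest first.
print(round(sort(assoc[assoc >= 0.1], decreasing = TRUE), 2))
```

A correlation near zero or below means the two accounts rarely appear in the same caption, which is why the output in the text only lists accounts above the chosen limit.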
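The comparison behind the graph boils down to averaging likes per post type and year. Here is a minimal base R sketch on invented numbers; the data frame and the column names `PostYear` and `PostType` are assumptions, and only `LikeCount` is a name taken from the analysis.

```r
# Hypothetical posts with made-up like counts.
posts <- data.frame(
  PostYear  = c(2016, 2016, 2016, 2019, 2019, 2019, 2019),
  PostType  = c("image", "video", "image", "image", "sidecar", "video", "sidecar"),
  LikeCount = c(120000, 60000, 140000, 300000, 420000, 150000, 380000)
)

# Mean likes for each (year, type) pair.
mean_likes <- aggregate(LikeCount ~ PostYear + PostType, data = posts, FUN = mean)

# Post type with the highest mean likes in each year.
top_type <- do.call(rbind, lapply(
  split(mean_likes, mean_likes$PostYear),
  function(d) d[which.max(d$LikeCount), ]
))
print(top_type)
```

Swapping `mean` for `median` in the `aggregate()` call would reduce the influence of the few videos that take off.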
> nba_by_postcode=mutate(nba_by_postcode, date = ymd_hms(PostDate), PostHour = hour(date))
The data has 31,963 rows. I split it in two: the first 30,000 rows as training data and the remaining 1,963 rows as testing data. Next, I created a model using logistic regression. After checking the model summary, I predicted the like counts for the testing data and compared the predictions with the real like counts using the mean absolute deviation. The result shows that this model predicts the like count to within about ±131k on average.
> mean(abs(pr- testing.like.hour$LikeCount))
[1] 130932.3
Lastly, I wanted to create a new model with more x variables, so I added post year and post type to the model and followed the same steps as before. This model predicts the like count to within about ±92k on average.
> mean(abs(pr- testing.like$LikeCount))
[1] 91801.24
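The two-model comparison can be sketched as follows. Note the assumptions: the data is simulated, the column names other than `LikeCount` and `PostHour` are hypothetical, and I use ordinary linear regression (`lm`) instead of the logistic regression named in the text, because the like count is a numeric outcome. Treat it as an illustration of the train, predict, and compare steps rather than the original model.

```r
set.seed(42)  # reproducible made-up data

# Simulated data set; the real one has 31,963 rows.
n <- 500
posts <- data.frame(
  PostHour = rep(0:23, length.out = n),  # every hour appears in training
  PostYear = sample(2012:2019, n, replace = TRUE),
  PostType = sample(c("image", "video", "sidecar"), n, replace = TRUE)
)
posts$LikeCount <- 50000 * (posts$PostYear - 2011) +
  ifelse(posts$PostType == "video", -40000, 20000) +
  rnorm(n, sd = 30000)

# First rows for training, the rest for testing, as in the text.
training <- posts[1:400, ]
testing  <- posts[401:n, ]

# Mean absolute error of a fitted model on the held-out rows.
mae <- function(model) {
  mean(abs(predict(model, newdata = testing) - testing$LikeCount))
}

# Model 1: hour only. Model 2: hour plus year and post type.
fit_hour <- lm(LikeCount ~ factor(PostHour), data = training)
fit_full <- lm(LikeCount ~ factor(PostHour) + PostYear + PostType, data = training)

print(c(hour_only = mae(fit_hour), with_year_and_type = mae(fit_full)))
```

In this simulation the like count is driven by year and type, so the second model has the lower error by construction; on the real data the improvement from about 131k to about 92k is the figure that counts.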
In conclusion, the NBA account appears to be managed by a professional team. They are probably using analytics tools to study their followers and acting on the results. For instance, they seem to decide whether or not to publish a post at a given hour based on those results.