Perl script monitors visitor statistics for YouTube movies
Machine Learning Lite
Line 16 of Listing 5 iterates over all the movies to be monitored and runs a SQL query for each of them to retrieve the historical view counts, sorted in ascending order of measurement time. It pushes these y values onto the @yvalues array and lets the matching x values start at 1 and grow by 1 for each measurement day. $row[0], the first element of each row of the SQL result, holds the view count of the movie currently being processed at the time the measurement was taken.
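The way the x and y values line up can be sketched in a few lines. This is only an illustration, not Listing 5 itself: the @rows array is a made-up stand-in for the SQL result, and each row is an array reference here rather than the @row array the listing uses.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the rows a SQL query would return,
# already sorted by measurement time (oldest first).
# Element 0 of each row holds the view count.
my @rows = ( [120], [135], [150], [410] );

my ( @xvalues, @yvalues );
my $x = 1;

for my $row (@rows) {
    push @xvalues, $x++;         # day counter: 1, 2, 3, ...
    push @yvalues, $row->[0];    # view count at that measurement
}

print "x: @xvalues\n";    # x: 1 2 3 4
print "y: @yvalues\n";    # y: 120 135 150 410
```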
Lines 34 and 35 then remove the latest (today's) measured value so that the script applies the regression only to the measuring points that are at least one day old. The coefficients() method in line 42 finally returns the y-intercept (the y value at x=0) in $intercept and the gradient of the straight line in $slope. Line 44 uses these to extrapolate the value for the last measurement in $y_predicted; in other words, it determines what you would expect if the number of viewers had continued to grow at its average historical rate.
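To see what the fit and the prediction amount to, here is a minimal sketch that computes the least-squares line by hand instead of calling a regression module. The fit_line() helper and the view counts are assumptions made for illustration; the variable names $intercept, $slope, and $y_predicted follow the text above.

```perl
use strict;
use warnings;

# Least-squares fit of y = intercept + slope * x.
# Hypothetical helper; returns ($intercept, $slope).
sub fit_line {
    my ( $xs, $ys ) = @_;
    my $n = @$xs;
    die "need at least two points\n" if $n < 2;

    my ( $sum_x, $sum_y, $sum_xy, $sum_xx ) = ( 0, 0, 0, 0 );
    for my $i ( 0 .. $n - 1 ) {
        $sum_x  += $xs->[$i];
        $sum_y  += $ys->[$i];
        $sum_xy += $xs->[$i] * $ys->[$i];
        $sum_xx += $xs->[$i]**2;
    }

    my $denom = $n * $sum_xx - $sum_x**2;
    die "all x values are identical\n" if $denom == 0;

    my $slope     = ( $n * $sum_xy - $sum_x * $sum_y ) / $denom;
    my $intercept = ( $sum_y - $slope * $sum_x ) / $n;
    return ( $intercept, $slope );
}

# Made-up view counts; the last entry is today's measurement.
my @yvalues = ( 100, 110, 120, 130, 500 );
my @xvalues = ( 1 .. scalar @yvalues );

# Hold back today's point so it cannot influence the fit.
my $x_last = pop @xvalues;
my $y_last = pop @yvalues;

my ( $intercept, $slope ) = fit_line( \@xvalues, \@yvalues );

# Extend the fitted line to today's x value.
my $y_predicted = $intercept + $slope * $x_last;

printf "slope=%.1f intercept=%.1f predicted=%.1f actual=%d\n",
    $slope, $intercept, $y_predicted, $y_last;
# slope=10.0 intercept=90.0 predicted=140.0 actual=500
```

With four days of steady growth by 10 views per day, the line predicts 140 views for day five, far below the 500 actually measured.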
If the actual measured value exceeds this prediction by more than three times the expected linear increase, the movie is probably going viral, and the user is notified with a text message. You need to experiment with this factor; if you want to learn about even minor successes, use a factor smaller than 3 in line 48.
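The effect of the factor is easiest to see with numbers. The following sketch is one possible reading of the test described above, assuming that the "linear increase" is the fitted slope per measurement day; is_going_viral() is a hypothetical helper, and the exact comparison in Listing 5 may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the measured value exceeds the
# prediction by more than $factor times the fitted daily increase.
sub is_going_viral {
    my ( $y_actual, $y_predicted, $slope, $factor ) = @_;
    $factor = 3 unless defined $factor;    # default from the article
    return ( $y_actual - $y_predicted ) > $factor * abs($slope);
}

# Made-up numbers: 140 views predicted, growing by 10 views per day.
my ( $y_predicted, $slope ) = ( 140, 10 );

for my $y_actual ( 165, 500 ) {
    for my $factor ( 1.5, 3 ) {
        printf "actual=%d factor=%.1f: %s\n", $y_actual, $factor,
            is_going_viral( $y_actual, $y_predicted, $slope, $factor )
            ? "alert"
            : "quiet";
    }
}
```

With these numbers, 165 views trigger an alert only at the lower factor of 1.5, whereas 500 views trigger one at either setting.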
For more convenience, you might replace the print instruction with code that sends an HTML email, for example, with the CPAN Mail::DWIM module, which needs just one line to do that. The recipient could then play the new hit movie directly by clicking on the displayed link.
Infos
- Listings for this article: ftp://ftp.linux-magazin.com/pub/listings/magazine/171
- Lantz, Brett. Machine Learning with R. Packt Publishing, 2013
- Linear regression: http://en.wikipedia.org/wiki/Linear_Regression