I’m rewriting Listify in Haskell, with a few changes. As a reminder, the original Listify (henceforth Listify-js) is a web app that scrapes the BBC radio website for tracklisting and program schedule information, and allows users to browse these tracklists and create Spotify playlists from them. The new version (henceforth Listify-hs) aims to build on this by introducing VCR-like functionality (or TiVo Series Link for the young’uns among you) of being able to ‘follow’ a show and have each episode’s corresponding playlist delivered to your Spotify account, as scheduled.
This is the first in a short series of blog posts that I’ll write as I build the system. This first post covers the scraping component, but keep an eye out for future posts, and also follow the implementation here on GitHub.
The purpose of this part of the system is to scrape playlist information from the BBC website (unfortunately there is no API available), and persist it to a database for later use. This is a batch process that will run daily. The components of interest are highlighted in red in the diagram below:
The main file of entry to follow the code for the scraping component is here.
The actual web scraping is done using the Scalpel library. Scalpel allows you to declaratively describe the scraping to be performed, resulting in more elegant code when compared to the original Listify-js implementation.
The central type in the Scalpel library is Scraper. The Scraper‘s job is to look certain HTML tags in a webpage that match some criteria, and return the contents therein. Scrapers are made more useful by being an instance of (among others) two type classes: Applicative and Alternative.
Alternative allows us to represent a choice of computation. We can use the <|> operator between two possible Scrapers that have different matching criteria, to have both criteria evaluated and return the combined result. This is useful when parsing HTML where there is some alternative or missing tags. For example, from here (simplified below), we may not always have an album tag, so we should check both ways (with and without):
playlistItemScraper :: Scraper ... Track playlistItemScraper = hasAlbumScraper <|> noAlbumScraper
Applicative allows us to use multi-argument functions in a more general way, i.e. by applying them to values in a context. First, some background: the Applicative type class has less structure than Monad, but more than Functor. The fmap function in Functor allows us to apply a single-argument function to a value in a Functor context (by lifting that function into a Functor). For example, below we have the single-argument function (+3), a section, and a value in the Maybe Functor (Just 3):
example :: Maybe Int example = fmap (1+) (Just 3) -- results in 'Just 4' -- 'fmap (+) (Just 3)' is not possible
However, its more useful to be able to lift multiple-argument functions into a context. Monad supports this with the various liftMn functions (where n is the number of arguments the function takes). In the example below, we lift the two-argument function (+), using the liftM2 function, to the Maybe Monad:
example :: Maybe Int example = liftM2 (+) (Just 1) (Just 3) -- results in 'Just 4'
However, Monad brings excess functionality than what is required to solve just this problem of function application. Thats where Applicative comes in. By using the <$> and <*> operators from Applicative, we can lift multi-argument functions to an Applicative and then apply it to values in that Applicative context, respectively. Example below with Maybe Applicative:
example :: Maybe Int example = (+) <$> (Just 1) <*> (Just 3) -- results in 'Just 4'
The scraping code makes use of this. For example we can write an expression like the one below (taken from here), which in my opinion is quite elegant (type signature simplified):
playlistItemScraper' :: Scraper ... Track playlistItemScraper' = Track <$> artistScraper <*> titleScraper <*> albumScraper
Track here is a multi-argument data constructor (function) whose type is: Text -> Text -> Text -> Track. The values artistScraper, titleScraper, albumScraper are of type: Scraper … Text (think of this as a Text value in the ‘Scraper’ Applicative context). The Track function gets lifted to the Scraper Applicative and applied to the artistScraper, titleScraper, albumScraper values.
playlistItemScraper' :: Scraper Text Track playlistItemScraper' = do artist <- artistScraper titl <- titleScraper album <- albumScraper return $ Track titl artist album
getStationUrlsAndNames :: Date -> [Station] -> ([URL], [StationName]) getStationUrlsAndNames date stations = ... map (f &&& g) stations
&&& acts as a fan-out function that sends the input to both functions (f and g above) and combines their output. This combined function is then mapped over a list of stations above. Though a similar result can be achieved by mapping a binary function over the list, or using a fold with a tuple accumulator, I think this is a neat alternative. I will however need to devote some time to deep-dive and fully understand Arrows.
The results of the scraping are persisted to a MySQL database, for later use by other components of the Listify application (coming soon). Interaction with the database is done via the mysql-simple library, created by the guy who literally wrote the book on how to use Haskell in the real world.
The type representing a SQL query is Query. My only (minor) frustration is that there is only really one way to create a Query (from a literal ByteString); so one cannot, for example, read the query in from a separately maintained .sql file and use that.
That’s it for Part I – look at for Part II coming soon!
And as ever, if you spot anything incorrect, feedback is welcome