Splunking for the homeless
Looking for apartments in San Francisco on craigslist is very time-consuming. Since Splunk wants me to work and not spend all my time browsing craigslist, I decided to create a hack that alerts me whenever an apartment I'm interested in is posted.
I did this with a bash script and Splunk.
Preparing data for Splunk
When you search for apartments on craigslist, you get a nice URL that encodes all your apartment constraints. We want to know when a new apartment ad shows up, so the script only grabs the most recent ad and lets Splunk figure out whether it's new.
The script for doing this:
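#!/bin/bash
# Your craigslist search; all apartment constraints live in the query string
craigslist_search="http://sfbay.craigslist.org/search/apa/sfc?zoomToPosting=&query=&srchType=A&minAsk=2000&maxAsk=3500&bedrooms=2&nh=4&nh=11&nh=10&nh=18&nh=29"

# Fetch the results page (the URL is quoted so the shell doesn't treat ? as a glob),
# pull out every <a href=...>...</a> element with sgrep, keep only
# apartment-ad links and print the first one.
curl -s "$craigslist_search" | \
  sgrep -o "%r\n" -i '"<a href=" .. "</a>"' | \
  grep "http://sfbay.craigslist.org/sfc/apa" | \
  head -n 1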
The script runs a craigslist search to your liking, uses sgrep to pull every link (each <a href=...>...</a> element) out of the HTML, keeps only the apartment-ad links, and takes the first one, which is the most recent ad.
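If sgrep isn't installed on your machine, the same extraction can be approximated with plain grep. This is a minimal sketch, not the script above: the function name first_apartment_link is hypothetical, and unlike sgrep it assumes each <a ...>...</a> element sits on a single line of the HTML with no nested tags in the link text.

```shell
# Hypothetical grep-only alternative to the sgrep step in craigslist.sh.
# Reads search-result HTML on stdin and prints the first apartment-ad link.
# Assumption: each <a ...>...</a> element is on one line and its link text
# contains no nested tags.
# Step 1: grep -o prints every <a href="...">text</a> match on its own line.
# Step 2: keep only links that point at San Francisco apartment ads.
# Step 3: like the original script, treat the first apartment link as the latest ad.
first_apartment_link() {
  grep -o -i '<a href="[^"]*"[^>]*>[^<]*</a>' |
    grep 'http://sfbay.craigslist.org/sfc/apa' |
    head -n 1
}
```

To use it, replace the sgrep, grep and head lines in craigslist.sh with a single call: curl -s "$craigslist_search" | first_apartment_link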
Getting the data into Splunk
From here on we’ll call the script “craigslist.sh”.
I placed my script at $SPLUNK_HOME/bin/scripts/craigslist.sh.
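Splunk executes scripted inputs directly, so the script also needs to be executable. Here's a small sketch of the copy-and-chmod step; install_scripted_input is a hypothetical helper name, and it falls back to $SPLUNK_HOME when no Splunk directory is passed.

```shell
# Hypothetical helper: copy a script into <splunk_home>/bin/scripts and mark it
# executable so Splunk can run it as a scripted input.
# Usage: install_scripted_input ./craigslist.sh [splunk_home]
install_scripted_input() {
  local src="$1"
  local splunk_home="${2:-$SPLUNK_HOME}"
  local dest="$splunk_home/bin/scripts"

  # Refuse to run without a real source file and a Splunk directory
  if [ ! -f "$src" ] || [ -z "$splunk_home" ]; then
    echo "usage: install_scripted_input SCRIPT [SPLUNK_HOME]" >&2
    return 1
  fi

  # Create the scripts directory if needed, copy, then set the execute bit
  mkdir -p "$dest" &&
    cp "$src" "$dest/" &&
    chmod +x "$dest/$(basename "$src")"
}
```

Running the installed script by hand afterwards is a quick way to confirm it prints a single <a href=...> line before Splunk starts indexing it.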
I then edited $SPLUNK_HOME/etc/system/local/inputs.conf to make the script a scripted input. Splunk runs the script on a fixed interval and indexes everything it writes to standard output.
The configuration I added was:
[script://$SPLUNK_HOME/bin/scripts/craigslist.sh]
interval = 60
sourcetype = craigslist
source = craigslist.com
disabled = false
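If you'd rather script the configuration than edit the file by hand, the sketch below appends the same stanza to an inputs.conf and skips it if the stanza is already there. add_craigslist_input is a hypothetical helper; pass it the file path, for example $SPLUNK_HOME/etc/system/local/inputs.conf. Splunk generally needs a restart ($SPLUNK_HOME/bin/splunk restart) to pick up hand-edited .conf files.

```shell
# Hypothetical helper: append the craigslist scripted-input stanza to an
# inputs.conf file, unless a stanza for the script is already present.
# Usage: add_craigslist_input PATH_TO_INPUTS_CONF
add_craigslist_input() {
  local conf="$1"
  # Single quotes keep $SPLUNK_HOME literal; Splunk expands it itself
  local stanza='[script://$SPLUNK_HOME/bin/scripts/craigslist.sh]'

  # -x: whole-line match, -F: fixed string, so $, [ and . are not special
  if [ -f "$conf" ] && grep -qxF "$stanza" "$conf"; then
    return 0
  fi

  # Quoted heredoc delimiter: nothing inside is expanded by the shell
  cat >> "$conf" <<'EOF'

[script://$SPLUNK_HOME/bin/scripts/craigslist.sh]
interval = 60
sourcetype = craigslist
source = craigslist.com
disabled = false
EOF
}
```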
And we now have data in Splunk!
Note: I've set the interval to 60 seconds, which matches how often the Splunk saved search will run.
Detecting when there’s a new apartment
The data in Splunk looks something like this:
5:49:28.000 PM <a href="http://sfbay.craigslist.org/sfc/apa/1234567890.html">Some awesome house that you like</a>
host=Petters-MacBook-Pro.local sourcetype=craigslist source=craigslist.com

5:49:03.000 PM <a href="http://sfbay.craigslist.org/sfc/apa/1234567890.html">Some awesome house that you like</a>
host=Petters-MacBook-Pro.local sourcetype=craigslist source=craigslist.com
If the most recent and second most recent events differ, at least one new apartment that we're interested in has been posted!
The search I used to do this was:
sourcetype=craigslist | head 2 | diff position1=1 position2=2 | search NOT "Results are the Same"
Save this search as "craigslist" to match my screenshots below (apologies for the not-so-descriptive name).
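The check this search performs, comparing the newest result with the one before it, can also be sketched in bash, which may help if you want to try craigslist.sh before wiring up Splunk. This swaps Splunk's diff for a hypothetical state file that remembers the last link seen; is_new_listing and the state-file path are assumptions, not part of the setup described here.

```shell
# Hypothetical local stand-in for the Splunk diff check: remember the last
# link in a state file and report whether the current one differs from it.
# Returns 0 for a new listing, 1 otherwise.
# Usage: is_new_listing LINK STATE_FILE
is_new_listing() {
  local current="$1" state_file="$2" previous=""

  # Empty output (e.g. curl failed) never counts as a new listing
  [ -n "$current" ] || return 1

  # First run: no state file yet, so previous stays empty
  if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
  fi

  # Same link as last time: nothing new
  [ "$current" = "$previous" ] && return 1

  # Remember this link for the next run; status 0 signals a new listing
  printf '%s\n' "$current" > "$state_file"
}
```

For example, running link=$(./craigslist.sh) and then is_new_listing "$link" "$HOME/.craigslist_last" && echo "New apartment: $link" once a minute gives a rough local equivalent of the alert.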
Setting up the alert
From the saved search we just created, we can schedule the search and enable alerts in several different ways. I wanted to get an email, so here's how I set that up:
Configuring the mail server (Gmail)
Next, set up a mail server to send the emails from. I use Gmail because it's available to me. You can find this view by clicking the "Email alert settings" link under the text box where you enter the email addresses.
You can leave the remaining settings at their defaults if you like.
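For reference, the same scheduled alert can be kept in a configuration file instead of being clicked together in the UI. The sketch below appends a savedsearches.conf stanza built from the keys Splunk uses for scheduling and email alerts; the helper name write_craigslist_alert, the -5m search window and the target file (for example $SPLUNK_HOME/etc/system/local/savedsearches.conf) are assumptions. The mail server settings from the previous step still apply, and Splunk needs a restart to read the new stanza.

```shell
# Hypothetical helper: append a config-file version of the scheduled
# "craigslist" alert. Values mirror the UI setup and are assumptions.
# Usage: write_craigslist_alert CONF_FILE EMAIL
write_craigslist_alert() {
  local conf="$1" email="$2"
  if [ -z "$conf" ] || [ -z "$email" ]; then
    echo "usage: write_craigslist_alert CONF_FILE EMAIL" >&2
    return 1
  fi

  # Unquoted heredoc so $email expands; the search string contains no $.
  # Lines starting with # are comments in Splunk .conf files.
  cat >> "$conf" <<EOF

[craigslist]
search = sourcetype=craigslist | head 2 | diff position1=1 position2=2 | search NOT "Results are the Same"
enableSched = 1
# run every minute, matching the 60-second scripted input interval
cron_schedule = * * * * *
# look back far enough that head 2 finds two events (assumed window)
dispatch.earliest_time = -5m
dispatch.latest_time = now
# fire whenever the search returns at least one event
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = $email
EOF
}
```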
There you go! Done! Poof! You should now get an email every time a new apartment matching your search is posted.
Enjoy your new apartment!