Task
The impetus for learning AppleScript was a simple but mind-numbing assignment. I had to pull down images and data for almost 1000 items from a system while maintaining the relationship between the data and the image files. The system was behind password protection, so wget alone wouldn't suffice. Also, to download the images, the click event had to be triggered on the page (thank you, ASP.NET). And the company that built and maintains the system was not being helpful.
Hello AppleScript!
I was left with no alternative but to visit every page and click all the links. LAME! Thankfully, AppleScript is pretty powerful. Every Apple user has to try this out at least once. AppleScript can open applications, perform different operations, and then pass the results to another application. Why do something when I can tell my computer to do it for me?
The script itself ended up being pretty simple:
- grab jQuery
- open browser
- grab list of links
- visit each link
  - inject jQuery
  - get data from this link
  - download image
  - go to next link
Opening up applications is trivial with AppleScript. The tricky parts are pulling down the list of links and then visiting them in order.
Gimme jQuery
Both Chrome and Safari allow JavaScript to be executed through AppleScript. To make everything easier, jQuery is injected into each page once it loads. First, though, the AppleScript has to read the jQuery file into a variable.
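    -- The path is a placeholder; point it at a local copy of jQuery
    set jqueryFile to POSIX file "/Path/to/jQuery/file/jquery.js"
    open for access jqueryFile
    -- Read the whole file into a string so it can be injected later
    set jqueryContents to (read jqueryFile)
    close access jqueryFile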
Grabbing links
The next step is to open up Safari, tell it to open a new window, go to an address, and inject jQuery. Unfortunately, Safari doesn't provide an easy way to determine whether a page is still loading. Setting a delay is a quick, if unreliable, solution.
    tell application "Safari"
        activate
        --make new document and wait for new page to load
        delay 1
        tell front document to set URL to "http://sickawesome.com"
        delay 15
        set doc to front document
        tell doc
            do JavaScript jqueryContents
            -- the do JavaScript command returns javascript arrays as a list
            set image_hrefs to (do JavaScript "var hrefs = []; $('#list of links').each(function(){hrefs.push($(this).attr('href'))}); hrefs;")
        end tell
    end tell
The downside of scripting Safari is already apparent: the loading status of the current document isn't directly available to AppleScript, so some other method has to be found to wait until the document is ready to run JavaScript. However, Safari was used for this step because of how well it handles JavaScript. Safari, unlike Chrome, returns the value of a JavaScript statement so that AppleScript can use it later. When Safari returns a JavaScript array, AppleScript handles it as a list, with no conversion needed. Awesome.
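The JavaScript string handed to do JavaScript is easier to follow when spread out. Here is a minimal sketch of the same idea without jQuery; the function name collectHrefs and the '#links a' selector in the usage comment are made up for illustration, and unlike the one-liner it skips elements that have no href.

```javascript
// Collect the href attribute of every element in an array-like collection.
// `elements` can be a NodeList from document.querySelectorAll, or any
// array-like of objects that expose getAttribute (an assumption of this sketch).
function collectHrefs(elements) {
  var hrefs = [];
  for (var i = 0; i < elements.length; i++) {
    var href = elements[i].getAttribute('href');
    // Skip anchors without an href so the resulting list holds only strings.
    if (href) {
      hrefs.push(href);
    }
  }
  return hrefs;
}

// In the page (hypothetical selector):
//   collectHrefs(document.querySelectorAll('#links a'));
```

As in the Safari snippet, the last expression in the injected string is what comes back to AppleScript, so the call itself would go at the end of the string.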
Quick Visits Only
The next step is the longest of the whole process: visiting each page in succession, pulling down the info, and then going to the next. Chrome was chosen for two reasons: Chrome is the fastest browser out there, and Chrome tabs provide access to the loading status of the page. This is important because after a couple dozen pages the connection speed fell dramatically, rendering fixed delays ineffective.
However, as I mentioned before, Chrome does not return values from executed JavaScript, so a little more creativity is required. Fortunately, both JavaScript and AppleScript have access to the title of a tab. As long as the value can be cast to a string (I'm unsure about arrays), AppleScript can still pull the data out of Chrome.
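The array case is left open here. One possible way around it, sketched under assumptions and not part of the original script, is to join the values into a single string before writing them to the title and split them again on the other side. The names packList and unpackList and the '||' separator are hypothetical; whitespace is avoided as a separator because browsers collapse whitespace when reading document.title.

```javascript
// Hypothetical delimiter; pick something that never appears in the data.
var SEPARATOR = '||';

// Turn a list of values into one string that can be stored in document.title.
function packList(values) {
  return values.map(String).join(SEPARATOR);
}

// Reverse of packList; shown in JavaScript only to illustrate the round trip.
function unpackList(packed) {
  return packed === '' ? [] : packed.split(SEPARATOR);
}

// In the page: document.title = packList(['a.jpg', 'b.jpg']);
```

On the AppleScript side, the title could then be split back into a list by setting text item delimiters to the same separator.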
    tell application "Google Chrome"
        activate
        tell (make new window) to tell tab 1
            -- repeat loop, essentially like a Python for-in loop
            repeat with href in image_hrefs
                execute JavaScript "window.location ='https://baseURL" & href & "'"
                my waitForReady()
                execute JavaScript jqueryContents
                -- get something: copy the wanted value into the tab title
                execute JavaScript "document.title = $('#block').html()"
                delay 0.2
                -- read the value back out through the title
                set value to get title
            end repeat
        end tell
    end tell
A short delay is stuck in just to make sure the JavaScript has time to execute before the AppleScript assumes it is done. The waitForReady subroutine is the key to Chrome's suitability for this task.
    on waitForReady()
        delay 1
        tell application "Google Chrome"
            tell window 1 to tell tab 1
                repeat
                    execute JavaScript "document.title = document.readyState"
                    set status to get title
                    if status is "complete" and loading is not true then
                        return true
                    else
                        delay 0.1
                    end if
                end repeat
            end tell
        end tell
    end waitForReady
The above subroutine polls about ten times a second until both the browser and the document have finished loading. JavaScript can be run as soon as document.readyState is "interactive", but sometimes the content of the page isn't ready to pull at that point.
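If the content sometimes lags behind the ready state, the check written to the title could also confirm that the wanted element exists. This is a sketch of that variation, not what the script above does; scrapeStatus and the 'ready' and 'waiting' strings are hypothetical names, and '#block' is the selector from the earlier snippet.

```javascript
// Report whether a page looks ready to scrape.
// `doc` is the page's document; `selector` is the CSS selector of the
// element the script needs (an assumption of this sketch).
function scrapeStatus(doc, selector) {
  // The document itself must have finished loading first.
  if (doc.readyState !== 'complete') {
    return 'waiting';
  }
  // Even then, only report ready once the target element is present.
  return doc.querySelector(selector) ? 'ready' : 'waiting';
}

// In the page: document.title = scrapeStatus(document, '#block');
```

The AppleScript loop would then compare the title against "ready" instead of "complete".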
Conclusion
Writing this script was fun. Once I discovered the dictionaries, it was much easier to start experimenting. The uses for this language are innumerable. However, it has to be used in the right situation: writing and debugging these scripts can be a little frustrating, and a script could easily take more time to write than it ends up saving.
As a programmer, I didn't appreciate AppleScript's natural-language syntax. I found it a little verbose and somewhat confusing. That is, I found it difficult to look at AppleScript samples and figure out how to adapt the code to another situation.
Google Book Stuffs
Go here for free info:
AppleScript: Definitive Guide
AppleScript: The Comprehensive Guide to Scripting and Automation on Mac OS X