ParseHub vs Kimono: In a right scrape

19 October 2014

Introduction

Recently Hacker Newsletter alerted me to a new visual tool that sounded interesting: ParseHub, a visual (i.e. graphical) scraping engine that can digest many website structures and produce well-formatted data. In the heterogeneous world of the Internet, being able to coerce web pages into homogeneity is something of a holy grail! Specifically, I've long held an ambition to automate the extraction of historical price (and related slowly-varying) data for companies that I might like to invest in. On the face of it this is a simple task, but most aggregation sources provide a "snapshot in time" view and individual company websites are frustratingly idiosyncratic. So assembling and cross-referencing this financial data is a time-consuming, manual task.
To test ParseHub (and Kimono, a competitor picked up from Scraping API Evangelist) I pit it against a couple of sites that I favour for solid historical data: Share Prices and InvestorEase. If either tool allows me to download a data history, for a range of companies, in an acceptable format (JSON, XML or CSV) with little more than a few clicks, then I count that as a success.
When it comes to defining the shape of your scrape, the ParseHub interface is a joy to use. The FireFox add-in provides full control over selecting elements, turning them into lists, linking them semantically and allowing for event-driven navigation. It's a rich tool-set and quite challenging to comprehend, so it's lucky that there is a decent set of tutorials; I worked through all of them to get off the ground. Once you've got your head around the need to select elements, convert them into lists with a shift-click, and then extract the data in order to receive feedback (in a dynamic results pane), it's relatively straightforward to make progress.
Start simple though! I began by trying to navigate between multiple pages, which required simulated user input to display in full, and this was too much. I had to resort to ParseHub support (who were incredibly helpful) to get my over-complicated scrapes functioning. In use, a key strength of this tool is that selections and connections are explicitly visible as an overlay on the source website. This helps you to understand how the page in front of you can be coerced into a hierarchical format. For example, this is a snapshot of me capturing two directly related fields, date and price:

[Screenshot: date and price selections highlighted in the ParseHub overlay]

However, once a scrape project has been defined and you're keen to convert it into an API, some of the current rough edges of ParseHub become apparent. The level of documentation around accessing the RESTful API is limited to a single page and there is, as yet, no hand-holding walkthrough available. That said, once I'd realised that there is a distinct three-step, batch-like process to pulling on-demand data out of a pre-defined scrape, it didn't take me too long to knock up a PowerShell script. This is what you need to do:

1 - Kick off the API with your personal API key and a token that uniquely identifies the endpoint. By default this will use whatever parameters the project was created with. To modify these values you can inject a start URL and/or define start values that are referenced in the script.

# Start scrape and retrieve token for run
$start_url = "Your URL used to override project start URL"
$return = Invoke-RestMethod -Method Post -Uri $uri -Body $fields
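For a runnable picture of step 1, here is a minimal sketch that fills in the $uri and $fields the snippet above relies on. The endpoint path, the body field names and the run_token property on the response are my assumptions rather than details taken from the API page, so treat this as illustrative only.

# Step 1 sketch: start a run for a pre-defined project
$api_key       = "YOUR-API-KEY"
$project_token = "YOUR-PROJECT-TOKEN"   # identifies the pre-defined scrape
$start_url     = "Your URL used to override project start URL"

$uri    = "https://www.parsehub.com/api/v2/projects/$project_token/run"   # assumed path
$fields = @{
    api_key   = $api_key
    start_url = $start_url   # optional: overrides the project's default start URL
}

# The response is assumed to carry a run_token identifying this particular run
$return    = Invoke-RestMethod -Method Post -Uri $uri -Body $fields
$run_token = $return.run_token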
2 - Because you don't know how long an on-demand scrape will take (although you can get a ball-park figure from the time taken to download the underlying page), it's not possible to immediately grab the results. Instead you have to keep polling for the status until the job completes or fails.
# Check status to see if job completed successfully
$run_status = Invoke-RestMethod -Method Get -Uri "run_status?api_key=$api_key&run_token=$run_token"
While ($run_status.end_time.length -eq 0)
{
    # If the job has no end time then wait before asking again
    Start-Sleep -Seconds 10
    $run_status = Invoke-RestMethod -Method Get -Uri "run_status?api_key=$api_key&run_token=$run_token"
}
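Note that an end_time only tells you the run has finished, not that it succeeded. A hedged way to distinguish the two is to inspect a status property on the run object - the field name and its "complete" value are my guesses, not something the API page is quoted on:

# Success check sketch: the status field and its values are assumed
If ($run_status.status -ne "complete")
{
    Write-Warning "Run $run_token ended with status '$($run_status.status)'"
}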
3 - Once the job has completed, hopefully successfully, go and get the results in JSON (the default) or CSV format (make sure to specify 'raw' else the results will be compressed). This document can be saved directly or easily re-shaped to achieve fine-grained control over your end results.
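To round the script off, here is a sketch of step 3. The data endpoint and the exact spelling of the format and raw parameters are assumptions on my part; all the article establishes is that JSON is the default, CSV is available, and 'raw' avoids compressed output.

# Step 3 sketch: fetch the finished results (endpoint and parameter names assumed)
$data_uri = "https://www.parsehub.com/api/v2/run_data?api_key=$api_key&run_token=$run_token&format=json&raw=1"
$results  = Invoke-RestMethod -Method Get -Uri $data_uri

# Save the document directly, or re-shape the objects first for fine-grained control
$results | ConvertTo-Json -Depth 10 | Set-Content "results.json"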