Google Sheets importxml() can't fetch from eoddata.com

Question

I'm trying to fetch a piece of data using importxml() but it just fails with "Loading..." and never resolves. Other pages have no such problem at all, just this one and I don't know why.

Here is the formula that doesn't work:

=ImportXML("http://eoddata.com/stockquote/OPRA/WFM160826P00032000.htm", "//*[@id='ctl00_cph1_qp1_div1']/div[2]/table/tbody/tr[2]/td[5]/font/b")

Ultimately I'm trying to fetch the "bid", "ask", and "open interest" elements, but all fail. By way of comparison, this works fine:

=ImportXML("http://finance.yahoo.com/quote/WFM160826P00032000", "//*[@id='quote-summary']/div[1]/table/tbody/tr[4]/td[2]")

any ideas what's wrong?

Aurielle Perlmann · Answer 1 · 2016-08-18T14:38:00.283

EDIT:

So If your interested in trying google apps script, here is short simple script that grabs the piece you want:

function getOpenInterest(url) {
  var found, html, content = '';
  var response = UrlFetchApp.fetch(url);
  if (response) {
    html = response.getContentText();
    if (html) content = html.match(/OPEN INT:<BR><FONT size=1 color='#333333'><B>(\d|\d+,?\d+?)<\/B>/gi)[0].match(/<B>(\d|\d+,?\d+?)<\/B>/i)[1];
  }
  return content;
}

Basically the pieces following the html.match segment, which are surrounded by a pair of forward slashes / represent the regex. The piece in the parentheses handles both a single digit, and a number of digits, also a digit with a comma should it occur.

Another way to do it, if your not into google apps script, is to use importdata and then regex out the piece you want:

=REGEXREPLACE(FILTER(INDEX(IMPORTDATA("http://eoddata.com/stockquote/OPRA/WFM160826P00032000.htm"),,1),REGEXMATCH(INDEX(IMPORTDATA("http://eoddata.com/stockquote/OPRA/WFM160826P00032000.htm"),,1),"OPEN INT:")),"(^.*OPEN INT:<BR><FONT size=1 color='#333333'><B>)(\d+)(<.*)","$2")

The reason I do it this way is because if you concatenate all of it, it will exceed the character limit for one cell... however this method is unreliable and not guaranteed the piece you want will always be found in that first column

score 1 · Accepted Answer · edited May 23 '17 at 12:35

Short answer

The problem is the xPath, specifically the /tbody part as it's not present in the source code.

Explanation

The source html code of the page doesn't include the tbody tag. It's added by the browser when it is missed to create a DOM hierarchy but the algorithm behind IMPORTXML doesn't do the same, it requires the xPath for the original source code of the web page.

Removing it from the xPath will solve the problem.

The following formula returns the value of OPEN INT, 141:

=ImportXML("http://eoddata.com/stockquote/OPRA/WFM160826P00032000.htm",
     "//*[@id='ctl00_cph1_qp1_div1']/div[2]/table[1]/tr[2]/td[5]/font/b"
 )

References

https://stackoverflow.com/questions/938083/why-do-browsers-insert-tbody-element-into-table-elements

Google Sheets importxml() can't fetch from eoddata.com

2 Answers2

Short answer

Explanation

References

Linked