Retrieving content using fsockopen

The firewall rules on some web hosts prevent the use of fopen on URLs even if allow_url_fopen is turned on. This post documents the issue as I found it in WP Supersized XML API.

The solution to this is to use either cURL or fsockopen.

As cURL is subject to the same firewall rules as fopen, the chances are that if fopen fails, so will cURL, which leaves fsockopen.

It is very rare that the rules which prevent fopen from running will have an effect on your general use of the function. However, if you are trying to retrieve data from your own website via a remote call, then depending on your host's configuration, fopen will fail.

First off, why would you be using fopen to retrieve data from your own host when you can fetch it directly? The answer is that sometimes it is necessary to access content generated via a website or content management system.

I came across this issue whilst building the plugin wp-supersize-remote-xml. During development, fopen worked perfectly every time, but as soon as I uploaded the site, fopen was refused a connection. Here is the entry from my server log:

To get around this, we need to convince the firewall that the request is being made from a remote server. To do that, we need to use fsockopen and manually scrub the results, which are returned as a raw HTTP response.

The request is made by breaking the URL into two parts, the website domain and the path with its parameters (the HTTP GET variables), stripping off the protocol (http:// or https://) and any trailing slash characters in the process.
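As a rough sketch of that splitting step (the function name `splitUrl` and the example URL are made up for illustration and are not taken from the plugin):

```php
<?php
// Hypothetical helper: split a URL into the bare domain and the request path.
function splitUrl(string $url): array
{
    // Strip the protocol (http:// or https://), case-insensitively.
    $stripped = preg_replace('#^https?://#i', '', $url);
    // Strip any trailing slash characters.
    $stripped = rtrim($stripped, '/');
    // The domain runs up to the first "/" or "?"; the rest is the path and GET variables.
    $hostLength = strcspn($stripped, '/?');
    $host = substr($stripped, 0, $hostLength);
    $rest = substr($stripped, $hostLength);
    // Make sure the path always starts with a slash.
    $path = ($rest !== '' && $rest[0] === '/') ? $rest : '/' . $rest;
    return [$host, $path];
}

[$host, $path] = splitUrl('http://www.example.com/feed/?type=xml');
// $host is "www.example.com", $path is "/feed/?type=xml"
```

This sketch assumes the URL carries no port number or login details; a URL with either would need extra handling.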

Next we need to attach this to a header to send using fsockopen.
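A minimal sketch of what such a header might look like, assuming a plain HTTP/1.1 GET request (the helper name `buildRequest` is hypothetical):

```php
<?php
// Hypothetical helper: build a minimal HTTP/1.1 GET request for the given host and path.
function buildRequest(string $host, string $path): string
{
    $request  = "GET {$path} HTTP/1.1\r\n";
    // The Host header carries the real domain so the web server can pick the right site.
    $request .= "Host: {$host}\r\n";
    // Ask the server to close the connection when done, so reading until EOF terminates.
    $request .= "Connection: close\r\n";
    // A blank line marks the end of the header.
    $request .= "\r\n";
    return $request;
}

$request = buildRequest('www.example.com', '/feed/?type=xml');
```

The Host line matters here: the socket itself will be opened against an IP address, so the domain name only reaches the web server through this header.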

This works because by telling fsockopen to connect to the IP 127.0.0.1, we make it use the server's internal loopback to reach the website, instead of being directed through the firewall where it would be subjected to the same routing rules as fopen/cURL.

Now we can read / write to the open socket as normal.
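The read / write step could be sketched roughly as follows. Both function names are hypothetical, and port 80 (plain HTTP) and the 10 second timeout are assumptions rather than values from the plugin:

```php
<?php
// Hypothetical helper: write a request to an already-open stream and read the whole reply.
function sendAndReadAll($socket, string $request): string
{
    fwrite($socket, $request);
    $response = '';
    while (!feof($socket)) {
        $chunk = fread($socket, 8192);
        // An empty read on a blocking socket means EOF or a timeout, so stop either way.
        if ($chunk === false || $chunk === '') {
            break;
        }
        $response .= $chunk;
    }
    return $response;
}

// Hypothetical wrapper: connect over the loopback address instead of the public one.
function fetchViaLoopback(string $request, int $port = 80, int $timeout = 10): string
{
    $socket = fsockopen('127.0.0.1', $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        throw new RuntimeException("fsockopen failed: {$errstr} ({$errno})");
    }
    try {
        return sendAndReadAll($socket, $request);
    } finally {
        fclose($socket);
    }
}
```

Reading until the server closes the connection only terminates promptly if the request asked for `Connection: close`; otherwise the loop waits for the stream timeout.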

Unfortunately this puts a lot of chaff into the response. What we get back is a full raw HTTP response, complete with header and packet breaks (most likely the chunk-size lines of HTTP chunked transfer encoding):

As you can see from the partial response above this data needs to be scrubbed before it can be used. As this is XML, it would be impossible for a system to parse it in its present condition.

Parsing it, however, is a fairly straightforward process.

The first line of code above takes the response and splits it into two parts, the header and the actual document body (contents). As I’m not interested in the server response header, I throw this away by “popping” the content off the end of the array returned by preg_split. It is important to split the response into only two sections initially, or you could end up losing any data that follows a later blank line.
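A sketch of that split, assuming the header and body are separated by the first blank line of the response (`extractBody` is a hypothetical name):

```php
<?php
// Hypothetical helper: return only the body of a raw HTTP response.
function extractBody(string $response): string
{
    // Split on the first blank line only (limit 2), so blank lines inside the body survive.
    $parts = preg_split('/\r?\n\r?\n/', $response, 2);
    // The body is the last element; the header in front of it is discarded.
    return array_pop($parts);
}

$body = extractBody("HTTP/1.1 200 OK\r\nContent-Type: text/xml\r\n\r\n<root/>");
// $body is "<root/>"
```

If the response contains no blank line at all, this sketch returns the whole response unchanged.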

The second line of code scrubs out the packet breaks. As this code is dealing with XML, this is easy to do. It would be a lot harder if plain text were being used.

The expression here looks for either a line at the start of the body made up of characters in the case-insensitive range a-z or 0-9 (of any length), OR a newline followed by one or more characters in the same range, followed again by another newline.
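A sketch of such an expression follows. It is not the original pattern: it assumes the packet breaks are hexadecimal chunk-size lines, so it narrows the range to 0-9 and a-f, and it assumes the XML itself never contains a line made up solely of those characters (`stripChunkMarkers` is a hypothetical name):

```php
<?php
// Hypothetical helper: remove lines consisting only of hex digits (assumed chunk-size markers).
function stripChunkMarkers(string $body): string
{
    // Either a marker line at the very start of the body,
    // or a marker line sitting between two line breaks.
    $pattern = '/\A[0-9a-f]+\r?\n|\r?\n[0-9a-f]+\r?\n/i';
    return preg_replace($pattern, '', $body);
}

$xml = trim(stripChunkMarkers("1a\r\n<root><item>one</item>\r\n10\r\n<item>two</item></root>\r\n0\r\n\r\n"));
// $xml is "<root><item>one</item><item>two</item></root>"
```

Plain text is harder for exactly the reason the assumption hints at: an ordinary line such as "face" or "100" would be indistinguishable from a marker.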

The final completed function then becomes…
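The following is a sketch of how the steps described in this post might fit together, not the plugin's original listing. The names `fetchLocalXml` and `scrubResponse`, the port, the timeout and the hex-only marker pattern are all assumptions:

```php
<?php
// Hypothetical helper: drop the header and the assumed chunk-size lines from a raw response.
function scrubResponse(string $response): string
{
    // Keep only what follows the first blank line.
    $parts = preg_split('/\r?\n\r?\n/', $response, 2);
    $body = array_pop($parts);
    // Remove lines made up solely of hex digits, then any leftover surrounding whitespace.
    return trim(preg_replace('/\A[0-9a-f]+\r?\n|\r?\n[0-9a-f]+\r?\n/i', '', $body));
}

// Hypothetical end-to-end function: fetch a URL on this server through the loopback address.
function fetchLocalXml(string $url, int $port = 80, int $timeout = 10): string
{
    // Split the URL into domain and path, dropping the protocol and trailing slashes.
    $stripped = rtrim(preg_replace('#^https?://#i', '', $url), '/');
    $hostLength = strcspn($stripped, '/?');
    $host = substr($stripped, 0, $hostLength);
    $rest = substr($stripped, $hostLength);
    $path = ($rest !== '' && $rest[0] === '/') ? $rest : '/' . $rest;

    // Build the request header; the Host line names the real domain.
    $request = "GET {$path} HTTP/1.1\r\nHost: {$host}\r\nConnection: close\r\n\r\n";

    // Connect via 127.0.0.1 (plain HTTP only in this sketch) and exchange data.
    $socket = fsockopen('127.0.0.1', $port, $errno, $errstr, $timeout);
    if ($socket === false) {
        throw new RuntimeException("fsockopen failed: {$errstr} ({$errno})");
    }
    fwrite($socket, $request);
    $response = stream_get_contents($socket);
    fclose($socket);
    if ($response === false) {
        throw new RuntimeException('Could not read the response');
    }

    return scrubResponse($response);
}
```

A call such as `fetchLocalXml('http://www.example.com/feed/?type=xml')` (an example URL, not one from the plugin) would then return the cleaned XML as a string, ready to hand to an XML parser.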
