HTTP request and HTML parsing with C++

Peters was eating dog food. Why are you eating dog food, I asked Peters. I don’t have time to explain, was his response.

He has one eye on the natural world and one eye in his Oculus Rift virtual reality headset.

‘Don’t worry,’ I told him, placing a hand on his fatty shoulder, ‘I will bring the latest news to your synthetic universe.’

Here’s how…

HTTP Request

I am using Microsoft Visual Studio Community 2015 on Windows 10.

I am going to use libcurl for HTTP requests in my C++ project. I went to the curl download page and got myself

No need to use CMake, as there is already a compatible Visual Studio project for me at curl-7.50.3\projects\Windows\VC14\lib. Building the project produces the lib and dll files, and then I have everything I need.

Thanks to Daisy on Stack Overflow for clarifying the steps, including how to hook libcurl include, lib and dll into my project.

Now, to make libcurl nice and easy to use in my project, I put a wrapper around it. Mark Lakata on Stack Overflow shows us how it’s done with his CURLplusplus class.

HTML Parsing

With our libcurl wrapper, we can make HTTP requests and fetch web pages as a lovely long HTML string. But what we really want to do is parse the HTML, so we can easily pick out the bits of the web page that we want.

I am using htmlcxx, a simple HTML parser for C++ (it’s so damn simple to use, check out their example).

Once downloaded, I built the htmlcxx Visual Studio project, which yielded the lib file.

HTTP and HTML in Virtual Reality

Okay, so we can use libcurl to make an HTTP request. And we can use htmlcxx to parse the HTML we get back. But how to turn tricks for such things?

For me, I want to let my blubbery Dutch lodger know when something happens online, whilst he is wearing his Oculus Rift virtual reality headset.

So I amend the virtual world from my previous post, where I made a cone rise and fall as motion was detected in a webcam. Instead, I will make the cone appear and disappear, depending on whether a certain piece of data can be found on a given web page.

As per the previous post, I will run my HTTP request and HTML parsing code in a thread, so as not to block my VR app from rendering.
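The threading idea can be sketched with nothing but the standard library. The class below is a hypothetical stand-in, not the code from my app: PagePoller, its fetch callback and the contains method are names made up for this example. The slow work (request plus parsing) happens on a worker thread, and the render thread only ever takes a short lock to look at the latest result.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper: repeatedly calls `fetch` on a background thread
// and keeps the most recent result for the render thread to inspect.
class PagePoller {
public:
    PagePoller(std::function<std::string()> fetch, std::chrono::milliseconds interval)
        : fetch_(std::move(fetch)), interval_(interval), running_(true),
          worker_([this] { run(); }) {}

    ~PagePoller() {
        running_ = false;  // ask the worker to stop after its current cycle
        worker_.join();    // may wait up to one fetch plus one interval
    }

    // Safe to call from the render thread: only holds the lock for the search.
    bool contains(const std::string& needle) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return data_.find(needle) != std::string::npos;
    }

private:
    void run() {
        while (running_) {
            std::string fresh = fetch_();  // slow part, done without the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                data_ = std::move(fresh);
            }
            std::this_thread::sleep_for(interval_);
        }
    }

    std::function<std::string()> fetch_;
    std::chrono::milliseconds interval_;
    mutable std::mutex mutex_;
    std::string data_;
    std::atomic<bool> running_;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

In the render loop, the check then becomes a call like poller.contains("Lindsay Lohan"). Passing the fetch step in as a callback keeps the class free of any libcurl or htmlcxx dependency, which also makes it easy to try out with a canned string.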

We get an HTML string back from the libcurl wrapper class for a given url and then let htmlcxx get busy parsing:

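// Walk every node of the parsed DOM and collect the text inside each <p> tag.
// The isTag() check skips text and comment nodes, so only real tags are compared.
tree<htmlcxx::HTML::Node>::iterator it = dom.begin();
tree<htmlcxx::HTML::Node>::iterator end = dom.end();
for (; it != end; ++it)
    if (it->isTag() && it->tagName() == "p")
        Data += get_child_content(dom, it);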

We get the content of all paragraph tags. nobar on Stack Overflow helps with the get_child_content function.

Armed with all the paragraph content on the given web page, we can make the cone appear if the text “Lindsay Lohan” is found:

if (webpageManager->Data.find("Lindsay Lohan") != string::npos)
    Meshes[i]->Render(view, proj);

Peters has a fetish for Lindsay Lohan. He is one sick individual (not because it’s Lindsay Lohan you understand, just because…)

So now the green cone appears in Peters’ virtual world when Lindsay hits the news; my sweaty chum can dislodge his virtual reality headset and go browsing for her.


Peters said, ‘Can’t you make the web page of Lindsay actually appear in the virtual world, so I don’t have to pull off my helmet?’

Another day, morbid one. Another day.