Parsing Media RSS with PHP SimpleXML

Parsing XML documents with PHP SimpleXML is pretty straightforward. Yesterday I spent around 5 minutes parsing a Media RSS feed, which was odd because with SimpleXML it normally takes about 30 seconds… A Media RSS (MRSS) document is just an RSS feed with media extensions:

<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>RSS Title</title>
    <link>http://www.domain.com/mylink</link>
    <description>My description</description>
    <item>
      <title>Title item 1</title>
      <link>http://www.domain.com/item_1.html</link>
      <description>Item 1 description</description>
      <guid>http://www.domain.com/item_1.html</guid>
      <media:content url="http://www.domain.com/item_1.jpg" height="240" width="320" />
    </item>
    <item>
      <title>Title item 2</title>
      <link>http://www.domain.com/item_2.html</link>
      <description>Item 2 description</description>
      <guid>http://www.domain.com/item_2.html</guid>
      <media:content url="http://www.domain.com/item_2.jpg" height="240" width="320" />
    </item>
    .... etc 
  </channel>
</rss>

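The “problem” is accessing media:content and the other media:* elements: they live in the media namespace, so plain property access such as $item->content does not find them. But don’t worry, I’m going to show you how to do it 🙂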

$xml = simplexml_load_file('http://domain.com/mrss.xml');
$namespaces = $xml->getNamespaces(true); // get namespaces

// iterate items and store in an array of objects
$items = array();
foreach ($xml->channel->item as $item) {

  $tmp = new stdClass(); 
  $tmp->title = trim((string) $item->title);
  $tmp->link  = trim((string) $item->link);
  // etc... 
  // now for the url in media:content; children() with the namespace URI
  // exposes the media:* elements
  $tmp->media_url = trim((string) 
                    $item->children($namespaces['media'])->content->attributes()->url);

  // add parsed data to the array
  $items[] = $tmp;
}

There, a piece of cake!
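
If you want to try this without hitting a real feed, here is a minimal sketch of the same idea using an inline copy of the sample XML. The MEDIA_NS constant and the plain_count property are names I made up for illustration, and the second item deliberately has no media:content so you can see the guard at work. Treat it as one possible way to do it, not the definitive one.

```php
<?php
// Inline sample feed (same shape as the MRSS example above) so the sketch runs offline.
$feed = <<<'XML'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <title>RSS Title</title>
    <item>
      <title>Title item 1</title>
      <link>http://www.domain.com/item_1.html</link>
      <media:content url="http://www.domain.com/item_1.jpg" height="240" width="320" />
    </item>
    <item>
      <title>Title item 2</title>
      <link>http://www.domain.com/item_2.html</link>
    </item>
  </channel>
</rss>
XML;

// Hypothetical constant name; the URI is the one declared in the feed.
const MEDIA_NS = 'http://search.yahoo.com/mrss/';

$xml = simplexml_load_string($feed);
if ($xml === false) {
    exit("Could not parse the feed\n");
}

$items = array();
foreach ($xml->channel->item as $item) {
    $tmp = new stdClass();
    $tmp->title = trim((string) $item->title);

    // Plain property access only sees un-namespaced children,
    // so this count stays at 0 even when media:content is present.
    $tmp->plain_count = count($item->content);

    // children() with the namespace URI exposes the media:* elements.
    $media = $item->children(MEDIA_NS);
    $tmp->media_url = isset($media->content)
        ? trim((string) $media->content->attributes()->url)
        : null; // item without media:content

    $items[] = $tmp;
}

foreach ($items as $it) {
    echo $it->title, ' => ', ($it->media_url === null ? '(no media)' : $it->media_url), "\n";
}
```

Passing the namespace URI directly avoids depending on the prefix the feed happens to use; the getNamespaces() lookup from the code above works just as well when you know the prefix is media.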

UPDATE

I received a comment about the Picasa RSS feed, where you have to dig just a bit deeper because media:content sits inside a media:group. The XML feed looks like this:

<?xml version='1.0' encoding='UTF-8'?>
<rss xmlns:atom='http://www.w3.org/2005/Atom' 
xmlns:media='http://search.yahoo.com/mrss/' 
xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' version='2.0'>
  <channel>
    <atom:id>https://picasaweb.google.com/data/feed/base/user/103218581909188195000</atom:id>
    <lastBuildDate>Wed, 16 Apr 2014 07:28:42 +0000</lastBuildDate>
    <title>Galerie fotografií uživatele Jiřetín JINAK</title>
    .... etc
    <item>
      <pubDate>Thu, 10 Apr 2014 07:16:22 +0000</pubDate>
      <atom:updated>2014-04-16T07:28:42.202Z</atom:updated>
      <author>Jiřetín JINAK</author>
      .... etc
      <media:group>
        <media:content url='https://lh6.googleusercontent.com/-C6WmXjRnV8Y/U0ZFRnm-ujE/AAAAAAAAAPQ/AbwIc0Ycugk/s100-c/RizikovaMistaVHornimJiretine.jpg' type='image/jpeg' medium='image'/>
        <media:credit>Jiřetín JINAK</media:credit>
        <media:description type='plain'/>
        <media:keywords/>
        <media:thumbnail url='https://lh6.googleusercontent.com/-C6WmXjRnV8Y/U0ZFRnm-ujE/AAAAAAAAAPQ/AbwIc0Ycugk/s160-c/RizikovaMistaVHornimJiretine.jpg' height='160' width='160'/>
        <media:title type='plain'>Riziková místa v Horním Jiřetíně</media:title>
      </media:group>
    </item>
    .... etc
  </channel>
</rss>

The PHP code follows the same logic; it just adds one more step to account for media:group:

$xml = simplexml_load_file('http://picasaweb.google.com/data/feed/...&prettyprint=true');
$namespaces = $xml->getNamespaces(true); // get namespaces

$items = array();
foreach ($xml->channel->item as $item) {

  $tmp = new stdClass();
  $tmp->title = trim((string) $item->title);
  $tmp->link  = trim((string) $item->link);
  // etc...

  // now for the data in the media:group
  $media_group = $item->children($namespaces['media'])->group;

  $tmp->media_url =    trim((string)
                       $media_group->children($namespaces['media'])->content->attributes()->url);
  $tmp->media_credit = trim((string)
                       $media_group->children($namespaces['media'])->credit);
  // etc

  // add parsed data to the array
  $items[] = $tmp;
}
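
The same media:group step also gets you to the other elements, for example media:thumbnail and its attributes. Below is a small sketch along those lines; the inline feed, the example.com URLs and the $thumbs array are placeholders of mine rather than real Picasa data, and the count() checks are just one way to skip items that lack a group or a thumbnail.

```php
<?php
// Made-up feed with the same media:group layout as the Picasa example above.
$feed = <<<'XML'
<rss xmlns:media='http://search.yahoo.com/mrss/' version='2.0'>
  <channel>
    <item>
      <title>Sample item</title>
      <media:group>
        <media:content url='https://example.com/photo.jpg' type='image/jpeg' medium='image'/>
        <media:credit>Sample author</media:credit>
        <media:thumbnail url='https://example.com/thumb.jpg' height='160' width='160'/>
      </media:group>
    </item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
if ($xml === false) {
    exit("Could not parse the feed\n");
}

$namespaces = $xml->getNamespaces(true); // get namespaces
if (!isset($namespaces['media'])) {
    exit("The feed does not use the media namespace\n");
}
$media_ns = $namespaces['media'];

$thumbs = array();
foreach ($xml->channel->item as $item) {
    // first step: media:group inside the item
    $group = $item->children($media_ns)->group;
    if (count($group) === 0) {
        continue; // item without a media:group
    }

    // second step: media:* elements inside the group
    $group_children = $group->children($media_ns);
    $thumbnail = $group_children->thumbnail;
    if (count($thumbnail) === 0) {
        continue; // group without a media:thumbnail
    }

    // url, width and height are plain (un-namespaced) attributes
    $attrs = $thumbnail->attributes();

    $tmp = new stdClass();
    $tmp->url    = trim((string) $attrs->url);
    $tmp->width  = (int) $attrs->width;
    $tmp->height = (int) $attrs->height;
    $tmp->credit = trim((string) $group_children->credit);

    $thumbs[] = $tmp;
}

print_r($thumbs);
```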