Posted by Eric Lundberg on Sat, 15 Nov 2008 13:52:00 PDT
For a number of years, back in the day, I used the gallery.menalto.com software to host an online gallery of photos. Back in those rough-and-tumble, wild-west days of metadata there weren't good standards that could handle every metadata case, so the metadata was all stored in serialized PHP objects.

Fast forward to today, where there are great ways to store metadata for a photo right there with the photo: EXIF, XMP and IPTC, to name some popular choices, and all the photo applications are aware enough to use that data. I only had limited metadata in the gallery software (title, caption, comments), but I had thousands of photos, and the idea of entering that stuff by hand was not appealing; neither was throwing it away. So I wrote some code to deal with it. It isn't pretty and it isn't super fast, but it does work.

The unserialize.php script is fairly straightforward: you feed it a photos.dat file and it prints one line per photo with the filename, title, caption, comments, and whether the photo was hidden.

unserialize.php:


#!/usr/bin/php
<?php
# Take the first argument as the photo.dat file we want to unserialize
$lines = file_get_contents($argv[1]);

# In order to access any of the unserialized class objects there needs
# to be a class definition for the objects, since I don't have the old
# gallery code we just make some empty definitions.
class albumitem {};
class image {};
class comment {};

$objects = unserialize($lines);

# To see what is available in the object we can var_dump it
#var_dump($objects);

# Go through all the photos
foreach ($objects as $obj) {

# Debugging for what is in the photo object
#  var_dump($obj);

  $img = $obj->image;
  $title = $obj->phototitle; # extraFields->Title
  $cap = $obj->caption;
  $hidden = $obj->hidden;

  # There can be multiple comments so build a little xml of all the
  # comment content that can be output as a single string.
  $comments = $obj->comments;
  $commentXML = "";
  if ( $comments != null) {
#  var_dump($comments); #debug
     $commentXML = "<comment>";
     foreach ($comments as $comment) {
        $text = $comment->commentText;
        # turn any style of line break into <br/>
        $text = preg_replace('/\r\n|\r|\n/', "<br/>", $text);
        $name = $comment->name;
        $datePosted = $comment->datePosted;

        $commentXML .= "<name>$name</name><date>" . date("Y-m-d H:i:s", $datePosted) . "</date><text>$text</text>";
     }
     $commentXML .= "</comment>";
  }

  # Back before gallery had both title and caption I hacked in title, then when
  # gallery added 'extraFields' I migrated the titles into an extraField called title
  # so I go and check if that title exists in extra fields then pick which title I
  # should be using
  $extraFields = $obj->extraFields;
#var_dump($extraFields);
  $extraTitle = $extraFields['Title'];
#var_dump($extraTitle);
  $theTitle = $extraTitle;
  if ($title != null) {
     $theTitle = $title;
     if (strlen($extraTitle) > 0 and strlen($title) > 0 and $extraTitle != $title) {
        # report on stderr so the warning doesn't end up in the || output
        fwrite(STDERR, "Mismatched titles: $extraTitle / $title\n");
     }
  }

  # build the file name from the name and image type
  $file = "";
  if ($img->name != null and strlen($img->name) > 0) {
    $file = "$img->name.$img->type";
  }
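  # ereg_replace() was removed in PHP 7, so use preg_replace(). The character
  # class also blanks out '|' so a caption can't collide with the || separator.
  $cap = preg_replace('/[\r\n|]/', ' ', $cap);
  $commentXML = preg_replace('/[\r\n|]/', ' ', $commentXML);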

  # print out all the data separated by double bars || for easy parsing
  echo "$file||$theTitle||$cap||$commentXML||$hidden\n";
  #var_dump($img);
}

?>
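Each line that unserialize.php prints can be split back into its five fields. A minimal Perl sketch, assuming the file||title||caption||comments||hidden layout above; parse_photo_line is a hypothetical helper, not part of the original scripts. Giving split a limit of -1 keeps trailing empty fields, so a visible photo's empty hidden flag comes back as an empty string instead of undef (which would trigger warnings under -w).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one line of unserialize.php output
# ("file||title||caption||comments||hidden") into a hash reference.
# Returns undef for blank or malformed lines.
sub parse_photo_line {
    my ($line) = @_;
    chomp $line;
    return if $line eq '';
    # A limit of -1 keeps trailing empty fields; without it split drops
    # them and $hidden would be undef for photos that aren't hidden.
    my @fields = split /\|\|/, $line, -1;
    return unless @fields == 5;    # e.g. a title that itself contained "||"
    my %photo;
    @photo{qw(file title caption comments hidden)} = @fields;
    return \%photo;
}

my $p = parse_photo_line("IMG_0001.jpg||Sunset||On the beach||||\n");
print "$p->{file}: '$p->{title}' hidden='$p->{hidden}'\n";
```

Checking the field count is a cheap guard: titles are not scrubbed of "||" by unserialize.php, so a stray separator shows up as a rejected line rather than shifted fields.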


The next step is to run this on all the photos.dat files I have and write that metadata into the actual image files. For this I wrote a little Perl script, which relies on a few assumptions, mostly about file and directory names:
1) The photos.dat files are in a directory whose name contains the date in YYYY_MM_DD format (for example 2003_07_04 4thofJuly).
2) The actual image files are also in a directory whose path contains the year, month and day (in any order).
3) The file names of the actual images match the names stored in the photos.dat serialized objects.
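Assumptions 1 and 2 can be expressed as two small checks. This is a sketch with hypothetical helper names (date_from_dat_path, folder_has_date), not code from the scripts below. One deliberate difference: the scripts test the date parts with plain substring matches, while this version compares whole runs of digits, so a month of 07 is not satisfied by the 07 inside 2007.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers, not part of the original scripts.

# Assumption 1: the photos.dat path contains a YYYY_MM_DD date.
# Returns (year, month, day), or an empty list if there is no date.
sub date_from_dat_path {
    my ($path) = @_;
    return $path =~ /(\d{4})_(\d\d)_(\d\d)/ ? ($1, $2, $3) : ();
}

# Assumption 2: the image folder's path contains the year, month and day
# as separate numbers, in any order. Comparing whole digit runs rather
# than substrings keeps a month of "07" from matching inside "2007".
sub folder_has_date {
    my ($folder, $year, $month, $day) = @_;
    my %seen = map { $_ => 1 } $folder =~ /(\d+)/g;
    return ($seen{$year} && $seen{$month} && $seen{$day}) ? 1 : 0;
}

my ($y, $m, $d) = date_from_dat_path("albums/2003_07_04 4thofJuly/photos.dat");
print folder_has_date("Pictures/2003/04-07-2003 fireworks", $y, $m, $d), "\n";
```

Folders written without leading zeros (4-7-2003) would not match here, and they would not match the substring tests in the scripts either.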

To accomplish this I had to write some scripts. First, many of my albums were named "DD_MM_YYYY name", which wasn't very helpful, so I wrote a script to get them all into the same date format and to munge the file names the same way gallery does. (You will probably have to fiddle with the script to make it work for your own situation.) This takes care of assumptions 1 and 3.


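#!/usr/bin/perl -w

use strict;

my $path = "/Users/eric/Pictures/2002";
opendir(my $dh, $path) || die "can't opendir $path: $!";
# get dirs in DD_MM_YYYY format
my @dirs_to_rename = grep { /^\d\d_\d\d_\d\d\d\d/ && -d "$path/$_" } readdir($dh);
closedir $dh;

foreach my $dir (@dirs_to_rename) {
   # DD_MM_YYYY plus the rest of the album name; swap $dd and $mm here
   # if your albums are named MM_DD_YYYY instead
   if (my ($dd, $mm, $yyyy, $rest) = $dir =~ /^(\d\d)_(\d\d)_(\d\d\d\d)(.*)/) {

     opendir(my $fdh, "$path/$dir") || die "can't opendir $path/$dir: $!";
     my @files_to_rename = grep { /-/ && -f "$path/$dir/$_" } readdir($fdh);
     closedir $fdh;
     foreach my $file (@files_to_rename) {
        # make sure the name is munged as gallery expects
        my $newFile = $file;
        $newFile =~ s/-/_/g;
        $newFile =~ s/JPG$/jpg/;
        $newFile =~ s/_\.jpg/.jpg/g;
        print "\t$file -> $newFile\n";
        # rename() instead of a backticked mv, so the spaces in album
        # names don't get split apart by the shell
        rename("$path/$dir/$file", "$path/$dir/$newFile")
           || warn "can't rename $path/$dir/$file: $!";
     }

     # YYYY_MM_DD followed by the rest of the original name
     my $newDir = "${yyyy}_${mm}_${dd}$rest";
     print "$dir -> $newDir\n";
     rename("$path/$dir", "$path/$newDir") || warn "can't rename $path/$dir: $!";
   }
}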
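Because the renaming script moves real files, it can help to try the rules on sample names first. The sketch below restates them as pure functions (munge_file_name and reorder_album_name are hypothetical names, not from the post), assuming the DD_MM_YYYY album layout described in the text; swap $dd and $mm for MM_DD_YYYY folders.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# File names: dashes become underscores, a trailing "JPG" becomes "jpg",
# and an underscore left right before ".jpg" is dropped.
sub munge_file_name {
    my ($name) = @_;
    $name =~ s/-/_/g;
    $name =~ s/JPG$/jpg/;
    $name =~ s/_\.jpg/.jpg/g;
    return $name;
}

# Album folders: "DD_MM_YYYY rest" becomes "YYYY_MM_DD rest".
# Returns undef for names that don't start with that pattern.
sub reorder_album_name {
    my ($dir) = @_;
    my ($dd, $mm, $yyyy, $rest) = $dir =~ /^(\d\d)_(\d\d)_(\d{4})(.*)$/
        or return;
    return "${yyyy}_${mm}_${dd}$rest";
}

print munge_file_name("IMG-0001-.JPG"), "\n";
print reorder_album_name("04_07_2003 4thofJuly"), "\n";
```

The braces in "${yyyy}_${mm}" matter: without them Perl would read $yyyy_ as a different (undeclared) variable. A name that is already in YYYY_MM_DD form returns undef, so running the check twice is harmless.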



Assumption 2 seemed like a given for me, since I have always included the year, month and day in my folder names, and ignoring the order I figured I would be fine. However, I found a few cases where I had put the wrong date on the gallery album and the correct date on the folder on my computer. So I wrote another script to check that every folder with a photos.dat file actually has a matching folder of images with the same date. This one assumes that you have run updatedb recently so the locate command will work. Basically I parse the date out of the photos.dat file path, then run locate on the images listed in it, which gives me every path where each image exists. I then check those paths for one that contains the date and is in the folder where I keep my images sans metadata; one match is enough to call the album good.


#!/usr/bin/perl -w

use strict;

my @photoDats = split(/\n/,`find . -name "*photos.dat"`);

foreach my $photoDat (@photoDats) {
  # quote the path: album directory names contain spaces
  my @unserializedDataLines = split(/\n/, `./unserialize.php "$photoDat"`);

  my $year = "";
  my $month = "";
  my $day = "";
  if ($photoDat =~ /(\d\d\d\d)_(\d\d)_(\d\d)/) {
    $year = $1;
    $month = $2;
    $day = $3;
  } else {
    print "No date associated with photos.dat ($photoDat)\n";
    next;   # with an empty date every located path would match below
  }

  my $found = 0;

  foreach my $photoDataLine (@unserializedDataLines) {
    if (length($photoDataLine) == 0) {
        next;
    }

    # a limit of -1 keeps trailing empty fields (e.g. an empty hidden flag)
    my ($file, $title, $caption, $comment, $hidden) = split(/\|\|/, $photoDataLine, -1);
    if (length($file) == 0) { next; }

    my $cmd = "locate \"$file\"";
    my $result = `$cmd`;
    my @files = split(/\n/, $result);
    if (!@files) {
       print "Could not find $file\n";
    }


    # My photos.dat files are in ../eric/albums/ and the folders of images
    # without metadata are in ../eric/Pictures/YYYY/etc. Aperture managed to pick
    # up some of the album folders from the distant past through an iPhoto import
    # (iPhoto had sucked them down while scanning the drive, oy!), so do these
    # various checks on the path to make sure it really is a path with the
    # correct date in the correct location.
    foreach my $fileLoc (@files) {
      if ($fileLoc !~ /Users\/eric\/Pictures/) {next;}
      if ($fileLoc =~ /Aperture/) {next;}
      if ($fileLoc !~ /$year/) {next; }
      if ($fileLoc !~ /$month/) {next; }
      if ($fileLoc !~ /$day/) { next; }
      $found = 1;
      last;
    }

    if ($found == 1) {
      last;
    }
  }

  if ($found == 0) {
   print "No matches: $photoDat\n";
  }
}
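The per-path checks in that loop can be pulled out into a function that takes the candidate list, which makes it easy to try the rules on a few sample paths without running locate. This is a sketch: first_matching_location is a hypothetical name, and the Pictures and Aperture patterns are copied from the script, so they will need adjusting for another layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first candidate path that passes the
# same checks as the loop above, or undef if none does.
sub first_matching_location {
    my ($candidates, $year, $month, $day) = @_;
    foreach my $loc (@$candidates) {
        next unless $loc =~ m{Users/eric/Pictures};   # only the image folders
        next if $loc =~ /Aperture/;                    # skip Aperture's copies
        # same loose substring tests as the script: the date parts may
        # appear anywhere in the path, in any order
        next unless index($loc, $year) >= 0
                 && index($loc, $month) >= 0
                 && index($loc, $day) >= 0;
        return $loc;
    }
    return;    # no acceptable copy found
}

my @located = (
    "/Users/eric/albums/2003_07_04 4thofJuly/IMG_0001.jpg",
    "/Users/eric/Pictures/Aperture Library/2003_07_04/IMG_0001.jpg",
    "/Users/eric/Pictures/2003/07-04-2003 fireworks/IMG_0001.jpg",
);
my $hit = first_matching_location(\@located, "2003", "07", "04");
print defined $hit ? "found: $hit\n" : "No matches\n";
```

Here the gallery copy and the Aperture copy are skipped and the third path is returned.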


This finds both photos.dat files whose directory name is a little off from the image directory and cases where I had only loaded the images into gallery and never kept a copy in my Pictures folder. For the latter case I just made the appropriate directory and copied the image files over from gallery.

Finally I have the script that reads all the photos.dat files, unserializes them, finds all instances of those files, and updates the metadata.

process.pl:


#!/usr/bin/perl -w

use strict;

my @photoDats = split(/\n/,`find . -name "*photos.dat"`);

foreach my $photoDat (@photoDats) {
  # quote the path: album directory names contain spaces
  my @unserializedDataLines = split(/\n/, `./unserialize.php "$photoDat"`);

  my $year = "";
  my $month = "";
  my $day = "";
  if ($photoDat =~ /(\d\d\d\d)_(\d\d)_(\d\d)/) {
    $year = $1;
    $month = $2;
    $day = $3;
  } else {
    print "No date associated with photos.dat ($photoDat)\n";
    next;   # with an empty date every located copy would get this metadata
  }

  foreach my $photoDataLine (@unserializedDataLines) {
    if (length($photoDataLine) == 0) {
        next;
    }

    # a limit of -1 keeps trailing empty fields (e.g. an empty hidden flag)
    my ($file, $title, $caption, $comment, $hidden) = split(/\|\|/, $photoDataLine, -1);
    if (length($file) == 0) { next; }

    print "$file  ====>  $title ====>  $caption ====>  $comment\n";
    my $cmd = "locate \"$file\"";
    my $result = `$cmd`;
    my @files = split(/\n/, $result);
    print "FILES: " , join("-", @files), "\n";
    if (!@files) {
       print "Could not find $file\n";
    }

    # If the photo is hidden set rating to zero, otherwise set it to 1 as a place to
    # start with rating them in lightroom
    my $rating = 1;
    if (length($hidden) > 0  and $hidden == 1) {
      $rating = 0;
    }

    foreach my $fileLoc (@files) {
      if ($fileLoc !~ /$year/) {print "Skipping $fileLoc does not contain $year\n"; next; }
      if ($fileLoc !~ /$month/) {print "Skipping $fileLoc does not contain $month\n"; next; }
      if ($fileLoc !~ /$day/) {print "Skipping $fileLoc does not contain $day\n"; next; }

      # Pass exiv2 a list of arguments rather than one shell string, so quotes,
      # $ or backticks in a title or caption can't break the command.
      my @exiv2Cmds;
      if (length($title) > 0) {
        push @exiv2Cmds, ["exiv2", "-Mset Xmp.dc.title lang=x-default $title", $fileLoc];
      }
      push @exiv2Cmds, ["exiv2", "-Mset Xmp.xmp.Rating $rating", $fileLoc];
      if (length($caption) > 0 or length($comment) > 0 ) {
        push @exiv2Cmds, ["exiv2", "-Mset Xmp.dc.description lang=x-default $caption $comment", $fileLoc];
      }
      foreach my $exiv2 (@exiv2Cmds) {
        print join(" ", @$exiv2), "\n";
        system(@$exiv2) == 0 or print "exiv2 failed on $fileLoc\n";
      }
    }
#  exiv2 -M"set Iptc.Application2.Credit String Mr. Smith" image.jpg
# set Xmp.dc.title       lang=x-default Sunset on the beach
# set Xmp.xmp.Rating  1
# set Xmp.dc.description lang=x-default Description
#  XmpText | XmpAlt | XmpBag | XmpSeq | LangAlt
# Xmp.dc.title                                 LangAlt     1  lang="x-default" Title
# Xmp.dc.description                           LangAlt     1  lang="x-default" Caption
  }
}
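process.pl writes the comment XML from unserialize.php into the description as-is, tags and all. If plain text reads better in Lightroom, the comments could be flattened first. A sketch with a hypothetical comments_to_text helper, assuming the name/date/text layout that unserialize.php builds; the optional ">" in the pattern lets it handle both the unclosed "</text" tag of the original PHP and a corrected "</text>".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn the <comment> XML built by unserialize.php into
# "name (date): text" entries joined with "; ".
sub comments_to_text {
    my ($xml) = @_;
    my @entries;
    while ($xml =~ m{<name>(.*?)</name><date>(.*?)</date><text>(.*?)</text>?}g) {
        my ($name, $date, $text) = ($1, $2, $3);
        $text =~ s{<br/>}{ }g;    # line breaks were stored as <br/>
        push @entries, "$name ($date): $text";
    }
    return join("; ", @entries);
}

my $xml = "<comment><name>Alice</name><date>2003-07-04 21:00:00</date>"
        . "<text>Great<br/>shot</text></comment>";
print comments_to_text($xml), "\n";
```

In process.pl it could stand in for the raw comment when the description is built, for example: my $desc = join(' ', grep { length } $caption, comments_to_text($comment));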


So there ya go. Hopefully that is useful to someone else at some point down the line. One final point: if you had made descriptions for your albums, you would need to write something like unserialize.php that takes the album.dat file instead and pulls out the description. The challenge there is where to put that description. I'm thinking of just putting it in a text file in the folder it is associated with, but that is a task for another time.