Metadata, Photography and Workflow for the Web

Metadata (n, pl): data about data. Any questions?

Recently, there has been a lot of discussion in photography circles about metadata: what it is, how to manage it, what it is good for, and so on. Some of the photographers I follow in the blogosphere and more recently on Twitter have interesting things to say on the matter (look to the right for links to some of these guys). I decided to offer some comments about how I use metadata, in the hope these might be useful to other photographers. Who the hell am I and why do my comments matter, you wonder? Good question. I do not have much of a profile among photographers, which is somewhat intentional, but I do have a website that does well with the one search engine that really matters. By way of introduction, here is a short bio about me and about how my website developed over the last 11 years. During that time I have learned how to leverage photographic metadata on a photography website (at least search engines seem to like my site) and am willing to share some of what I have learned. As an aside, other than maintaining a website I do no marketing whatsoever, nor do I send out submissions anymore. All of my licensing activity comes either because a client contacted me via my website, or through a couple of old-fashioned photographer-representative-type agencies I am with. Revenues from my website exceed the agency revenues by about 8:1. I attribute this to the effective use of metadata on my website.

If your goal is to develop a stock photography website that shows up in search engine results, metadata about your photographs is crucial. Text, in particular the metadata accompanying photos, is all that search engines can grab and hold on to as they spider and index a website. If your site displays beautiful images with little metadata to accompany them, it stands a good chance of not appearing in meaningful search engine results. Except for specialized search engines that index image data directly (e.g., TinEye), search engines use the textual information on your site when evaluating it. This goes for images too: search engines consider the text associated with an image when trying to categorize it. If you have organized that text information well, and made sure it includes meaningful metadata about the image(s) displayed on that web page, the image or page at least has the potential to show up well in search results.
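To make that concrete, here is a minimal sketch of a photo page that puts image metadata where search engines can see it. The record fields and filenames are hypothetical, not from any particular gallery package; the point is simply that the title, alt text, caption and keywords all end up as indexable text on the page.

```php
<?php
// Minimal sketch: surfacing image metadata as indexable text on a photo page.
// $photo is a hypothetical record pulled from the site's image database.
$photo = [
    'file'     => 'images/22285-blue-whale-aerial.jpg',
    'title'    => 'Blue whale aerial photo',
    'caption'  => 'A blue whale surfaces off San Diego, seen from the air.',
    'keywords' => ['blue whale', 'Balaenoptera musculus', 'aerial', 'San Diego'],
];

$title   = htmlspecialchars($photo['title']);
$caption = htmlspecialchars($photo['caption']);
?>
<html>
<head><title><?= $title ?></title></head>
<body>
  <!-- The alt text and visible caption are what search engines actually index. -->
  <img src="<?= htmlspecialchars($photo['file']) ?>" alt="<?= $title ?>">
  <p><?= $caption ?></p>
  <p>Keywords: <?= htmlspecialchars(implode(', ', $photo['keywords'])) ?></p>
</body>
</html>
```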

In my workflow there are three types of metadata that I am concerned with:

  • EXIF: shooting parameters, recorded by the camera
  • GEO: geographic data, if I am geocoding the images
  • IPTC: user-supplied information, describing characteristics of the image and business matters related to the image or to me.

Following is a description of my photography workflow, from the time the images are downloaded to a computer until my website is updated to include the most recent images. The percentages indicate the relative time each step takes, not including the selections, editing and Photoshop work which take place at the very beginning and which are independent of the metadata side of things.

Step 1: EXIF, The Default Image Metadata (5%)

First I edit the shoot down to keepers. Typically, each keeper is a pair of files: one raw and one “master”. The raw file automatically contains EXIF data about the shooting parameters, copyright information, etc. The master file, usually a 16-bit TIFF or high-quality JPEG that is a descendant of the raw file, having been processed in a raw converter and/or Photoshop, contains the EXIF data as well. At this point nothing special has been done about metadata. The EXIF metadata already in the images was placed there by my camera, requiring no work on my part, and is what I consider “default metadata”.
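If you are curious what this default metadata looks like, PHP (the language my site is built in) can show you directly with its built-in exif_read_data() function. The filename here is just an example:

```php
<?php
// Dump the "default metadata" the camera wrote, using PHP's exif_read_data().
// The third argument groups tags into sections (IFD0, EXIF, GPS, ...).
$exif = exif_read_data('IMG_1234.JPG', null, true);

if ($exif !== false) {
    // Shooting parameters recorded by the camera, no work required on my part.
    echo 'Camera:   ' . ($exif['IFD0']['Model'] ?? 'unknown') . "\n";
    echo 'Exposure: ' . ($exif['EXIF']['ExposureTime'] ?? '?') . " sec\n";
    echo 'Aperture: ' . ($exif['EXIF']['FNumber'] ?? '?') . "\n";
    echo 'ISO:      ' . ($exif['EXIF']['ISOSpeedRatings'] ?? '?') . "\n";
    echo 'Taken:    ' . ($exif['EXIF']['DateTimeOriginal'] ?? '?') . "\n";
}
```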

I back up my RAW keepers at this point. They have not been touched by any digital management or geocoding software; they are right out of the camera. These go on a hard disk and on DVDs, and are set aside for safekeeping in case a RAW file is somehow corrupted later in my workflow. It has not happened to me yet, knock on wood, but one never knows…

Step 2: Geographic Metadata, Geocoding (optional) (5%)

If I have geographic location data, it is added now. I often geocode my images, which is the process of associating GPS information (latitude, longitude and altitude) with the image. I use a small handheld GPS to record locations as I shoot, and these locations are added to the images by a geocoding program. Conceptually, geocoding gives the image additional value, since it is now associated with a particular place at a particular time. Sometimes the accuracy of this geocoding is as tight as 20 feet (6 m). It usually takes just a few minutes to launch the geocoding application, point it to the images and the GPS data, and have it do its thing.
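Under the hood, the geocoding step is conceptually simple: match each photo's EXIF timestamp against the GPS track log and interpolate a position. GPicSync does the real work for me, but here is a rough sketch of the idea in PHP, with made-up track points and an example filename:

```php
<?php
// Sketch of the core geocoding idea: match a photo's timestamp against a GPS
// track log and interpolate a position. Track data here is made-up.
$track = [ // [unix time, latitude, longitude, altitude in meters]
    [strtotime('2009-01-16 09:00:00'), 32.7157, -117.1611, 120.0],
    [strtotime('2009-01-16 09:00:30'), 32.7170, -117.1650, 125.0],
    [strtotime('2009-01-16 09:01:00'), 32.7185, -117.1690, 131.0],
];

// EXIF timestamps look like "2009:01:16 09:00:12", colons and all.
$exif = exif_read_data('IMG_1234.JPG');
$dt   = DateTime::createFromFormat('Y:m:d H:i:s', $exif['DateTimeOriginal']);
$shot = $dt->getTimestamp();

// Find the two track points bracketing the shot time; interpolate linearly.
for ($i = 0; $i < count($track) - 1; $i++) {
    [$t0, $lat0, $lon0, $alt0] = $track[$i];
    [$t1, $lat1, $lon1, $alt1] = $track[$i + 1];
    if ($shot >= $t0 && $shot <= $t1) {
        $f = ($shot - $t0) / ($t1 - $t0);
        printf("Geocode: %.5f, %.5f at %.0f m\n",
               $lat0 + $f * ($lat1 - $lat0),
               $lon0 + $f * ($lon1 - $lon0),
               $alt0 + $f * ($alt1 - $alt0));
        break;
    }
}
```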

Having GEO data in the image, and later in the database that drives my website, allows me to do some interesting things with my images and blog posts, such as presenting them with Google Earth at the location where they were shot. For example, this photo of the Wave in the North Coyote Buttes is geocoded, and can be viewed in Google Earth by clicking the little blue globe icon. The same goes for most of the blog posts I have: they can be viewed in Google Earth at the right place on the planet. Here is another example. If you have Google Earth installed on your computer, you should be able to click on both of the next two links, which will open into Google Earth. One will display a track and the other will overlay photos, both from a recent aerial shoot around San Diego:

http://www.oceanlight.com/kml.php?file=20090116.kml
http://www.oceanlight.com/22285-22305.kml

Yes, somewhat crude, but these are the early days of geocoding and there will be more interesting things we can do in the future.
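For the curious, serving a KML file like those above does not require much. The sketch below is a hypothetical, stripped-down kml.php that emits a single placemark from stored GEO data; my real script does more (tracks, photo overlays), and the name and coordinates shown are only illustrative:

```php
<?php
// A hypothetical, minimal kml.php: emit one KML placemark for an image from
// its stored GEO metadata. In practice these values come from the database.
header('Content-Type: application/vnd.google-earth.kml+xml');

$name = 'The Wave, North Coyote Buttes';
$lat  = 36.9959;
$lon  = -112.0061;
$alt  = 1590;

echo '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
?>
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Placemark>
    <name><?= htmlspecialchars($name) ?></name>
    <Point>
      <!-- KML coordinate order is longitude,latitude,altitude -->
      <coordinates><?= "$lon,$lat,$alt" ?></coordinates>
    </Point>
  </Placemark>
</kml>
```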

I’ve written a fairly lengthy post describing how I geocode images: How To Geocode Your Photos. At present, I use a free application named “GPicSync” to add GEO data into each image. This application will update the EXIF information in my RAW and master images to include latitude, longitude and altitude.

A bit of opinion: my belief is that having GEO data associated with your images, on your website, is almost certainly a good thing. Even if no person ever looks at it, new technologies are coming online constantly that look for, index, spider, collate and retrieve images and web pages based on their GEO data. Images and web pages lacking GEO data will not see any of the advantages these new technologies offer. I admit I am no expert on this, and the entire geocoding world, along with the entities out there indexing geocoded web pages, is all rather new to me. However, I am certain that there will be visitors to my site, and probably already have been many, who arrive as a result of the GEO data present alongside my images and blog posts. Having the GEO data embedded in the metadata of the photograph is the first step in this process.

Step 3: Import Images into Digital Asset Management Software (5%)

I import the keeper images, both RAW and master, into Expression Media, which is the software I use for “digital asset management” (whee, yet another acronym buzzword: DAM). I’m no fan of Microsoft, but I do like Expression Media and am used to it (I formerly used its predecessor, iView). In particular, Expression Media allows programs (scripts) to be written in Visual Basic. The scripting feature alone is worth its weight in gold, as I will point out in the last step of my workflow, and is what makes my processing of images so automated now. I’ve written a dozen or so scripts. It’s quite easy. I have had no training, and have never read any manual for the software. I just based my scripts on examples I’ve found on the internet from other Expression Media users, modifying them to meet my own workflow needs. They carry out mundane tasks and really speed the process up. For example (two of these are sketched in code after the list):

  • Set baseline IPTC metadata, including copyright notice, name, address, email, website.
  • Set baseline “quality”, based on the camera model information in the EXIF. In this way I can rank certain images higher on the website if they were shot on a better camera, other factors being equal. I normally don’t want images shot with a point-and-shoot to appear before those shot with a 1DsIII. I’ve come up with a baseline ranking scheme to differentiate the following image sources relative to one another in terms of typical quality (not in this order, however): Canon 1DsIII, 1DsII, 1DIIn, 5D, 50D, 30D, Nikon D100, Panasonic Lumix LX3, LX2, Nikon Coolscan LS5000, LS4000, various drum scans. I can easily fine-tune this later for individual images, increasing or decreasing the “quality” of each image so that certain images appear first when a user views a selection of photos.
  • Determine the aspect ratio (3:2, 4:3, 16:9, custom) and orientation (horizontal, vertical, square, panorama) of the master image, which may be different from that of the raw image(s) from which it is sourced. This is important for cropped images and for panoramas and/or HDR images assembled from multiple raw files. The script recognizes the multiple raw files that are used to generate a single master file.
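As promised, here are sketches of the logic behind two of those scripts. The real versions are Visual Basic inside Expression Media; I have rendered the ideas in PHP to match the other examples in this post, and the rank numbers and ratio thresholds are illustrative rather than my actual values:

```php
<?php
// Sketches of two Expression Media script ideas, rendered in PHP.
// Rank values and thresholds are illustrative only.

// Baseline "quality" from the EXIF camera model, so images from better
// cameras rank higher on the website, other factors being equal.
function baselineQuality(string $model): int
{
    $ranks = [
        'Canon EOS-1Ds Mark III' => 90,
        'Canon EOS-1Ds Mark II'  => 85,
        'Canon EOS 5D'           => 80,
        'Canon EOS 50D'          => 70,
        'NIKON D100'             => 60,
        'DMC-LX3'                => 40, // point-and-shoot
    ];
    return $ranks[$model] ?? 50; // unknown models get a middle rank
}

// Classify the master file, which may be cropped or stitched and therefore
// differ from the raw frame(s) it came from.
function classifyMaster(int $width, int $height): array
{
    $ratio = max($width, $height) / min($width, $height);

    if ($ratio > 2.0)      $aspect = 'panorama';
    elseif ($ratio > 1.55) $aspect = '16:9';
    elseif ($ratio > 1.40) $aspect = '3:2';
    elseif ($ratio > 1.20) $aspect = '4:3';
    else                   $aspect = 'custom';

    if ($width == $height)    $orient = 'square';
    elseif ($ratio > 2.0)     $orient = 'panorama';
    elseif ($width > $height) $orient = 'horizontal';
    else                      $orient = 'vertical';

    return [$aspect, $orient];
}

[$aspect, $orient] = classifyMaster(5616, 3744); // a 1DsIII full frame
echo "$aspect, $orient\n"; // prints: 3:2, horizontal
```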

At this point my images have EXIF metadata, perhaps containing GEO data if a geocoding step was performed, and basic IPTC metadata that identify the image as mine, how to reach me, etc. So far all I have done is run some applications and scripts. I really haven’t done any “manual” keywording or captioning yet. If necessary, the images are now ready to place on the web, since they have a minimal set of metadata in them that at least establishes them as mine (DMCA anyone?). However, the most important step is to come.

Step 4: Keywording and Captioning (80%)

It’s time to add captions, titles, keywords, categories, etc. to the image. With my new images already imported in Expression Media, and already containing full EXIF metadata and baseline IPTC metadata, I am ready to begin.

  • Captions. There is no shortcut for this. Each image needs a decent caption. It is common to group images and assign the same caption to all of them, and then fine tune captions on individual images as needed. The notion of a “template” can be used too, and lots of different DAM applications support this. Whatever application you use to caption your images, there is no alternative but to get your hands dirty and learn how to do it, what approach works best for you. A key concept is to caption well the first time, so you don’t feel a need to return in the future and add more.
  • Keywords (open vocabulary descriptors). In general, the same notion as captioning applies here. However, DAM applications often have special support for keywords, allowing you to draw keywords from a huge database of alternatives, facilitating the use of synonyms, concepts, etc. Expression Media allows the use of custom “vocabularies”. A vocabulary is basically a dictionary. For animal images, I developed a custom vocabulary/dictionary of 26,000 species, including most bird and mammal species, with complete hierarchical taxonomic detail. So, when keywording, I simply type in the Latin (scientific) name for a group of images (all of the same species) and up pops a taxonomic record in the vocabulary, showing kingdom, phylum, family, genus, species, etc. and a bunch of important scientific gobbledygook for the species. Hit return and bingo, all the images I have highlighted are keyworded with appropriate taxonomic metadata. Similar ideas work for locations. I do not do much keywording for “concepts” (e.g., love, strength, relationships, childhood) since I do not pursue that sort of thematic stock; there is enough of that in the RF and micro stock industries already. Here is a list of keywords I currently have among my images.
  • Categories (closed vocabulary descriptors). This is the third area of captioning that I find important. Images in my stock files are typically assigned one or more “categories”, and these categories are stored in the metadata of the image alongside captions and keywords. Some examples are: Location > Protected Threatened And Significant Places > National Parks > Olympic National Park (Washington) > Sol Duc Falls and Subject > Technique > Aerial Photo > Blue Whale Aerial. Here is a list of categories I currently have among my images.
  • Custom Fields for the website. I have a few other metadata fields, seen by website visitors, that I set via Expression Media scripts. For example, once the captions are created, a script can be used to create “titles” for a group of images, which are really just excerpts of the full captions and can be used for HTML titles, headers, etc. (a sketch of this idea follows the list). For the most part, these additional metadata fields are secondary in importance to the captions, keywords and categories.
  • Custom Fields for Business Purposes. In addition, I use some metadata fields for recording characteristics of the image that I need to track for business reasons. These include licensing restrictions, past uses that affect exclusivity, etc. These metadata are embedded in the image so they are sure to travel with the image as it moves to a client, but they are not presented to the public on the web site.
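Here is the promised sketch of the title-from-caption idea: take the first sentence of the full caption, or failing that cut at a word boundary, and use the result as an HTML title. The function name and length limit are, of course, just illustrative:

```php
<?php
// Sketch: derive a short "title" from a full caption, for HTML titles and
// headers. Prefer the first sentence; otherwise cut at a word boundary.
function titleFromCaption(string $caption, int $maxLen = 60): string
{
    $firstSentence = preg_split('/(?<=[.!?])\s/', $caption, 2)[0];
    if (mb_strlen($firstSentence) <= $maxLen) {
        return rtrim($firstSentence, '.');
    }
    $cut = mb_substr($caption, 0, $maxLen);
    $pos = mb_strrpos($cut, ' ');
    return ($pos !== false ? mb_substr($cut, 0, $pos) : $cut) . '...';
}

echo titleFromCaption(
    'A blue whale surfaces off San Diego. The whale was one of six seen that day.'
);
// prints: A blue whale surfaces off San Diego
```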

Note that I consider keywords to be “open vocabulary”, in the sense that any keyword can be used with an image. In other words, I don’t hesitate to add keywords that I have not yet used; it’s an open set and grows as needed. This is especially true of synonyms, though one doesn’t want to get too carried away with synonyms or they can dilute the search results a web visitor sees. I often return later to add keywords to images already in my stock files. However, I treat categories as “closed vocabulary”, in that I have a relatively fixed set of hierarchical categories. I will introduce a new category when it makes sense, but usually only when there is a sufficiently large group of images to which it applies, and there is not already a similar category in use.

Once all the metadata for the keepers in my latest shoot are defined in Expression Media, they need to be written out to the images themselves. In other words, Expression Media is aware of these things, but if one were to open one of the images (RAW or master) in Photoshop the new metadata would not be there. This last step in Expression Media is referred to as “syncing” the annotations. (“Annotations” is Expression Media’s word for metadata. I guess “metadata” is scary to people.) I highlight all the files for which I have been adding metadata, then Action -> Sync Annotations -> Export Annotations To Original Files and click “OK”. All the metadata is now stored in the images themselves, and will flow into any derivative images that are created, such as the thumbnails and watermarked JPGs that go onto my web site. (Think DMCA!).

Step 5. Downstream, or, “Go Forth My Minions” (5-10%)

Once I have defined the metadata, there is no need to do it ever again. The metadata, now contained in the DAM application but also in the header of each image, “flows downstream” with no further effort. For my purposes, “downstream” can mean a submission of selects sent to a client, a submission of images to an agency, or an update of my website.

Downstream to Clients

There is not much to say here. Best practices in delivering images to clients include using metadata properly. If you are sending out images to clients, or to stock agencies (the old-fashioned kind that actually represent their photographers) or to, for shame for shame, stock portals (RF, micro, they are all evil), then you should have rich, accurate metadata embedded in your image. It is the only way to ensure that the information travels with the image. I’ve received submission requests from potential clients who simply wanted JPGs submitted as email attachments, with the proviso that if a JPG did not have caption and credit embedded in the metadata it would be immediately discarded without consideration.

Downstream to the Web

For many photographers, the final step in processing a new shoot is to update one’s website. In other words, get the new images along with all their metadata (captions, keywords, GEO locations, categories, etc.) onto the web so that they can be seen by the entire world.

For photographers who are using a “gallery” of some kind to host their web site (such as Smugmug, Flickr, PBase, or any of the freely available installable gallery packages), simply uploading the images into a new (or existing) gallery is usually all that is necessary. Provided you have managed your metadata in step 4 properly, the metadata will be present in the headers of your new images. As these images are uploaded to the gallery, the gallery software peeks into the header of each image for metadata and, if it is found, extracts the metadata and prepares it for display alongside the image. The details of what metadata are used (caption, keywords, location, GEO, name, copyright, restrictions, EXIF, etc.) differ somewhat from one gallery provider to another, but the general idea is the same.
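That peek into the header is easy to picture. In PHP, for instance, the standard getimagesize() and iptcparse() functions are enough to pull the embedded IPTC fields out of an uploaded JPEG, roughly like this (the filename is an example):

```php
<?php
// Roughly what gallery software does on upload: read the JPEG header and
// extract the embedded IPTC fields for display alongside the image.
getimagesize('upload.jpg', $info);

if (isset($info['APP13'])) {
    $iptc = iptcparse($info['APP13']);
    // A few common IPTC fields and their numeric record identifiers:
    echo 'Title:     ' . ($iptc['2#005'][0] ?? '') . "\n"; // object name
    echo 'Caption:   ' . ($iptc['2#120'][0] ?? '') . "\n";
    echo 'Copyright: ' . ($iptc['2#116'][0] ?? '') . "\n";
    echo 'Keywords:  ' . implode(', ', $iptc['2#025'] ?? []) . "\n";
} else {
    echo "No IPTC metadata found.\n";
}
```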

However, see the final notes at the end of this post for a few caveats about how gallery software may alter your metadata as it processes your image.

My situation is conceptually the same. My website software is essentially a “gallery”, including a pretty extensive search feature. However, the software was written by hand by me and does not extract metadata from image files automatically like the big-boy galleries do. (Perhaps someday I’ll figure out how to do that.) As I described a few days ago, my web site evolved to be written entirely in PHP and MySQL. Underneath the website there is a database that contains information about all 25,000 images in my collection. Basically, this database really is the metadata for my images, or a summarization of those metadata. The database has one record per image. Each record stores the metadata for that image: caption, keywords, image name, location, GEO data, categories, orientation, etc. That said, the issue for me is: how to create this database? The gallery software in the previous paragraph does this automatically, but my home-brewed web software does not.
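For concreteness, here is a sketch of what such a one-record-per-image table might look like. The column names are illustrative; my actual schema has more fields than this:

```php
<?php
// Sketch of the one-record-per-image table underneath the site.
// Connection details and column names are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=photos', 'user', 'password');

$sql = <<<SQL
CREATE TABLE IF NOT EXISTS images (
    image_id    INT PRIMARY KEY,
    filename    VARCHAR(255) NOT NULL,
    title       VARCHAR(255),
    caption     TEXT,
    keywords    TEXT,           -- denormalized for the site's search feature
    categories  TEXT,           -- hierarchical paths, e.g. "Location > ..."
    latitude    DECIMAL(9,6),   -- GEO data, NULL if the image is not geocoded
    longitude   DECIMAL(9,6),
    altitude    DECIMAL(8,2),
    orientation VARCHAR(16),    -- horizontal, vertical, square, panorama
    quality     INT             -- baseline ranking, fine-tuned per image
)
SQL;
$pdo->exec($sql);
```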

The beauty of using Expression Media for DAM in my workflow is that with a single click, Expression Media can create this database for me. (Although I have not used other DAM applications, I am sure they are similar.) Expression Media has a few ways of doing this. I could use Expression Media’s built-in export functions (Make -> Text Data File or Make -> XML Data File). But after doing this for a while I decided to write a Visual Basic script within Expression Media that creates the database, doing some fine-tuning and error checking on the metadata fields as it goes. Either way, whether I use a script of my own or Expression Media’s built-in export features, the database is easily created. Then it is simply a matter of uploading the database along with the images when it is time for a website update.
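The last mile, loading such an export into the MySQL table, is only a few lines of PHP. This sketch assumes a tab-delimited text export with columns in a known order; the filename and column list are illustrative:

```php
<?php
// Sketch: load an Expression Media text export (one tab-delimited row per
// image) into the MySQL table. Column order here is assumed, not prescribed.
$pdo  = new PDO('mysql:host=localhost;dbname=photos', 'user', 'password');
$stmt = $pdo->prepare(
    'REPLACE INTO images (image_id, filename, title, caption, keywords)
     VALUES (?, ?, ?, ?, ?)'
);

$fh = fopen('expression_media_export.txt', 'r');
fgets($fh); // skip the header row
while (($row = fgetcsv($fh, 0, "\t")) !== false) {
    $stmt->execute($row); // assumes columns arrive in the order listed above
}
fclose($fh);
echo "Database updated.\n";
```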

The point here is that once the work is done in the DAM application, it should be a very quick process to upload the images and metadata to the web and get the images out there for the world to see. Then, if all goes well, the phone rings.

Afterword

After all that work defining the metadata for your images, and ensuring that it is embedded properly in each image, you would think you are home free, right? Well, there are a few provisos you should know about.

Metadata Can Be Stripped By Gallery Software

Some stock portals, gallery hosting services, and install-yourself gallery software (usually written in PHP) will strip metadata from an image. That’s right, they will strip it right out of your image! Why? They claim the reason is to shrink the JPGs that are displayed on the web, in an effort to reduce bandwidth. While stripping metadata does shrink the files, it is a big mistake in my opinion, and is one of the principal reasons I am not involved in any of the stock portal sites or popular photo hosting services. I want my metadata to stay with the image wherever it goes, in all derivative versions of the image. The few extra bytes of storage required are trivial compared to the importance of preserving this data. Think DMCA! Think Orphan Works!
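It is easy to see how this happens. Any gallery script that recompresses your JPEG with PHP's GD library to make a smaller display copy silently drops every EXIF and IPTC field, because GD decodes only the pixels and writes a clean new file. A small demonstration, with example filenames:

```php
<?php
// Recompressing a JPEG with GD (as many gallery packages do for display
// copies) drops all embedded metadata: GD reads pixels, nothing else.
$im = imagecreatefromjpeg('original.jpg');
imagejpeg($im, 'web_copy.jpg', 75); // smaller file... and no metadata
imagedestroy($im);

var_dump(@exif_read_data('original.jpg') !== false); // true, if original had EXIF
var_dump(@exif_read_data('web_copy.jpg') !== false); // false: metadata is gone
```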

Metadata Can Be Stripped By A Thief

When a thief, or some unwitting schoolkid, makes a copy of your image off the web, the chances are quite good the metadata will be stripped. If the image is taken via a screen shot, the metadata will disappear. If the thief/kid uses “right-click and Save As”, the metadata should remain in the image. But in the end, if the thief/kid alters the image in Photoshop and uses “Save For Web” to save a new copy, the metadata will probably be stripped out. (Yes, Save For Web can optionally preserve metadata, but it is easy to configure Photoshop so that it strips metadata from the image in “Save For Web”, and older versions of Photoshop do not offer the option to override this.)

Too Much Metadata Can Be Displayed

The photo hosting sites tend to display the EXIF fields (shooting data) of your photo’s metadata. This may or may not be what you want. Among hobbyists there is little concern about making the date, time of day, and technique (ISO, shutter speed, aperture) known. Indeed, that is one of the ways we learn, by understanding what others have done. But pros often have good reason to keep this information to themselves. So, the caveat here is: if you are using a photo hosting service and you don’t want the EXIF data in your image available on the web, you may need to take steps to prevent it.