Document Security 101

Few things in the world of digital documents are as pesky and revealing as "metadata" -- the information automatically embedded in documents by popular software such as Microsoft Word or Adobe Acrobat. When the government or a business forgets to purge metadata from documents before releasing them to the public, the results can range from embarrassing to dangerous.

On Sunday, the New York Times ran a story on President Bush's Nov. 30 speech on the war in Iraq. White House officials said many federal departments contributed to the new national strategy on Iraq, but one look at the metadata stored in the 35-page National Security Council document titled "Our National Strategy for Victory in Iraq" showed that its original author was Peter D. Feaver, a Duke University political scientist. Feaver was recruited to join the NSC staff as a special adviser in June, after he and several Duke colleagues presented the administration with an analysis of polls about the Iraq war. Their analysis concluded that Americans would support a war with mounting casualties if they believed the effort would ultimately succeed.

A screenshot revealing metadata embedded in the White House's recently released Iraq strategy document. (Brian Krebs)

The Times piece didn't uncover a huge scandal here.  But it's not the first time official organizations have published a document that contained a little more information than they realized. In October, the United Nations released a report on an investigation into the Valentine's Day assassination of Lebanese Prime Minister Rafik Hariri.  That document contained metadata showing substantial revisions had been made, including removing the names of persons closely tied to the Syrian government.

Even washingtonpost.com was once burned by metadata. In our coverage of the 2002 Washington-area sniper attacks, we published a letter allegedly written by the snipers and sent to police demanding that $10 million be deposited into a stolen credit card account. The editors here blacked out some of the more sensitive information in the scanned-to-PDF version of the document, including the bank account number where the loot was to be deposited. Unfortunately, as document sleuths later pointed out, the commercial version of Adobe Acrobat software can easily remove the blacked-out areas intended to hide certain details.

Finding metadata in a document is as simple as a few keystrokes. To locate metadata in an Adobe PDF document, check out this tutorial on Adobe's site. Microsoft also has published instructions that detail how to find and remove metadata from documents created in Microsoft Office. Harlan Carvey, a computer forensics expert here in Washington, also has posted some interesting findings on locating metadata online and in Microsoft Windows.
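
Microsoft's instructions cover the binary Office formats of the day. In the newer Office Open XML formats (.docx, .xlsx and .pptx, which arrived with Office 2007, after this post), the same kind of author and revision information sits in a small XML part inside a ZIP archive, so a few lines of standard-library Python can read it. What follows is a minimal sketch, not a forensic tool: it assumes the properties live at the conventional path docProps/core.xml, it does not handle the older binary .doc format, and the function name read_core_properties is made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an Office Open XML package.
CORE_NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Fields that commonly reveal who wrote or edited a file, and when.
CORE_FIELDS = {
    "author": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "revision": "cp:revision",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return identifying metadata from a .docx, .xlsx or .pptx file.

    `source` is a path or a binary file object. Assumes the properties are
    stored at the conventional location docProps/core.xml.
    """
    with zipfile.ZipFile(source) as zf:
        try:
            root = ET.fromstring(zf.read("docProps/core.xml"))
        except KeyError:  # the package has no core-properties part
            return {}
    found = {}
    for name, tag in CORE_FIELDS.items():
        elem = root.find(tag, CORE_NS)
        if elem is not None and elem.text:
            found[name] = elem.text.strip()
    return found


if __name__ == "__main__":
    # Build a tiny stand-in package in memory; the names are invented.
    core_xml = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}" xmlns:dcterms="{dcterms}">'
        "<dc:creator>Jane Drafter</dc:creator>"
        "<cp:lastModifiedBy>Press Office</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    ).format(**CORE_NS)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", core_xml)
    print(read_core_properties(buf))
    # -> {'author': 'Jane Drafter', 'last_modified_by': 'Press Office', 'revision': '7'}
```

A missing field is simply left out of the result rather than reported as empty, so whatever comes back is exactly what the file would give away.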

If you're interested in scanning Web sites for documents that contain metadata, check out Trace, a free tool from document security company Workshare. I played with Trace the other day while browsing the White House Web site and found a few documents with some interesting revision history. One document, dated May 2002, lays out a draft of the White House's e-government strategy, and includes a ton of metadata and revision information. Near the top of the document, for example, is a redacted warning not to apply the OMB seal until the last minute, because -- as Keith Thurston, assistant deputy associate administrator in the Office of e-Government and Technology at the US General Services Administration, wrote, "it mungs the printers."
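
This isn't how Trace works (its internals aren't described here). It's a deliberately crude sketch of one check a scanner might run on PDFs it has already downloaded: searching the raw bytes for document-information fields such as /Author, /Creator and /Producer. It assumes those fields are stored as simple, uncompressed literal strings; many real PDFs keep them in compressed object streams, as hex or UTF-16 strings, or in XMP metadata, all of which this misses. The name pdf_info_fields is hypothetical. Because a PDF saved incrementally can carry more than one information dictionary, every value is kept, which is one way stale revision details survive.

```python
import re

# Information-dictionary keys that tend to identify people or software.
INFO_KEYS = (b"Author", b"Creator", b"Producer", b"Title")

# Matches e.g. b"/Author (jdoe)". Assumption: the value is a literal string
# with no nested or escaped parentheses, stored uncompressed.
INFO_FIELD = re.compile(rb"/(" + b"|".join(INFO_KEYS) + rb")\s*\(([^()]*)\)")


def pdf_info_fields(data: bytes) -> dict:
    """Map each key found in raw PDF bytes to all of its values, in file order."""
    fields = {}
    for key, value in INFO_FIELD.findall(data):
        # latin-1 decodes any byte, so this never fails; it is only a rough
        # stand-in for PDFDocEncoding and ignores UTF-16 strings.
        fields.setdefault(key.decode("ascii"), []).append(value.decode("latin-1"))
    return fields


if __name__ == "__main__":
    # An invented, uncompressed fragment with two information dictionaries,
    # as an incrementally saved file might have.
    sample = (
        b"1 0 obj << /Title (Draft strategy) /Author (jdoe) >> endobj\n"
        b"9 0 obj << /Title (Final strategy) /Producer (Acrobat Distiller) >> endobj\n"
    )
    print(pdf_info_fields(sample))
    # -> {'Title': ['Draft strategy', 'Final strategy'], 'Author': ['jdoe'], 'Producer': ['Acrobat Distiller']}
```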

By Brian Krebs |  December 5, 2005; 1:38 PM ET From the Bunker

Comments

It's usually kind of funny to see people get burned by this. It also happened to a California politician, when some draft legislation regarding media piracy was shown to have originated within the RIAA (or was it the MPAA?).

Posted by: William | December 5, 2005 3:07 PM

Bill Lockyer, California's attorney general, put out a document condemning peer-to-peer software as "a dangerous product." It turned out that the digital fingerprints of a "stevensonv" were on it. Vans Stevenson is a senior vice president with the Motion Picture Association of America:

http://www.nytimes.com/2005/11/07/business/07link.html?pagewanted=1&ei=5090&en=98e8af679a0797f4&ex=1289019600&adxnnl=1&partner=rssuserland&emc=rss&adxnnlx=1131512582-1we4FQ0F0ErjneWBiGX7Eg

Of course, p2p software has indisputably legal uses. For example, many Linux distributions are distributed over BitTorrent. It's a bit like calling a hammer "a dangerous product" because you could hit someone over the head with it instead of using it to knock a nail in.

Posted by: Mike | December 5, 2005 3:31 PM

In May 2005, the US military released a classified document in PDF format. Many sensitive parts, such as the names of soldiers, were blacked out. Yet a simple copy-and-paste operation revealed all the secret information. It would have been funny, except that the document covered the unfortunate death of a very good and decent man. The document is still viewable on the web site of the respected Italian newspaper "Corriere della Sera." The article title and web address are:

"Calipari, saltano gli omissis degli americani" (roughly, "Calipari: the Americans' redactions are blown"), 1-May-2005

http://www.corriere.it/Primo_Piano/Cronache/2005/05_Maggio/01/omissis.shtml

Posted by: Roberto | December 5, 2005 5:46 PM
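
The Calipari case shows why drawing a black rectangle over text is not redaction: in a PDF page's content stream, the box is just another drawing operation painted after the text, and the text-showing operators are still there for copy-and-paste (or any program) to read. The toy sketch below assumes an uncompressed content stream whose text uses simple literal strings shown with the Tj operator. Real PDFs usually compress their content streams and also use other text operators and font encodings, so this illustrates the principle rather than serving as a working extractor. The function names shown_strings and has_filled_box are hypothetical.

```python
import re

# A literal string followed by the Tj ("show text") operator, e.g. b"(Hello) Tj".
# Assumption: no escaped or nested parentheses inside the string.
SHOW_TEXT = re.compile(rb"\(([^()]*)\)\s*Tj")

# "x y width height re" adds a rectangle; "f" fills it. Painting a filled
# rectangle over text is how a naive "black box" redaction is drawn.
FILLED_RECT = re.compile(rb"\bre\s+f\b")


def shown_strings(content: bytes) -> list:
    """Return every string the content stream shows with Tj, covered or not."""
    return [s.decode("latin-1") for s in SHOW_TEXT.findall(content)]


def has_filled_box(content: bytes) -> bool:
    """True if the stream fills at least one rectangle (a possible cover-up)."""
    return FILLED_RECT.search(content) is not None


if __name__ == "__main__":
    # An invented, uncompressed page: text first, then a black box on top.
    page = (
        b"BT /F1 12 Tf 72 700 Td (Officer: A. Example) Tj ET\n"
        b"0 0 0 rg 70 696 160 16 re f\n"
    )
    print(has_filled_box(page), shown_strings(page))
    # -> True ['Officer: A. Example']
```

Proper redaction tools remove the text operators themselves (and any hidden copies) before the box is drawn, which is the difference Keydet89 points to below.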

Roberto,

Brian's post is about metadata...stuff hidden inside the file format of documents. Technically, redaction is a different, though no less dangerous, subject.

Posted by: Keydet89 | December 5, 2005 5:52 PM

The issue of metadata is partly a software vulnerability and partly a matter of education. I say this because metadata, in its pure sense, is data about data: hidden information that does not appear in the document. It includes the creation date, the amount of time spent editing the file, and author information, and if certain options are enabled in your e-mail and word processing software, it can even include the e-mail subject, sender and recipient, and much more.

The educational aspect is that users of all software need to be aware of what they are sending and receiving electronically. Does the document have residual tracked changes? If the file has been redacted, was it redacted correctly (using software designed to do this), or did someone simply put shading behind the text, convert it to a PDF file and send it out for distribution? Other areas of concern include versions, where different versions of a document are stored in the same file and can be exposed by choosing Versions from the File menu; Fast Saves (an option under Tools, Options); and other exposures that can be prevented if proper care is taken before distributing the document electronically.

There are several tools available for removing metadata from Microsoft Office files. Microsoft provides a free utility called Remove Hidden Data on its web site (www.microsoft.com). It doesn't remove as much data or offer as many bells and whistles as some commercial tools (such as integration with e-mail and document management software), but it does a pretty good job of removing compromising data not meant to be disclosed. Metadata Assistant by Payne Consulting Group (www.payneconsulting.com) was the first product of its kind to be released and is used by more than 1.5 million people worldwide (for full disclosure, this is the company I work for). This program integrates with e-mail and document management software and does an excellent job of removing metadata.

Courts are now ruling on metadata and whether it should be preserved and shared for discovery purposes. This issue will certainly be a focus and area of concern for any electronically produced document.

Donna Payne
President
Payne Consulting Group, Inc.
www.payneconsulting.com

Posted by: Donna Payne | December 6, 2005 10:15 AM
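
Residual tracked changes, the first concern Donna Payne raises, are easy to check for in the newer .docx format (which postdates this discussion): insertions and deletions that were never accepted or rejected are stored in word/document.xml as w:ins and w:del elements, each tagged with an author, and deleted text is kept in w:delText. The sketch below, using the hypothetical function name pending_revisions, lists who made each pending change and recovers the deleted text. It does not cover the older binary .doc format, Fast Saves or stored versions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used in word/document.xml.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_NS = {"w": W}


def pending_revisions(source):
    """List unaccepted tracked changes in a .docx as (kind, author, text) tuples.

    Insertions are listed first, then deletions. Changes to paragraph marks
    appear with empty text.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    changes = []
    # Inserted text lives in w:t; deleted text is kept in w:delText.
    for kind, text_tag in (("ins", "w:t"), ("del", "w:delText")):
        for elem in root.iter("{%s}%s" % (W, kind)):
            author = elem.get("{%s}author" % W, "unknown")
            text = "".join(t.text or "" for t in elem.iterfind(".//" + text_tag, W_NS))
            changes.append((kind, author, text))
    return changes


if __name__ == "__main__":
    # An invented one-paragraph document with one insertion and one deletion.
    body = (
        '<w:document xmlns:w="%s"><w:body><w:p>'
        '<w:ins w:id="1" w:author="Editor A"><w:r><w:t>new wording</w:t></w:r></w:ins>'
        '<w:del w:id="2" w:author="Editor B"><w:r><w:delText>a name</w:delText></w:r></w:del>'
        "</w:p></w:body></w:document>"
    ) % W
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", body)
    for change in pending_revisions(buf):
        print(change)
    # -> ('ins', 'Editor A', 'new wording') then ('del', 'Editor B', 'a name')
```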

Check out a collection of these leaks at metadatarisk.org.

Posted by: Dave | December 6, 2005 1:31 PM

Simple solutions:
1. Scan to PDF - black out things you don't want seen by hand (the old way).
2. Word - Don't publish documents in Microsoft Word format. Simply converting to PDF would have revealed only the person who converted the document to PDF. Even with Microsoft documents that have passwords, one can open a document in a hex or other text editor and see all of the text of the document, including the metadata.

Posted by: Kenya | December 31, 2005 1:23 PM
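
The hex-editor trick Kenya mentions works because many file formats store text, and sometimes metadata, as plain bytes; the classic Unix strings utility does the same job. Below is a minimal Python imitation, with the hypothetical name printable_runs, that finds runs of printable ASCII and, because Word's binary format can store text as UTF-16LE, also runs of ASCII characters each followed by a zero byte. It can only find text that is stored unencrypted, and the six-character minimum is an arbitrary choice.

```python
import re

MIN_LEN = 6  # shortest run worth reporting; an arbitrary choice

# Runs of printable ASCII bytes (space through tilde), like the Unix `strings` tool.
ASCII_RUN = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# The same characters stored as UTF-16LE: each one followed by a zero byte.
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)


def printable_runs(data: bytes) -> list:
    """Return readable strings in raw bytes: ASCII runs first, then UTF-16LE runs."""
    found = [run.decode("ascii") for run in ASCII_RUN.findall(data)]
    found += [run.decode("utf-16-le") for run in UTF16_RUN.findall(data)]
    return found


if __name__ == "__main__":
    # Invented bytes mixing binary noise, an ASCII string and a UTF-16LE string.
    blob = b"\x00\x01Author: jdoe\x00\xff" + "Secret plan".encode("utf-16-le") + b"\x02"
    print(printable_runs(blob))
    # -> ['Author: jdoe', 'Secret plan']
```

Pointed at a real file with `open(path, "rb").read()`, this gives roughly what scrolling through a hex editor would show, minus the noise.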

There is more to metadata than just Word and PDF documents. Metadata involves these types of documents, but also includes many other aspects of computer systems as well. This is a subject worth investigating and learning about.

Posted by: Joe | January 27, 2006 11:37 AM

Metadata is very important in the design, development, and maintenance of modern database systems and web/portal systems, and it is a good idea to keep this in mind whenever you work with these systems. There is a lot to know and a lot to consider. Even if I knew it all (that'll be the day!), it would not be possible to briefly summarize all of the ramifications, IT considerations, and business issues and rules that apply to these types of IT systems. This is not a trivial issue; it takes a lot of advance planning, consideration, and communication to get everyone involved on board and in agreement. Learn all you can about metadata before you make recommendations regarding large-scale systems (or even small systems that will probably evolve into enterprise-level systems in the future). So much to know, so much to consider . . . be prepared, as they say in the Boy Scouts . . .

Posted by: Joe | January 27, 2006 6:44 PM

Obviously, if someone forgets to clean up digital data and it becomes public news, that can be awkward or embarrassing. For more information, see http://www.PinionSoftware.com.

Posted by: Document Security | June 5, 2006 1:58 AM

The link included in this sentence is broken:
"To locate metadata in an Adobe PDF document, check out this tutorial on Adobe's site. "

I got "Error: Page Not Found"

Posted by: Csavargo | September 19, 2006 8:29 AM

©  The Washington Post Company