Our acquisition of Metaweb back in July also brought along Freebase Gridworks, an open source software project for cleaning and enhancing entire data sets. Today we’re announcing that the project has been renamed to Google Refine and version 2.0 is now available.
Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.
Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and data.gov.uk have used it) and we are very excited by what they and others will be able to do with this new release. To learn more about what you can do with Google Refine 2.0, watch the following screencasts:
Google Refine is a power tool for working with messy data sets, including cleaning up inconsistencies, transforming them from one format into another, and extending them with new data from external web services or other databases. Version 2.0 introduces a new extensions architecture, a reconciliation framework for linking records to other databases (like Freebase), and a ton of new transformation commands and expressions.
Freebase Gridworks 1.0 has already been well received by the data journalism and open government data communities (you can read how the Chicago Tribune, ProPublica and data.gov.uk have used it) and we are very excited by what they and others will be able to do with this new release. To learn more about what you can do with Google Refine 2.0, watch the following screencasts:
http://www.youtube.com/watch?v=yNccGtn3Wb0 (7 min)
http://www.youtube.com/watch?v=45EnWK-fE9k (9 min)
http://www.youtube.com/watch?v=m5ER2qRH1OQ (6 min)
The project is open source and its code and downloads are available here. Changes from version 1.1 to 2.0 are listed here.
By David Huynh, Google Search Infrastructure