Courtesy of NASA/JPL-Caltech
Some of the most influential scientists in history, from Franklin to Lavoisier to Mendel, weren’t formally educated in their fields of study. They just took an interest and started conducting experiments, sometimes in their own backyards. But more recently, the specialized tools of science have put new discoveries out of reach for ordinary people. Now that dynamic is beginning to change again.
The internet has spawned an explosion of projects that allow non-experts to actually participate in science by working directly with scientific data. I wrote about several such projects last year. Now, in just the past week, two new public tools for analyzing data have been unveiled, from two very different fields of study.
First, the Zooniverse project has released its newest endeavor, called The Milky Way Project. One of the astronomers working on the project, Sarah Kendrew, explained how it works on her blog last week. Users from among the hundreds of thousands registered with Zooniverse log in to their accounts and are shown images of the Milky Way from the Spitzer Space Telescope. Their job: to identify regions known as “bubbles,” areas of gas and other debris that appear to be arranged in nearly spherical structures.
A 2006 paper speculated that bubbles are formed when radiation from newly formed stars sweeps gas away from their centers. But the researchers, led by Ed Churchwell, noted that what constitutes a “bubble” seems to depend on the individual who spots it in an image. What portion of a bubble-shape needs to be present before it’s truly a “bubble”? Could it be possible that the researchers were biased, selecting as true bubbles only those that supported their explanation of the phenomenon?
The best way to answer these questions is probably to ask a large number of people to identify bubbles and see if the resulting data still supports the researchers’ conclusions—or if another hypothesis explaining these structures emerges. That’s where the Milky Way Project comes in; I decided to try it out. I visited the site, created an account, watched a brief tutorial, and got to work.
The system displays a wide-field image filled with bright stars and fuzzy nebulae—some red and some green. I clicked on a circular icon to choose the bubble-marking tool. Then I clicked at the center of a green bubble-shaped nebula and dragged the cursor out to its edge. I could easily modify my selection to define the thickness of the bubble and its exact shape, if it wasn’t perfectly circular. Then I used a different tool to flag other structures—mostly “green knots” and fuzzy red areas. There were options to highlight galaxies and star clusters, but even after a dozen or more screens, I didn’t see any of those.
After identifying everything on an image, I submitted it and went on to the next one. The process was endlessly varied and fascinating. Was that the edge of a sphere I was looking at, or just an oddly curved region of interstellar gases? Could a figure-8-shaped nebula actually be two intersecting bubbles? It’s easy to while away several hours with this beautiful set of images, and since it’s all in the name of science, you can feel good about doing it, too. Zooniverse has several other projects, including Planet Hunters, Moon Zoo, and many more—all entertaining and worthwhile efforts.
But the Milky Way Project was upstaged a bit last week by another project that allows anyone to dig deep into data of a different sort: a significant percentage of all the books that have ever been published. The project’s founders are calling it “culturomics,” and their first journal article on the topic was published last week in Science. The research has been discussed extensively on blogs, but the best way to experience the work might just be to visit the tool Google developed that allows anyone to see a cross-section of a vast dataset: the Books Ngram Viewer. Type in any word (or several words or short phrases separated by commas), and you’ll instantly be presented with a graph showing how frequently that word has been used over the past two centuries. Here’s the graph I made for macaroni and spaghetti, showing that macaroni was the more popular term until about 1950, and spaghetti was essentially unheard of before 1880.
How does it work? The Google tool draws on a database generated from its vast archive of scanned books—over 15 million at last count. A team led by Jean-Baptiste Michel took a subset of about 5 million books from the database (those with the best-quality scans) and then simply charted how frequently every word appeared in the books (ignoring non-words and typos), cross-referenced by year. This allowed them to make some fascinating observations. For example, as science writer John Timmer noted, the researchers took a look at how frequently specific years were mentioned. In the 1800s, a year such as 1880 was mentioned a lot in the years immediately following it, then it gradually was used less and less. In this way, they could measure a year’s “half-life”—the amount of time for a year to be mentioned half as frequently as during its peak year. The half-life of 1880 was 32 years, while the half-life of 1973 was just 10 years, suggesting that our collective interest in the past seems to fade more rapidly now than it once did.
Similarly, says UK science writer Ed Yong, celebrities also have half-lives. In the early 1800s, a typical celebrity’s fame (as measured by their mentions in books) peaked at age 75 and took 120 years to halve; today they rise to fame faster, but their half-life is more like 71 years.
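To make the “half-life” idea concrete, here is a minimal Python sketch of the measure as described above: find the year in which a term peaks, then count the years until its frequency first falls to half that peak. The function name `half_life` and the sample numbers are my own inventions for illustration; they are not the researchers’ code or data, and the real analysis may well smooth or normalize the curves differently.

```python
def half_life(freq_by_year):
    """Years between a term's peak and the first later year in which its
    frequency is at or below half the peak. Returns None if that never happens."""
    if not freq_by_year:
        return None
    # Year with the highest frequency (the earliest one, if there is a tie).
    peak_year = max(sorted(freq_by_year), key=freq_by_year.get)
    half = freq_by_year[peak_year] / 2
    # Walk forward in time from the peak until the frequency has halved.
    for year in sorted(y for y in freq_by_year if y > peak_year):
        if freq_by_year[year] <= half:
            return year - peak_year
    return None


# Made-up frequencies (mentions per million words) for the string "1880".
# These values are hypothetical, chosen only to illustrate the calculation.
mentions_of_1880 = {1880: 10.0, 1881: 9.0, 1890: 8.0, 1900: 6.5, 1912: 5.0, 1920: 4.0}

print(half_life(mentions_of_1880))
```

With these invented numbers, the peak is in 1880 and the frequency first reaches half the peak in 1912, so the sketch reports a half-life of 32 years. The same function would work for a celebrity’s name, given a table of how often it appears each year.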
The Google tool makes this work endlessly amusing, but also quite telling. Check out this graph comparing plastic and rubber, which makes it easy to see when plastic became the more important commodity. Or check out typewriter, IBM, and Microsoft. We can also use the tool to get a rough sense of the progress of science: Here’s a graph comparing galaxy and nebula, which reflects the increasing importance of galaxies in our understanding of cosmology (and the fact that all galaxies were once thought to be nebulae). The tool doesn’t work perfectly for everything though: You’d be hard-pressed, looking at this graph of new wave, disco, punk, grunge, hip hop, and reggae, to figure out exactly when each music fad had its moment in the sun.
Here are a few of my favorite searches. Take a look at them, then try some of your own: One through ten, hundreds to trillions, new versus old, vacuum tube versus microchip, and calculator versus slide rule.
With billions of words in the database, there’s really no end to the insights that can be gained from this tool. And for more detailed analysis, the study authors have made their raw datasets available for download.
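If you do download the raw data, the basic calculation behind a graph like macaroni versus spaghetti might look something like the Python sketch below. It assumes, purely for illustration, that each row is tab-separated as word, year, count, and that the total number of words printed in each year is available separately; the actual file layout may differ, so check the documentation that comes with the datasets. The names `parse_rows` and `relative_frequency` and every number in the sample are hypothetical.

```python
import csv
import io
from collections import defaultdict


def parse_rows(tsv_text):
    """Parse tab-separated lines of (word, year, count). This layout is assumed."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    return [(word, int(year), int(count)) for word, year, count in reader]


def relative_frequency(rows, total_words_by_year):
    """Divide each word's yearly count by the total words counted for that year."""
    freq = defaultdict(dict)
    for word, year, count in rows:
        freq[word][year] = count / total_words_by_year[year]
    return dict(freq)


# Tiny invented sample, not real counts from the dataset.
sample_tsv = (
    "macaroni\t1900\t30\n"
    "spaghetti\t1900\t10\n"
    "macaroni\t1960\t20\n"
    "spaghetti\t1960\t40\n"
)
total_words_by_year = {1900: 1000, 1960: 2000}  # hypothetical yearly totals

freq = relative_frequency(parse_rows(sample_tsv), total_words_by_year)
for word in sorted(freq):
    print(word, freq[word])
```

Dividing by the yearly total matters because far more books were published in 1960 than in 1900; without it, nearly every word would appear to grow more popular over time.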
Projects like Zooniverse and Google’s Ngram Viewer represent crowd-sourcing on the grandest scale yet achieved. What I’d like to see next is an analysis of the crowds who are using these tools. What words do people look up? What methods do they use to identify galaxies and bubbles—and how long do they persist at it? The answers to these questions may, in turn, lead to even more powerful tools—and a greater understanding of our world.
Dave Munger is editor of ResearchBlogging.org, where you can find thousands of blog posts on this and myriad other topics. Each week, he writes about recent posts on peer-reviewed research from across the blogosphere.
Originally published December 22, 2010