Peter Fox, Rensselaer Polytechnic Institute – Data Set Analysis

On Rensselaer Polytechnic Institute Week: Are the secrets to solving our problems hiding in plain sight?

Peter Fox, professor and Tetherless World Constellation Chair, details how it’s all in the data.

Peter Fox is a Tetherless World Constellation Chair and Professor of Earth and Environmental Science, Computer Science and Cognitive Science at Rensselaer Polytechnic Institute. Fox also directs the Institution-wide interdisciplinary Information Technology and Web Science program. Previously, he was Chief Computational Scientist at the High Altitude Observatory of the National Center for Atmospheric Research and before that a research scientist at Yale University. Fox has a B.Sc. (hons) and Ph.D. in Applied Mathematics (including physics and computer science) from Monash University. His research and education agenda covers the fields of data science and analytics, ocean and environmental informatics, computational logic, semantic Web, cognitive bias, semantic data frameworks, and solar and solar-terrestrial physics. The results are applied to large-scale distributed scientific repositories addressing the full life-cycle of data and information within specific science and engineering disciplines as well as among disciplines. Fox is President of the Federation of Earth Science Information Partners (ESIP). Fox served as chair of the International Union of Geodesy and Geophysics Union Commission on Data and Information from 2007-2015, is past chair of the AGU Special Focus Group on Earth and Space Science Informatics, associate editor for the Earth Science Informatics journal, and editorial board member for Computers in Geosciences and Nature’s Scientific Data. Fox served on the International Council for Science’s Strategic Coordinating Committee for Information and Data. Fox was awarded the 2012 Martha Maiden Lifetime Achievement Award for service to the Earth Science Information community, and the 2012 European Geosciences Union Ian McHarg Medal for significant contributions to Earth and Space Science Informatics. In 2015, Fox was elected as the first Earth and Space Science Informatics fellow to the American Geophysical Union.

Data Set Analysis


With the spread of high-quality and high-throughput sensors, researchers are swimming in oceans of data. Individual bits of information amassed in datasets around the world chronicle every research topic literally from A to Z, from astronomy to zoology.

Hidden in these datasets are patterns that could help us solve some of the most pressing challenges we face – new routes to renewable energy, clean water, and better health. But there’s a problem: the datasets are too massive and complex for human intelligence alone to process.

I am a data scientist, and my work helps researchers extract knowledge from the data that interests them. Standard data analysis techniques are often limited to comparing two variables at a time, revealing simple relationships, the kind you can plot with a line. We seek more complex relationships, analyzing as many as 10 dimensions simultaneously and finding patterns we weren’t looking for.

One way we do that is with network analysis, a technique social media applications use to make suggestions based on a suite of common traits, like location, age, and interests. Instead of personal traits, we identify communities based on all variables in the datasets. 
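To make the idea of finding communities across several variables at once concrete, here is a minimal sketch in Python. It is an illustration only, not the method used in the research described here: the records, the three dimensions, the distance threshold, and the function names are all hypothetical, and it stands in for true community detection with the simplest possible substitute, linking records that sit close together across every variable and then reading off the connected groups.

```python
from itertools import combinations
from math import dist


def build_similarity_graph(records, threshold):
    """Link each pair of records whose Euclidean distance, measured
    across all variables at once, is at most `threshold`."""
    graph = {name: set() for name in records}
    for a, b in combinations(records, 2):
        if dist(records[a], records[b]) <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def communities(graph):
    """Return the connected components of the graph as sorted lists.
    (A simplification: real community detection uses richer criteria.)"""
    seen = set()
    groups = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        groups.append(sorted(group))
    return sorted(groups)


# Hypothetical records, each described by three made-up variables.
records = {
    "a": (0.0, 0.0, 0.0),
    "b": (0.1, 0.0, 0.1),
    "c": (0.2, 0.1, 0.1),
    "d": (5.0, 5.0, 5.0),
    "e": (5.1, 4.9, 5.0),
}

graph = build_similarity_graph(records, threshold=0.5)
groups = communities(graph)
print(groups)
```

In this toy example the records fall into two groups, one clustered near the origin and one near (5, 5, 5). The same approach extends to more dimensions simply by making each record a longer tuple.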

Recently, we performed a network analysis of marine fossil records over the past 541 million years, analyzing more than 350,000 ancient groups of one or more populations. In these networks of fossils, we were able to clearly see choke points representing the five known mass extinctions. We quantified the relative ecological impact of those mass extinctions, and explored the consequences of a “sixth mass extinction.”

Network analysis is only one technique we use to navigate researchers’ vast data and information holdings. With each undertaking, we fill in more of the bigger picture.
