Statistics students make discoveries by analyzing datasets
Lots of data come with lots of questions.
What do the numbers mean and how can the average person understand them?
That’s exactly what three Cornell Summer Research Institute (CSRI) students set out to discover as they dove into three datasets on topics ranging from crime to rivers to archaeological pottery.
“It’s well known at this point that we are getting to a point where we have too much data to analyze,” said Assistant Professor of Statistics Tyler George. “Ten years ago there was a lot of data available, but you often had to pay for it, ask for it, or collect it yourself. Nowadays there’s so much being posted online. We have all this data and much of it isn’t organized or usable to the general public.”
That’s no problem for senior Brian Cochran and juniors Emma Jobe and Ashley Mink. They each selected a dataset to focus on for the eight weeks of CSRI, exploring the data to provide new answers to questions.
Projects
Cochran, a triple major in computer science, mathematics, and data science, teamed up with Coe College chemistry professor Marty St. Clair to analyze the chemical concentrations in water samples that St. Clair’s teams of students have been collecting from Iowa watersheds since 2002. Cochran is building an online dashboard that will be accessible to anyone.
“Some people who might be interested in this dashboard are people working in agriculture because agriculture has a pretty big effect on water quality and the creeks we are studying, people who are working in water treatment, or anyone just interested in learning a little bit more about watersheds,” Cochran said.
Mink, a business analytics and Spanish double major, is working alongside a nonprofit to better understand criminal justice data for the state of Minnesota. She will also create a public dashboard to make historical crime data easily accessible and understandable.
“For example, we are breaking down the crime rates in Hennepin County–where Minneapolis and St. Paul are located–by sex and race. We’re also looking at types of offenses such as those against person and property,” Mink said. “We found that across all of Minnesota, for the most part, property crimes had the largest incarceration rate.”
Jobe, a statistics major, is exploring data for Cornell’s William Deskin Professor of Chemistry Cindy Strong, who has a long-running research project to understand the chemical composition of archaeological pottery pieces from 1,000 years ago.
“I think it’s interesting that just using chemistry and statistics you can make conclusions about ancient communities,” Jobe said.
Outcomes
The students met with George each day for an hour or two but did much of the research on their own. They used R, a free and open-source statistical software environment. Statistics and data science majors also learn Python and graduate knowing how to use both tools, which they’ll likely need for future jobs within their fields of study.
George says CSRI helps students build problem-solving skills and sets them up for success after graduation.
“There’s a lot of research that supports that undergraduate research, undergraduate opportunities like this, are one of the largest components for student success after college,” George said. “There’s nothing better than actually getting to do something very similar to what you might do after graduation. It also helps students decide what their futures will hold.”
Mink, whose dream job is working for the FBI, says she’s learned a lot from her peers and her professor.
“I’d also be interested in doing data science and data research and data entry and things like that,” Mink said. “I thought this would be a really good opportunity to see if that is the path I would like to go down. I’ve come to the conclusion that I think it is.”
These were among the many research projects that unfolded on the Cornell College campus. Fifty-one students and 19 faculty members collaborated on intensive research projects across the liberal arts disciplines during the eight weeks of CSRI.