[COS Editors’ note] Interviewee: Jeff Leek

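Bio: Jeff Leek is an assistant professor at the Johns Hopkins Bloomberg School of Public Health. Simply Statistics, the blog he runs together with two other professors, is one of the most popular statistics blogs. This article is a transcript of our editor’s recorded interview with Jeff Leek.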

1. Educational background

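My name is Jeff Leek, and I am an assistant professor of biostatistics at Johns Hopkins University in the United States. I did my undergraduate degree in applied mathematics at Utah State University, then a PhD in biostatistics at the University of Washington in Seattle. After that I did a postdoc at Mount Sinai School of Medicine, followed by another postdoc in computational biology at Johns Hopkins. My research focuses on genomics problems and next-generation sequencing analysis. I also maintain a blog called Simply Statistics, which covers a lot of interesting statistical questions.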

2. Why did you choose statistics?

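When I was an undergraduate, I worked with a professor on a study of beetles. I collected data and used differential equation models to study outbreaks of beetle infestations. While analyzing the data, I realized I needed to learn more statistics. So when I applied to graduate programs, half were in mathematics and half in statistics. When I visited the schools, though, I found the people in the statistics departments more fun, and in the end I went into biostatistics. In graduate school, my PhD adviser, who was also my research assistantship supervisor, introduced me to genomics, and I found it cool and exciting myself. In short, my adviser convinced me that genomics was interesting, and that is how I got into the field.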

3. The models or software you use most.

R, Python, C.

4. The research result you are most proud of.

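I am proud of every one of my research projects. In one of them, we scraped data from many papers published in major medical journals, collected their p-values, and then estimated the proportion of false positives among medical research results. I am proud of this result first because it was the first paper I co-wrote with my wife, who is also a statistician, and second because we did the whole process ourselves, from collecting the data to creating new statistical methods to analyzing the data. It feels great to be a scientist and not just a statistician.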

Additional questions:

Your favorite course.

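Of the many courses I took, my favorite was the PhD methods sequence, because Professor Jon Wakefield was very funny. Another was functional data analysis, taught by Professor Brian Leroux.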

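Of the courses I teach, my favorite is the hands-on data analysis class. It is not just me lecturing; we also do a lot of data analysis labs. I give the students plenty of difficult data and let them work out how to analyze it on their own. I like this class because there is a lot of interaction: I get to discuss problems with the students, and it feels like solving a puzzle rather than just lecturing.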

Do you think traditional statistical methods such as linear models and analysis of variance are still important?

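The basics are certainly important, but I think other things matter too, such as reproducible research and students’ computing skills, since many of the problems we work on now require heavy computation. Presentation and communication skills, which sit outside the curriculum, are also important. So sometimes I wonder whether we could compress the traditional material a little and add some new content. In short, the traditional material is important, but because doing good research keeps getting harder and students have to know more and more, we can make some adjustments.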

How should statistics face the challenge from data science?

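I think that for students this is not a challenge; it is a challenge for professors. For students it is absolutely an opportunity. If you know a little computing and a little data visualization and you have a statistics background, you will be in high demand at companies like Google and Microsoft. I had a student whom I was hoping would apply for faculty jobs, but a tech company recruited her and she went there. So for students these are all opportunities.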

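For professors it is a challenge, because we have to work harder to attract excellent students. The response is for statistics programs to add more new content, such as data visualization, computing, and reproducible research, and to focus more on real problems. People who do statistics, myself included, occasionally drift away from the data to think about theoretical problems. At least for me, my department leans applied, so we need to focus on the problem under study and on the data itself. If we can do that, there will be no problem. After all, it benefits everyone that people are starting to value data.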

English version:

1. Your educational background.

My name is Jeff Leek and I’m currently an assistant professor of biostatistics at Johns Hopkins University in the United States. Before that I did an undergraduate degree in applied math at Utah State University, then a PhD in biostatistics at the University of Washington in Seattle, then a postdoc at Mount Sinai School of Medicine in New York City and another postdoc in computational biology at JHU, and then I became a faculty member. I do lots of genomics stuff, next-generation sequencing analysis. And I also write a blog called Simply Statistics. (Yes, it’s quite famous!) I don’t know whether it’s famous; if a few people read it, that’s good. We talk about lots of statistics things.

2. Why did you choose statistics?

When I was an undergraduate student I was doing undergraduate research with a professor. I was studying mountain pine beetles, little beetles that eat trees. I was collecting data, and at that time I was doing differential equation modeling of mountain pine beetle outbreaks. And I realized that all the time I was analyzing data, and I needed to know more statistics. So when applying to graduate schools, I applied half to math programs and half to statistics programs. And when I did visits, I just liked the statistics people more than the math people, who were not as fun. So I ended up going to a biostatistics program, and my very first adviser in graduate school, the guy who ended up being my PhD adviser, was my RA supervisor and introduced me to genomics. It’s very cool and there is a lot of excitement around it, so I got into genomics because he sort of convinced me it’s a cool thing to do. I’ve been very happy in that area, so it’s good.

3. Your favorite course.

In my learning or teaching? (Both)

My favorite course I took (I took a lot of good courses) was sort of the PhD methods sequence, and there was a guy, Jon Wakefield, who is a very funny professor at the University of Washington. So I really liked that class. And I also took functional data analysis from a guy named Brian Leroux, who is a faculty member there too.

And I also like teaching. My favorite class I teach is a hands-on data analysis class where we do lectures and also in-class labs on data analysis projects. I give the students a lot of data that they have to figure out, and it is tricky to figure out. So that is my favorite class to teach, because there is more interaction: you get to talk about problems and figure things out, and it feels like puzzle solving rather than lecturing. So that’s fun.

4. The research you are most proud of.

I’m proud of all my research. (Here is one example.) We collected a bunch of data from published papers, trying to estimate the rate at which medical results are false positives in major medical journals like the New England Journal of Medicine. I’m very proud of that result because, first of all, my wife and I worked together. This is the first paper I wrote with my wife; she is a statistician too. And I’m also proud of it because we did the whole process: we collected the data, built new statistical methods, and analyzed the data. We did the whole thing, from start to finish. It’s cool to be not just a statistician but a scientist, too. That’s a lot of fun.

Additional questions.

Do you think linear models and traditional statistics are still important today?

Yes, I think it’s still very, very important to have the basics. But for me, what I’m realizing is that there are other things that are also really important, like reproducible research and being able to compute, because a lot of the problems we are working on are very, very big computational problems. And I also value good presentation and communication skills, which are not necessarily in the curriculum. So sometimes I think we can compress some of the linear model part and put in these other things, which are newer kinds of requirements for being a statistics researcher. So I think it’s very important, but because it is getting harder to be a researcher in this area, you have to know more stuff. We are going to have to get that compression done. That is what I think.

How should statistics face the threat from data science?

I think actually it is not a threat for a student; it’s an opportunity. As a student, if you learn a little bit about computing and a little bit about data visualization, and you have your statistics background, you will be in huge demand from companies like Google and Microsoft. In fact, I was hoping one of my PhD students would get a faculty job. She turned it down to go to a tech company. The tech company recruited her and she decided to go there. So I think it’s just an opportunity for statistics students. For faculty and other people in academic jobs, it’s more of a threat. I say it’s a threat because smart students will choose between data science programs and statistics programs, and we want them to come to us. So I think the way we can fix that problem is by adopting more data visualization, a little more computing, and a little more reproducible research in our programs, and by staying really focused on solving real and important problems that matter to people. You know, in statistics, I know I’ve had this problem sometimes where I was sort of moving away from the data and starting to think about some really theoretical problems that are very different from what data scientists think about. At least for me, we are in a very applied statistics program, so it’s important to focus heavily on what real problem you are trying to solve and what data you have.

I think if we do that, we won’t have any problems. It’s better for everyone that data is so popular.
