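[COS Editorial Note] Interviewee: Jeff Leek
About: Jeff Leek is an assistant professor at the Johns Hopkins Bloomberg School of Public Health. Simply Statistics, the blog he runs together with two other professors, is one of the most popular statistics blogs. This article is a transcript of the editor's recorded interview with Jeff Leek.
My name is Jeff Leek, and I am an assistant professor of biostatistics at Johns Hopkins University in the United States. I did my undergraduate degree in applied mathematics at Utah State University, then a PhD in biostatistics at the University of Washington in Seattle. After that I did a postdoc at Mount Sinai School of Medicine, followed by another postdoc at Johns Hopkins University in computational biology. My research focuses mainly on genomics problems and next-generation sequencing analysis. I maintain a blog called Simply Statistics, which covers many interesting statistics problems.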
R, Python, C.
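Of the many courses I took, my favorite was the PhD methods sequence, because Professor Jon Wakefield was very funny. Another was functional data analysis, taught by Professor Brian Leroux.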
1. Your educational background.
My name is Jeff Leek and I’m currently an assistant professor of biostatistics at Johns Hopkins University in the United States. Before that I did an undergraduate degree in applied math at Utah State University, then a PhD in biostatistics at the University of Washington in Seattle. Then I did a postdoc at Mount Sinai School of Medicine in New York City and another postdoc in computational biology at JHU, and then became a faculty member. I do lots of genomic stuff, next-generation sequencing analysis. And I also write a blog called Simply Statistics. (Yes, it’s quite famous!) I don’t know whether it’s famous. If a few people read it, that’s good. We talk about lots of statistics things.
2. Why did you choose statistics?
When I was an undergraduate student I was doing undergraduate research with a professor. I was studying mountain pine beetles, little beetles that eat trees. I was collecting data, and at that time I was doing differential equation modeling of mountain pine beetle outbreaks. And I realized that all the time I was analyzing data, and I needed to know more statistics. So when applying to graduate schools, I applied half to math programs and half to statistics programs. And I just liked the statistics people more than the math people, who were not as fun as the statistics people when I did visits. So I ended up going to a biostatistics program. My very first adviser in graduate school, the guy who ended up being my PhD adviser, was my RA supervisor and introduced me to genomics. It’s very cool and there is a lot of excitement around it, so I got into genomics because he sort of convinced me it’s a cool thing to do. I’ve been very happy in that area, so it’s good.
3. Your favorite course.
In my learning or teaching? (Both)
I took a lot of good courses, but my favorite course I took was, sort of, the methods sequence for the PhD. There was a guy, Jon Wakefield, who is a very funny professor at the University of Washington, so I really liked that class. And I also took functional data analysis with a guy named Brian Leroux, who is a faculty member there too.
And I also like teaching. My favorite class I teach is a hands-on data analysis class where we do lectures and also do in-class labs on data analysis projects. I give the students a lot of data that they have to figure out, and it is tricky to figure out. That is my favorite class to teach because there is more interaction: you get to talk about problems and figure things out, and it feels like puzzle solving rather than lecturing. So that’s fun.
4. The research you are most proud of.
I’m proud of all my research. (Here is one example.) We collected a bunch of data from published papers, trying to estimate the rate at which medical results are false positives in major medical journals like the New England Journal of Medicine. I’m very proud of that result because, first of all, my wife and I worked together. This is the first paper I wrote with my wife; she is a statistician too. And I’m also proud of it because we did the whole process: we collected the data, built new statistical methods and analyzed the data. We did the whole thing, from start to finish. It’s cool to be not just the statistician but the scientist, too. That’s a lot of fun.
Do you think linear models or traditional statistics are still important today?
Yes, I think it’s still very, very important to have the basics. But what I’m realizing is that there are other things that are also really important, like reproducible research and being able to compute, because a lot of the problems we are working on are very, very big computational problems. And I also value(???) good presentation and communication skills, which are not necessarily in the curriculum. So sometimes I think we can compress some of the linear model part and put in these other things that are newer kinds of requirements for being a statistics researcher. So I think it’s very important, but because it’s getting harder to be a researcher in this area, you have to know more stuff. We are going to have to get that compression done. That is what I think.
How should we face the threat from data science?
I think it’s actually not a threat for a student; it’s an opportunity. As a student, if you learn a little bit about computing and a little bit about data visualization, and you have your statistics background, you will be in huge demand from companies like Google and Microsoft. In fact, I was hoping one of my PhD students would take a faculty job. She turned it down to go to a tech company. As you see, the tech company recruited her and she decided to go there. So I think it’s just an opportunity for statistics students. For faculty and other people in academic jobs, it’s more of a threat. I say it’s a threat because smart students will choose between data science programs and statistics programs, and we want them to come to us. So I think the way we can fix that problem is by adopting a little more data visualization, a little more computing and a little more reproducible research in our programs, and by staying really focused on solving real and important problems that matter to people. You know, in statistics, I know I’ve had this problem sometimes, where I was sort of drifting away from the data and starting to think about some really theoretical problems that are very different from what data scientists think about. At least for me, we are in a very applied statistics program, so it’s important to focus heavily on what the real problem is that you are trying to solve and what data you have.
I think by doing that we won’t have any problems. It’s better for everyone that data is so popular.