Statistics and Probabilities

  • Most Topular Stories

  • Early May Link roundup

    Probability and statistics blog
    Matt Asher
    3 May 2012 | 5:51 pm
    Naomi Robbins looks at using pie charts to represent women and men in publishing. Her piece is here. The charts in question are here. Warning: Don’t click if you hate pie charts, you just might have a meltdown or wonder why all the info couldn’t be put into a single bar chart. Discussion at Cross Validated about How best to communicate uncertainty with way too few responses. My feeling is that as news filters from scientific realms to pop culture outlets, it becomes more and more “certain” in terms of how it’s presented. Can that be fixed? Master of data…
  • From sketch to graphic

    Social Science Statistics Blog
    12 May 2012 | 10:58 pm
    I just ran across chartsnthings (h/t to Gelman). Kevin Quealy at the New York Times graphics department shows the progression from initial sketch to final graphic. Thoughts: 1) I love seeing other people's first sketches. I sketch first too, and I find that the quality of any graphic can mostly be determined by how good the idea was when I first sketched it. 2) This reminded me that rather than using R to make my final figures, I really need to run them through Illustrator. Nathan Yau's book Visualize This gives some awesome worked examples of how to clean up R graphics in Illustrator. (And for…
  • gender-neutral Olympics?!

    Xi'an's Og
    xi'an
    15 May 2012 | 5:12 pm
    As usual, reading the latest issue of Significance is quite pleasant and rewarding (although as usual I have to compete with my wife to get hold of the magazine!). This current issue is dedicated to the (London) Olympics, with articles on predictions of future records, on whether or not the 1988 records can be beaten (the Seoul Olympics were the last games before more severe anti-drug tests were introduced), on advice to Usain Bolt for running faster (!) and on the objective dangers of dying from running a marathon (answer: it is much more “dangerous” to train!). However, a most…
  • Priors on probability measures

    Statisfaction
    Julyan Arbel
    24 Apr 2012 | 1:30 pm
    Hi, for the next GTB meeting at Crest, 3rd May, I will present Peter Orbanz’s work on Projective limit random probabilities on Polish spaces. It will follow my previous presentation about Bayesian nonparametrics on the Dirichlet process. The article provides a means of constructing an arbitrary prior distribution on the set of probability measures by working on its finite-dimensional marginals. The vanilla example is the Dirichlet process, which is characterized by its Dirichlet distribution marginals on any finite partition of the space (other examples are the Normalized Inverse…
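The marginal characterization mentioned in this excerpt can be written out as follows. The notation is an assumption for illustration and is not taken from the post: $\alpha > 0$ is a concentration parameter, $G_0$ a base probability measure on the space $\mathcal{X}$, and $(A_1, \dots, A_k)$ any finite measurable partition of $\mathcal{X}$.

```latex
P \sim \mathrm{DP}(\alpha, G_0)
\quad\Longrightarrow\quad
\bigl(P(A_1), \dots, P(A_k)\bigr)
\sim \mathrm{Dir}\bigl(\alpha G_0(A_1), \dots, \alpha G_0(A_k)\bigr).
```

Read this way, specifying the prior amounts to specifying a consistent family of finite-dimensional Dirichlet distributions, one per partition.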
  • Finding Waldo, a flag on the moon and multiple choice tests, with R

    Freakonometrics
    arthur charpentier
    16 May 2012 | 11:58 am
    I have to admit, first, that finding Waldo has been a difficult task. And I did not succeed. Neither could I correctly spot his shirt (because actually, it was what I was looking for). You know, that red-and-white striped shirt. I guess it should have been possible to look for Waldo's face (assuming that his face does not change) but I still have problems with size factor (and resolution issues too). The problem is not that simple. At the http://mlsp2009.conwiz.dk/ conference, a prize was offered for writing an algorithm in Matlab. And one can even find Mathematica code online. But most…

    Probability and statistics blog

  • May Manifesto addendum

    Matt Asher
    2 May 2012 | 9:18 pm
    Just added another statement to my manifesto. Here is the full text: Interpret or predict. Pick one. There is an inescapable tradeoff between models which are easy to interpret and those which make the best predictions. The larger the data set, the higher the dimensions, the more interpretability needs to be sacrificed to optimize prediction quality. This puts modern science at a crossroads, having now exploited all the low hanging fruit of simple models of the natural world. In order to move forward, we will have to put ever more confidence in complex, uninterpretable “black box”…
  • A classification scheme for types of randomness

    Matt Asher
    23 Feb 2012 | 6:26 pm
    We often speak implicitly of different types of randomness but neglect to name or categorize them. Consider this post to be a kind of RFC or rough draft on the division of randomness into five categories. If you start using these distinctions explicitly, even if only in your own head, I think you will find them highly useful, as I have. Type 0: Fixed numbers or known outcomes Type 0 randomness is the special case of randomness where the data are already known. Any known outcome, regardless of the process that generated it, is Type 0 randomness. Once known, it has become a constant. In terms…
  • R A Fisher illustration

    Matt Asher
    15 Jan 2012 | 7:30 pm
    Ronald Aylmer Fisher, statistics badass. Illustration by Rachelle Scarfó for a project I was working on.
  • Explaining large numbers

    Matt Asher
    6 Jan 2012 | 4:05 pm
    It can be very hard to convey the meaning and importance of large numbers. As Joseph Stalin infamously said (or perhaps didn’t): “The death of one man is a tragedy. The death of a million is a statistic.” The point being that we can conceive of one person dying, perhaps our mother or a friend. We can understand it and feel it. However horrific the deaths of a million, the size of the number itself turns it into an abstraction. The video above explores a concept that is abstract to begin with (the national debt) and made even more incomprehensible by having an impossibly…
 

    Social Science Statistics Blog

  • App Stats: Elwert on "Endogenous Selection"

    23 Apr 2012 | 12:43 pm
    We hope you can join us this Wednesday, April 25, 2012 for the final session of the Applied Statistics Workshop this semester. Felix Elwert, Assistant Professor from the Department of Sociology at the University of Wisconsin-Madison, will give a presentation entitled "Endogenous Selection". A light lunch will be served at 12 pm and the talk will begin at 12.15. "Endogenous Selection" Felix Elwert Department of Sociology, University of Wisconsin-Madison CGIS K354 (1737 Cambridge St.) Wednesday, April 25th, 2012 12.00 pm Abstract: Selection bias is a central problem for causal inference in the…
  • App Stats: Wasow on "Violence and Voting: Did the 1960s Urban Riots Reshape American Politics?"

    16 Apr 2012 | 12:53 am
    We hope you can join us this Wednesday, April 18, 2012 for the Applied Statistics Workshop. Omar Wasow, a Ph.D. candidate from the Department of Government and the Department of African and African American Studies at Harvard University, will give a presentation entitled "Violence and Voting: Did the 1960s Urban Riots Reshape American Politics?" A light lunch will be served at 12 pm and the talk will begin at 12.15. "Violence and Voting: Did the 1960s Urban Riots Reshape American Politics?" Omar Wasow Government Department, Harvard University CGIS K354 (1737 Cambridge St.) Wednesday, April…
  • App Stats: Glynn on "Using Post-Treatment Variables to Establish Upper Bounds on Causal Effects: Assessing Executive Selection Procedures in New Democracies"

    9 Apr 2012 | 11:20 am
    We hope you can join us this Wednesday, April 11, 2012 for the Applied Statistics Workshop. Adam Glynn, Associate Professor from the Department of Government at Harvard University, will give a presentation entitled "Using Post-Treatment Variables to Establish Upper Bounds on Causal Effects: Assessing Executive Selection Procedures in New Democracies". A light lunch will be served at 12 pm and the talk will begin at 12.15. "Using Post-Treatment Variables to Establish Upper Bounds on Causal Effects: Assessing Executive Selection Procedures in New Democracies" Adam Glynn Government Department,…
  • App Stats: Bahar on "International Knowledge Diffusion and the Comparative Advantage of Nations"

    1 Apr 2012 | 11:44 pm
    We hope you can join us this Wednesday, April 4, 2012 for the Applied Statistics Workshop. Dany Bahar, a Ph.D. Candidate in Public Policy at the Harvard Kennedy School, will give a presentation entitled "International Knowledge Diffusion and the Comparative Advantage of Nations". A light lunch will be served at 12 pm and the talk will begin at 12.15. "International Knowledge Diffusion and the Comparative Advantage of Nations" Dany Bahar Harvard Kennedy School CGIS K354 (1737 Cambridge St.) Wednesday, April 4th, 2012 12.00 pm Abstract: In this paper we document that the probability that a…

    Xi'an's Og

  • generalised ratio of uniforms

    xi'an
    14 May 2012 | 5:12 pm
    A recent arXiv posting of the paper “On the Generalized Ratio of Uniforms as a Combination of Transformed Rejection and Extended Inverse of Density Sampling” by Martino, Luengo, and Míguez from Madrid rekindled my interest in this rather peculiar simulation method. The ratio of uniforms samples uniformly on the subgraph to produce simulations from p as the ratio v/u. The proof is straightforward first-year calculus, but I do not find the method as intuitive as, say, accept/reject…. The paper gives a very detailed background on those methods, as well as on the “inverse of…
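A minimal sketch of the basic (non-generalized) ratio-of-uniforms method described in this excerpt, assuming a standard normal target with unnormalised density p(x) = exp(-x^2/2). The function name `sample_normal_rou` and the choice of target are illustrative assumptions, not taken from the post or the paper. Points (u, v) are drawn uniformly in a bounding rectangle and kept when u <= sqrt(p(v/u)); the ratio v/u is then a draw from p.

```python
import math
import random


def sample_normal_rou(n, seed=0):
    """Draw n standard-normal variates with the ratio-of-uniforms method."""
    rng = random.Random(seed)
    # For p(x) = exp(-x^2 / 2), the region {(u, v): 0 < u <= sqrt(p(v/u))}
    # fits inside the rectangle 0 < u <= 1, |v| <= sqrt(2 / e).
    v_max = math.sqrt(2.0 / math.e)
    out = []
    while len(out) < n:
        u = rng.random()
        v = rng.uniform(-v_max, v_max)
        if u == 0.0:
            continue  # avoid division by zero
        x = v / u
        # Accept when (u, v) lies in the region, i.e. u^2 <= p(x).
        if u * u <= math.exp(-0.5 * x * x):
            out.append(x)
    return out


samples = sample_normal_rou(20000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))
```

The sample mean and variance should land near 0 and 1. The only target-specific ingredients are the density and the bounding rectangle.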
  • morning light

    xi'an
    14 May 2012 | 7:14 am
    Filed under: pictures, Running Tagged: churches, morning light, Sceaux, sunrise
  • day of the theses

    xi'an
    13 May 2012 | 5:12 pm
    Today, I will spend my day in thesis defenses, as I take part in a defense committee this morning at Supélec, about a thesis written by Alireza Roodaki on a new approach to trans-dimensional MCMC for mixtures of distributions. Rather than a new way to simulate from posterior distributions with a varying number of components, the thesis concentrates on the post-simulation processing of the outcome of the simulation, constructing an object similar to the point process representation of Matthew Stephens where components have a meaning across varying dimensions. An interesting and novel…
  • the Dewey decimal system

    xi'an
    12 May 2012 | 5:13 pm
    I bought this book in a Princeton bookstore mostly because it was such a beautiful object! I had never heard of Nathan Larson nor of the Dewey Decimal System when I grabbed the book and felt the compulsion to buy it! The book published by Akashic Books is indeed a beautiful book: the paper is high quality, a warm crème colour, the cover has inside flaps, the printing makes reading very enjoyable, the pages are cut in such a way that looking at the book from the fore edge makes it look like a Manhattan skyline… Truly a beautiful thing!!! Once I had opened the book, I also got trapped by…
 

    Statisfaction

  • Awesome Bristol

    Pierre Jacob
    24 Apr 2012 | 5:25 am
    Hey, Last week there was a workshop on Confronting Intractability in Statistical Inference, organised by the University of Bristol and the SuSTain group. It was hosted at the Goldney Hall (picture above). It turned out to be a succession of fascinating talks about the recent developments and the future of statistical methods used in very challenging inference problems. What I appreciated above all was the ambition of many talks, and the generosity of the speakers in giving many ideas to the audience. Among the things I’ve learned there, the following were the most ambitious in my…
  • A world without referees

    Julyan Arbel
    10 Apr 2012 | 2:17 pm
    In an invited contribution to the last ISBA Bulletin, Larry Wasserman discusses the “almost 350 years old” peer review system (paper). Have a look at it, it’s quite thought provoking! We should think about our field like a marketplace of ideas. Everyone should be free to put their ideas out there. There is no need for referees. Good ideas will get recognized, used and cited. Bad ideas will be ignored. This process will be imperfect. But is it really better to have two or three people decide the fate of your work? A world where you put your work on arXiv or on your…
  • Rochebrune Workshop 2012

    Julyan Arbel
    9 Apr 2012 | 4:48 pm
    Hey, last week I attended the Rochebrune workshop for the second time. The organizers’ genius idea (Liliane Bel and Eric Parent from AgroParisTech, Jean-Jacques Borreux from Liège University) is to mix skiing, stats and spirits (mostly Genepi and Chartreuse) in a remote alpine chalet on top of the Megève ski resort. Most of the attendees are (young) Bayesians working in applied fields, ranging from biology, ecology and epidemiology, to meteorology and climatology. We had great talks about fish, trees, birds (Joël’s busard cendré), drugs and avalanches. More methodological talks…
  • Meta-analysis of Aid Programs

    Pierre Jacob
    1 Apr 2012 | 5:45 pm
    Hey hey, here I’m going to advertise a project of Eva Vivalt from the World Bank, who wants to write a book on meta-analysis of aid programs. The project is hosted here: http://www.kickstarter.com/projects/972584134/what-works-in-development-10-meta-analyses-of-aid It’s on kickstarter.com, which means that if (and only if) enough people pledge to give money to the project (in this case at least US$10,000), then Eva Vivalt and her colleagues will write the book and print it and everything. It’s an innovative model, which reminds me of MyMajorCompany. More about the project itself.

    Freakonometrics

  • Lecture notes on time series

    arthur charpentier
    15 May 2012 | 10:27 am
    Since the winter session is not over yet, I am going to post my lecture notes on the last section (on time series modeling) for the course ACT2040. As I said in class, this is an update of notes typed up about ten years ago. I have also added some R code, but a number of typos and misprints probably remain. I will use the coming days to revise this version.
  • Basketball: score dynamics and game theory

    arthur charpentier
    9 May 2012 | 11:22 pm
    Tomorrow morning, I will be giving a talk at Mont Tremblant, for the Journées de la Société Canadienne de Sciences Economiques. I will present a joint work - in progress - with Nathalie Colombier and Romuald Elie. Since the working paper is not online yet, I will wait a little bit before uploading the slides. But they will be online, someday (hopefully soon)... "An important aspect of the strategy of most organizations is the provision of incentives to the employees to meet the organization’s objectives. Typically this implies tying pay to performance (see Prendergast, 1999). In order to…
  • Bayes is playing Russian roulette

    arthur charpentier
    7 May 2012 | 9:58 pm
    There was (once again) a nice puzzle on http://www.futilitycloset.com/. Bayes and a good friend are playing Russian roulette. The revolver has six chambers. He puts two bullets in two adjacent chambers, spins the cylinder, holds the gun to his friend's head, and pulls the trigger. It clicks. So it is now Bayes's turn: he can choose either to spin the cylinder again or leave it as it is. Which is better? Fortunately, Bayes knows his theorem: if he does spin it, the probability of getting killed is 2 out of 6 (two loaded chambers out of six), but if he does not, since his friend is still alive, then…
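The comparison set up in this excerpt can be checked by direct enumeration. This is a minimal sketch under stated assumptions: chambers are labelled 0 to 5, the two bullets sit in chambers 0 and 1, and the cylinder advances by exactly one chamber after each pull. These labels and names are illustrative and do not come from the post.

```python
from fractions import Fraction

CHAMBERS = 6
BULLETS = {0, 1}  # two adjacent loaded chambers (labels are arbitrary)

# Starting positions consistent with the first pull being a click.
survived = [s for s in range(CHAMBERS) if s not in BULLETS]

# No spin: the cylinder advances one chamber from each surviving start.
hits = sum((s + 1) % CHAMBERS in BULLETS for s in survived)
p_no_spin = Fraction(hits, len(survived))

# Spin: the position is uniform over all six chambers again.
p_spin = Fraction(len(BULLETS), CHAMBERS)

print(p_no_spin, p_spin)  # prints: 1/4 1/3
```

Under these assumptions, conditioning on the first click leaves four equally likely positions, only one of which is followed by a bullet, so not spinning gives the lower risk.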
  • Correlations, dimension, and risk measure

    arthur charpentier
    4 May 2012 | 1:47 pm
    Yesterday, while I was attending the IFM2 conference, at HEC Montreal, I heard a nice talk about credit risk, and a comparison between contagion (or at least default correlation), for corporate and retail companies (in the US). And it was mentioned that default correlation was much lower for retail companies than it could be for corporate risk. In a discussion that followed those slides, it was mentioned that banks in the US should actually have been working more with those small firms, since contagion risk was much lower. A problem here is that the link between correlation, risk and…
 

    R-bloggers

  • Stepping Outside My Open-Source Comfort Zone: A First Look at Golden Helix SVS

    Stephen Turner
    16 May 2012 | 11:29 am
    (This article was first published on Getting Genetics Done, and kindly contributed to R-bloggers) I'm a huge supporter of the Free and Open Source Software movement. I've written more about R than anything else on this blog, all the code I post here is free and open-source, and a while back I invited you to steal this blog under a cc-by-sa license. Every now and then, however, something comes along that just might be worth paying for. As a director of a bioinformatics core with a very small staff, I spend a lot of time balancing costs like software licensing versus personnel/development time,…
  • Population of Tawi-Tawi from 1903 to 2010

    alstated
    16 May 2012 | 3:17 am
    (This article was first published on ALSTAT R Blog, and kindly contributed to R-bloggers)

    Table 1: Population of Tawi-Tawi from 1903 to 2010

    Year    Population
    1903         17000
    1918         45000
    1939         46000
    1948         59000
    1960         79000
    1970        110000
    1975        143000
    1980        195000
    1990        228204
    1995        250718
    2000        322317
    2007        450346
    2010        366550

    R Codes

    To leave a comment for the author, please follow the link and comment on his blog: ALSTAT R Blog. R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git,…
  • Dynamic Content with RStudio, Markdown, and Marked.

    Christopher Gandrud
    15 May 2012 | 7:17 pm
    (This article was first published on Christopher Gandrud, and kindly contributed to R-bloggers) As Markus Gesmann recently pointed out, the new version of RStudio (0.96) has some really nice features for creating dynamic reports with Yihui Xie’s knitr. You can integrate not just R and LaTeX, but also R and Markdown (as well as some other formats). If you haven’t used Markdown before, it’s basically a really simplified syntax for writing web content, though it can easily be converted not just to HTML but also LaTeX and other formats with Pandoc. See this post by Yihui Xie for…
  • Using R to graph a subject trend in PubMed

    David Ruau
    15 May 2012 | 6:13 pm
    (This article was first published on Brain Chronicle, and kindly contributed to R-bloggers) The traditional way to show that your topic is worth studying in front of an audience is to show the state of the field based on a literature review. This is especially true if your subject is obscure except to a handful of scientists in the world. I was confronted with this problem more than once and the last time I decided to plot the state-of-the-field using a few scripts. I wrote three scripts for that: pubmed_trend.r, which takes your PubMed query and sends it to the NCBI using the Eutils tools (Perl…
  • How long before R overtakes SAS and SPSS?

    David Smith
    15 May 2012 | 5:37 pm
    (This article was first published on Revolutions, and kindly contributed to R-bloggers) Based on an analysis of Google Scholar data on usage of statistical software, Bob Muenchen makes a forecast: R will overtake SAS and SPSS in 2015. Forecasting is extrapolation — always a tricky business — so Bob also provides these qualitative reasons why R will continue to grow at the expense of SAS and SPSS:
      • The continued rapid growth in add-on packages (Figure 10)
      • The attraction of R’s powerful language
      • The near monopoly R has on the latest analytic methods
      • Its free price
      • The freedom to teach…

    Statistical Modeling, Causal Inference, and Social Science

  • Wikipedia author confronts Ed Wegman

    Andrew
    16 May 2012 | 8:32 am
    Wegman: “It’s not reprinted 100 percent like you had it.” Wikipedia guy: “No, you added another paragraph at the end and you changed the headline. . . . You even copied the typos that I’ve corrected on my website. It was taken verbatim and reprinted in your paper.” The original author got a check for $500 but, unfortunately, no free subscription to “Wiley Interdisciplinary Reviews: Computational Statistics” (a $1400-$2800 value). P.S. To those who think I’m being mean to Wegman: I haven’t yet heard that he’s apologized to the…
  • Question 5 of my final exam for Design and Analysis of Sample Surveys

    Andrew
    15 May 2012 | 3:00 pm
    5. Which of the following better describes changes in public opinion on most issues? (Choose only one.) (a) Dynamic stability: On any given issue, average opinion remains stable but liberals and conservatives move back and forth in opposite directions (the “accordion model”) (b) Uniform swing: Average opinion on an issue can move but the liberals and conservatives don’t move much relative to each other (the distribution of opinions is a “solid block of wood”) (c) Compensating tradeoffs: When considering multiple survey questions on the same general topic, average opinion can move…
  • A statistical research project: Weeding out the fraudulent citations

    Andrew
    15 May 2012 | 8:26 am
    John Mashey points me to a blog post by Phil Davis on “the emergence of a citation cartel.” Davis tells the story: Cell Transplantation is a medical journal published by the Cognizant Communication Corporation of Putnam Valley, New York. In recent years, its impact factor has been growing rapidly. In 2006, it was 3.482 [I think he means "3.5"---ed.]. In 2010, it had almost doubled to 6.204. When you look at which journals cite Cell Transplantation, two journals stand out noticeably: the Medical Science Monitor, and The Scientific World Journal. According to the JCR, neither of…
  • Question 4 of my final exam for Design and Analysis of Sample Surveys

    Andrew
    14 May 2012 | 4:00 pm
    4. Researchers have found that survey respondents overreport church attendance. Thus, naive estimates from surveys overstate the percentage of Americans who attend church regularly. Does this have a large impact on estimates of time trends in religious attendance? Solution to question 3 From yesterday: 3. We discussed in class the best currently available method for estimating the proportion of military servicemembers who are gay. What is that method? (Recall the problems with the direct approach: there is no simple way to survey servicemembers at random, nor is it likely that they would…
  • I hate to get all Gerd Gigerenzer on you here, but . . .

    Andrew
    14 May 2012 | 8:22 am
    Jonathan Cantor points me to an opinion piece by psychologist Reid Hastie, “Our Gift for Good Stories Blinds Us to the Truth.” I have mixed feelings about Hastie’s article. On one hand I do think his point is important. It’s not new to me, but presumably it’s new to many readers of bloomberg.com. I like Hastie’s book (with Robyn Dawes), Rational Choice in an Uncertain World, and I’m predisposed to like anything new that he writes. On the other hand, there’s something about Hastie’s article that bothered me. It seemed a bit smug, as if he…

    The Endeavour

  • Mars, magic squares, and music

    John
    16 May 2012 | 6:59 am
    About a year ago I wrote about Jupiter’s magic square. Then yesterday I was listening to the New Sounds podcast that mentioned a magic square associated with Mars. I hadn’t heard of this, so I looked into it and found there were magic squares associated with each of the solar system bodies known to antiquity (i.e. Sun, Mercury, Venus, Moon, Mars, Jupiter, and Saturn). Here is the magic square of Mars: The podcast featured Secret Pulse by Zack Browning. From the liner notes: Magic squares provide structure to the music. Structure provides direction to the composer. Direction provides…
  • Machine Learning in Action

    John
    15 May 2012 | 8:23 am
    A couple months ago I briefly reviewed Machine Learning for Hackers by Drew Conway and John Myles White. Today I’m looking at Machine Learning in Action by Peter Harrington and comparing the two books. Both books are about the same size and cover many of the same topics. One difference between the two books is choice of programming language: ML for Hackers uses R for its examples, ML in Action uses Python. ML in Action doesn’t lean heavily on Python libraries. It mostly implements its algorithms from scratch, with a little help from NumPy for linear algebra, but it does not use ML…
  • Criteria for a computing setup

    John
    14 May 2012 | 6:05 am
    “My setup” articles have become common. These articles list the hardware and software someone uses, usually with little explanation. The subtext is often the author’s commitment to the Apple brand or to open source, to spending money on the best stuff or to avoid spending money on principle. I don’t find such articles interesting or useful. Vivek Haldar has written a different kind of “my setup” article, one that emphasizes the problems he set out to solve and the reasons for the solutions he chose. Here are a couple excerpts describing his goals for…
  • Solutions to knight’s random walk

    John
    10 May 2012 | 7:10 am
    My previous post asked this question: Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square? There is a mathematical solution that is a little arcane, but short and exact. You could also approach the problem using simulation, which is more accessible but not exact. The mathematical solution is to view the problem as a random walk on a graph. The vertices of the graph are the squares of a chess board and the…
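A sketch of the graph-based calculation this excerpt begins to describe, assuming the standard fact that for a random walk on a connected undirected graph the mean return time to a vertex equals the sum of all vertex degrees divided by that vertex's degree. The excerpt is truncated, so this may not match the post's own derivation step for step. The names `MOVES` and `degree` are illustrative.

```python
# The eight knight moves as (row, column) offsets.
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
         (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def degree(r, c, n=8):
    """Number of legal knight moves from square (r, c) on an n x n board."""
    return sum(0 <= r + dr < n and 0 <= c + dc < n for dr, dc in MOVES)


# Sum of degrees over all squares (twice the number of edges in the graph).
total = sum(degree(r, c) for r in range(8) for c in range(8))

# Mean return time to the corner = total degree / degree of the corner.
mean_return = total / degree(0, 0)
print(total, degree(0, 0), mean_return)  # prints: 336 2 168.0
```

The stationary probability of a square is proportional to its number of legal moves, which is why a corner, with only two moves, is revisited so rarely.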
  • A knight’s random walk

    John
    8 May 2012 | 7:33 pm
    Here’s a puzzle I ran across today: Start a knight at a corner square of an otherwise-empty chessboard. Move the knight at random by choosing uniformly from the legal knight-moves at each step. What is the mean number of moves until the knight returns to the starting square? There’s a slick mathematical solution that I’ll give later. You could also find the answer via simulation: write a program to carry out a knight random walk and count how many steps it takes. Repeat this many times and average your counts. Related post: A knight’s tour magic square
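The simulation approach suggested in this excerpt could look like the following minimal sketch. The function names, the seed, and the number of trials are arbitrary choices for illustration, not taken from the post.

```python
import random

# The eight knight moves as (row, column) offsets.
MOVES = [(1, 2), (2, 1), (2, -1), (1, -2),
         (-1, -2), (-2, -1), (-2, 1), (-1, 2)]


def legal_moves(r, c, n=8):
    """Squares a knight can reach from (r, c) on an n x n board."""
    return [(r + dr, c + dc) for dr, dc in MOVES
            if 0 <= r + dr < n and 0 <= c + dc < n]


def return_time(rng, n=8):
    """Steps until a knight starting at corner (0, 0) first returns there."""
    r, c = 0, 0
    steps = 0
    while True:
        r, c = rng.choice(legal_moves(r, c, n))
        steps += 1
        if (r, c) == (0, 0):
            return steps


rng = random.Random(42)
trials = 10000
estimate = sum(return_time(rng) for _ in range(trials)) / trials
print(estimate)
```

Individual return times vary widely, so the average settles slowly; more trials tighten the estimate at the cost of run time.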
 

    Revolutions

  • Revolution Newsletter: May 2012

    David Smith
    16 May 2012 | 11:38 am
    The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full May edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. New R Training Courses Announced. Three new R courses from leading R experts are now available for registration: An Introduction to R for SAS, SPSS, and Stata Users will be presented by Bob Muenchen (author of R for SAS and SPSS users) June 26-29. This is an on-line workshop…
  • How long before R overtakes SAS and SPSS?

    David Smith
    15 May 2012 | 5:37 pm
    Based on an analysis of Google Scholar data on usage of statistical software, Bob Muenchen makes a forecast: R will overtake SAS and SPSS in 2015. Forecasting is extrapolation — always a tricky business — so Bob also provides these qualitative reasons why R will continue to grow at the expense of SAS and SPSS:
      • The continued rapid growth in add-on packages (Figure 10)
      • The attraction of R’s powerful language
      • The near monopoly R has on the latest analytic methods
      • Its free price
      • The freedom to teach with real-world examples from outside organizations, which is forbidden to academics…
  • Multiple Sclerosis Tweet-Chat: Review

    David Smith
    14 May 2012 | 5:55 pm
    We had a great Twitter conversation last Thursday on the use of big-data analytics, Revolution R Enterprise, and IBM Netezza in the search for a cure for MS. Many thanks to the other panelists: Murali Ramanathan (SUNY Buffalo), Tim Coetzee (National MS Society) and moderator Shawn Dolley (IBM) for fielding and answering questions from interested parties following #IBMDataChat. As you can see from this twitteR analysis, it was a lively discussion, with more than 300 tweets during the designated hour: IBM's James Kobielus has a summary of the chat, highlighting…
  • New courses from R gurus

    David Smith
    14 May 2012 | 5:27 pm
    Looking to learn R, or to expand your R skills for data visualization or package development? Here are some R courses presented by the experts you may be interested in: June 19-20: Visualization in R with ggplot2. This course presented by Garrett Grolemund & Dr. Winston Chang of Rice University is also a web-based course with live presentation. This course provides instruction on data visualization with R, including data transformation, visualization of Big Data and polishing graphics for presentation.   June 21-22 (in New York City) and June 28-29 (in Redwood City,…
  • Because it's Friday: Australian PSAs from the 80s

    David Smith
    11 May 2012 | 3:03 pm
    When I was a kid growing up in Australia, it seemed like every commercial break during the Saturday morning cartoons or after-school shows was punctuated by some PSA encouraging us to lead a healthier life. These "community service announcements" were government-sponsored, and often paired a low-budget animation with a catchy jingle. Strangely enough, lots of Australians (me included) remember them fondly, and can still recite the songs on demand. Here are a few of my favourites: "Slip Slop Slap" made avoiding skin cancer fun (an important lesson in the Sunburnt…

    Numbers Rule Your World

  • Facebook's challenges ahead as it goes public

    junkcharts
    15 May 2012 | 10:25 pm
    Given that Facebook is on the verge of going public -- the IPO supposedly over-subscribed, leading to an increase in the projected price of the offering -- this article from Reuters is big news: GM, the third largest advertiser, will not spend any more money on Facebook ads. This should rightfully get Facebook's attention. We have previously discussed the incredibly low clickthrough rates of Facebook ads, and speculated as to why consumers don't respond to these ads. (link) As marketers start to ask questions, Facebook is going to have to give some answers. Here is a starter's list:…
  • What is the (expensive) miracle drug that cut colon cancer deaths by 20%?

    junkcharts
    10 May 2012 | 10:38 pm
    Andrew Gelman linked to this great reporting by Reuters on U.S. healthcare economics. It's a must-read. Be patient, and read through to the end even though it's a long piece. Andrew cites statistician Don Berry who explains what "lead time bias" is, and why survival time is always the wrong metric to use in evaluating health outcomes. Survival time is the time from diagnosis to death. By doing more screening and diagnosing earlier, survival time will magically increase even if the patient's life expectancy stays put. I ignored Andrew's warning and spent some time…
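    Lead time bias can be illustrated with a tiny worked example. This is only a sketch with made-up ages (the patient, the ages of 63, 67 and 70, and the helper name `survival_time` are all hypothetical, not taken from the Reuters piece): the age at death is held fixed, and only the age at diagnosis changes.

```python
def survival_time(age_at_diagnosis, age_at_death):
    """Survival time as defined in the excerpt: time from diagnosis to death."""
    return age_at_death - age_at_diagnosis

# Hypothetical patient: the disease course and age at death are identical
# in both scenarios; only the timing of the diagnosis differs.
AGE_AT_DEATH = 70

late_diagnosis = survival_time(age_at_diagnosis=67, age_at_death=AGE_AT_DEATH)
early_diagnosis = survival_time(age_at_diagnosis=63, age_at_death=AGE_AT_DEATH)

print(f"Diagnosed at 67: survival time = {late_diagnosis} years")
print(f"Diagnosed at 63: survival time = {early_diagnosis} years")
print(f"Extra life gained from earlier diagnosis: {AGE_AT_DEATH - AGE_AT_DEATH} years")
```

    In this toy case the measured survival time more than doubles although the patient does not live a day longer, which is the distortion the excerpt describes.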
  • Dropping out dropouts is a mischievous act

    junkcharts
    6 May 2012 | 10:34 pm
    The following throw-away lines in a Wall Street Journal article about the "return on investment" of getting into college debt (what an idea) are the most important ones: The report [by the College Board] also doesn't account for dropouts or extra college years. Only 56% of students who enroll in a four-year college earn a bachelor's degree within six years, according to a report last year by the Harvard Graduate School of Education... PayScale, a Seattle data firm, examines the links between pay and variables like colleges and majors. Its analysis, which also ignores…
  • Choosing reference levels

    junkcharts
    28 Apr 2012 | 10:28 am
    Reader Jordan G. submitted this infographic (via Business Insider) to Junk Charts, my other blog about graphical presentation of data. Graphically, the chart has nothing to commend itself, but the most annoying failure is the awful choice of statistics. Stating the Trivial. Regarding Pandora, the chart tells us 18.7 million hours of streamed music per day (across many millions of computers) is equal to one computer streaming music for more than 2011 years. What the chart is really saying is that there are more than 18.7 million hours in 2011 years. (If you do the math, there are 17.6 million…
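    The arithmetic in this excerpt is easy to check. The sketch below assumes an average year of 365.25 days; the excerpt does not say which year length the author used, so that constant and the function names are assumptions made for illustration.

```python
# Assumption: an average (Julian) year of 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

def hours_in_years(years):
    """Number of hours contained in the given number of years."""
    return years * HOURS_PER_YEAR

def years_for_hours(hours):
    """Number of years one computer needs to stream the given hours."""
    return hours / HOURS_PER_YEAR

hours_2011 = hours_in_years(2011)
years_streamed = years_for_hours(18.7e6)

print(f"Hours in 2011 years: {hours_2011 / 1e6:.1f} million")
print(f"Years needed for 18.7 million hours: {years_streamed:.0f}")
```

    Under that assumption, 2011 years hold about 17.6 million hours, matching the figure quoted in the excerpt, and 18.7 million hours is indeed somewhat more than 2011 years of continuous streaming.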

    The Numbers Guy

  • The Waiting Game

    Carl Bialik
    4 May 2012 | 9:34 pm
    Long lines at airport immigration halls are objects of frustration for travelers, and of study for queuing experts who offer ideas for easing the pain.
  • The Fire Countdown Clock

    Carl Bialik
    20 Apr 2012 | 9:01 pm
    How long does it take for fire departments to get on the scene at a fire emergency? The question is surprisingly difficult to answer, obscured by complexities in definitions and measurement.
  • Cruise Safety, a Century After Titanic

    Carl Bialik
    13 Apr 2012 | 7:09 pm
    Without comprehensive safety statistics for today's cruise ships, it is difficult to assess the industry's claim that its ships are safer than most other means of transportation.
  • Modern Thermometers

    Carl Bialik
    6 Apr 2012 | 8:39 pm
    Temperature alone can't say how warm people will feel outside. Scientists are taking on the challenge of formulating a new way to measure climate comfort.
  • Imagining a Census Survey Without a Mandate

    Carl Bialik
    30 Mar 2012 | 9:07 pm
    It is mandatory for recipients of the Census Bureau's American Community Survey to respond to it. A House Bill would change that. How would that affect the crucial data produced by the survey?
 

    Observational Epidemiology

  • "Implicit Association Tests"

    16 May 2012 | 2:56 am
    I hadn't heard of this technique before but this interview got my attention. Here's the Wiki version: A typical IAT procedure involves a series of seven tasks. In the first task, an individual is asked to categorize stimuli into two categories. For example, a person might be presented with a computer screen on which the word "Black" appears in the top left-hand corner and the word "White" appears in the top right-hand corner. In the middle of the screen appears a word, such as a first name, that is typically associated with either the categories of "Black" or "White." For each word that appears in…
  • Ddulite Alert I (II to follow shortly)

    15 May 2012 | 2:13 am
    NPR's Steve Henn has an excellent story here about Facebook and Yahoo. Lots of good points but a couple struck me as particularly relevant to the ongoing ddulite thread. Today, Facebook CEO Mark Zuckerberg turns 28 and gets the ultimate birthday gift: His popular social networking site is expected to go public later this week. The IPO could be valued at nearly $100 billion. Meanwhile, Yahoo, another company that also once had a bright future, continues to undergo upheaval as it struggles to define its mission. Facebook is expected to start selling stock to the public and begin…
  • A good point on J.P. Morgan

    12 May 2012 | 10:53 pm
    (If you're still getting caught up on the JP Morgan story, you should probably go by Marketplace and check out Heidi Moore's explanation of the fiasco.) I don't have a direct source for this other than that I heard it on either Marketplace or All Things Considered, but a financial reporter made an observation I've been waiting to hear put concisely since this story broke. The reporter explained that the group that had the huge recent loss had been given a dual mandate: hedge against losses and make lots of money. The reporter then wondered if assigning those two mandates to the same team was a…
  • For the weekend -- five words with something in common

    12 May 2012 | 12:38 am
    Project, Record, Produce, Conduct, Progress. UPDATE: I noticed no one had a guess yet so I'll put some clues in the comment section.
  • "Ar go"

    10 May 2012 | 2:44 am
    Unlike NPR's Planet Money, which started out with one of the best debuts in recent journalism then faded rapidly, American Public Media's Marketplace has managed to maintain its exceptional quality for better than two decades. Here are a couple of examples from today's show (I'll blog about them later if I get a chance): A beautifully done story on a program to help teen mothers in Cincinnati (*and the source of the title of this post). **And a good account of the pros and the cons of the coupon business. This show is definitely worth setting aside a half hour of your afternoon. ** While…

    FiveThirtyEight

  • A 30,000-Foot View on the Presidential Race

    By NATE SILVER
    15 May 2012 | 7:55 am
    Nothing fundamental has changed in the race in the past month or two, although there are a couple of factors that may be working in Mitt Romney's favor at the margin.
  • Gay Marriage and the Democratic Base

    By NATE SILVER
    11 May 2012 | 7:21 am
    One clear rationale for President Obama's new position? It's very unusual for a president to oppose the majority of voters in his party on a major issue, and 60 percent of Democrats support gay marriage.
  • Support for Gay Marriage Outweighs Opposition in Polls

    By NATE SILVER
    9 May 2012 | 3:52 pm
    President Obama's decision to endorse same-sex marriage undoubtedly entails some political risk, but recent polls suggest that public opinion is increasingly on his side.
  • Moderate Republicans Fall Away in the Senate

    By NATE SILVER
    8 May 2012 | 8:57 pm
    Most moderate Republicans who served in the Senate just a few years ago will no longer be in the Congress when it meets again in 2013.
  • Is Obama More Popular Than He Should Be, Revisited

    By JOHN SIDES
    8 May 2012 | 4:08 pm
    After discussion from FiveThirtyEight readers, looking back and extending an analysis that noted President Obama's approval rating exceeded a theoretical approval rating predicted by history, the economy and other factors.
 

    Realizations in Biostatistics

  • Thoughts on privacy

    15 May 2012 | 7:45 am
    As this world gets more connected, and as data storage and analysis advances, we have to change our notions of privacy and data stewardship. About 25 years ago, right before email hit the big time and data analysis methods were limited to small datasets or Cray supercomputers, having data was a huge deal. Coverups, such as Watergate, were characterized by hiding data from others. While still true, it’s a lot harder, and, with increases in computing speed and availability of data, it’s a lot harder to hide from the rest of the world. Whether we like it or not, our notions of privacy have…
  • Statistical leadership part III–shameless plug for PharmaSUG talk

    30 Apr 2012 | 7:45 am
    PharmaSUG is a yearly gathering of SAS programmers who program for the pharmaceutical industry. This year, Dr. Katherine Troyer of REGISTRAT-MAPI will be giving a talk entitled “Giving Data a Voice: Partnering with Medical Writing for Best Reporting Practices,” in which she will implore the audience to get statisticians, medical writers, SAS programmers, clinicians, data managers, and any other stakeholder together early and often in the clinical trial process. While it may seem like the medical writer may only need to come into the process late, they actually have to put everything…
  • Coursera (and other online classes)

    23 Apr 2012 | 7:45 am
    A revolution is taking place in education. Last fall, Stanford University premiered three online classes in Artificial Intelligence, Machine Learning, and Introduction to Databases. I took Machine Learning and Intro to Databases, and this spring I’m taking Probabilistic Graphical Models, Natural Language Processing, and Model Thinking. This winter and spring, that effort has evolved into Coursera, and the course offering has expanded to about 30 courses across disciplines and difficulties. Other universities, such as the University of Michigan, UPenn, and Princeton have gotten in on the…
  • Using R for a salary negotiation–an extension of decision tree models

    21 Mar 2012 | 7:45 am
    Let’s say you are in the middle of a salary negotiation, and you want to know whether you should be aggressive in your offering or conservative. One way to help with the decision is to make a decision tree. We’ll work with the following assumptions: (1) you are at a job currently making $50k; (2) you have the choice between asking $60k (which will be accepted with probability 0.8) or $70k (which will be accepted with probability 0.2); (3) you get one shot. If your asking price is rejected, you stay at your current job and continue to make $50k. (This is one of those simplifying assumptions that we…
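    The assumptions listed in this excerpt are enough for a back-of-the-envelope expected-value comparison. The sketch below is written in Python rather than the R used in the original post, and it assumes that the goal is simply to maximize expected salary; the excerpt is truncated, so the original post may well use a different criterion or a richer tree. The function and variable names are hypothetical.

```python
CURRENT_SALARY = 50_000  # from the excerpt: you stay at $50k if rejected

def expected_salary(ask, p_accept, fallback=CURRENT_SALARY):
    """Expected salary for a one-shot ask that is accepted with p_accept."""
    return p_accept * ask + (1 - p_accept) * fallback

# Asking prices and acceptance probabilities as stated in the excerpt.
options = {60_000: 0.8, 70_000: 0.2}

expected = {ask: expected_salary(ask, p) for ask, p in options.items()}
best_ask = max(expected, key=expected.get)

for ask, value in expected.items():
    print(f"Ask ${ask:,}: expected salary about ${value:,.0f}")
print(f"Higher expected salary: ask ${best_ask:,}")
```

    Under these assumptions the conservative $60k ask has the higher expected salary (about $58k versus about $54k), although a different attitude to risk or different acceptance probabilities could change the answer.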
  • Why I hate p-values (statistical leadership, Part II)

    5 Mar 2012 | 7:45 am
    One statistical tool is the ubiquitous p-value. If it’s less than 0.05, your hypothesis must be true, right? Think again. Ok, so I don’t hate p-values, but I do hate the way that we abuse them. And here’s where we need statistical leadership to go back and critique these p-values before we get too excited. P-values can make or break venture capital deals, product approval for drugs, or senior management approval for a new design of deck lid. In that way, we place a little too much trust in them. Here’s where we abuse them: The magical 0.05: if we get a 0.051, we lose, and if we get a…

    Blog about Stats

  • HTML5: Flow map of internal migration in England & Wales

    graphboy
    11 May 2012 | 7:17 am
    At the start of this year, I wrote a brief post on how we’ve been looking closely at the emergence of HTML5 as a visualisation platform. This week, we’ve published the first fruit of those labours – an interactive flow map of internal migration data for England & Wales: The aim here was to really stretch the technology and give it a tough visualisation challenge to see what it’s capable of. The underlying data used in the map contains over 60,000 migration flows – yet we have still managed to produce an interactive application that will run on an iPhone. I…
  • Knickgraph … ?

    Armin Grossenbacher
    4 May 2012 | 2:52 pm
    The Swiss Federal Statistical Office has been publishing data visualizations for more than 100 years. Its head of graphic design, Daniel von Burg, reveals some curiosities. Used for the first time in the 1897 Atlas, the Knickgraph optimizes the surface of a bar graph. Its length is proportional to the value that is being represented. It provides an elegant solution to a problem that’s often encountered in data visualization: how to include a value that is much greater than the others in a graph, without completely attenuating the visual impact of the smaller values. Swiss land area and…
  • A Statistician among the 100 Most Influential People in the World. Who is it?

    Armin Grossenbacher
    22 Apr 2012 | 9:46 am
    Congratulations to you, Hans! This is more than deserved for the tremendous achievement of using statistics to raise awareness of the big problems and to help change the world. And here is his shortest talk – with his Stone Age-Gapminder tool.
  • The True Size of Africa

    visuell
    16 Apr 2012 | 5:01 pm
    If further proof were still needed that a picture may be worth more than a thousand words, here it is: In cartography there have been a lot of attempts to educate about the true size of Africa; the Peters Projection is among the more commonly known. However, that Peters map has alienated people as it is too far off of traditional viewing habits, dismisses the Mercator map for all the wrong reasons and – as this sentence shows – slips into map geekery all too easily. So there is little left to say apart from the fact that I wish this work of statistical art a wide audience,…
  • Storage needs?

    Armin Grossenbacher
    14 Apr 2012 | 9:38 am
    Where to put my data? Go through the labyrinth and find the answer.

    R-statistics blog

  • data.table version 1.8.1 – now allows numeric columns and big numbers (via bit64) in keys!

    Tal Galili
    9 May 2012 | 1:38 am
    This is a guest post written by Branson Owen, an enthusiastic R and data.table user. Wow, a long-desired feature of data.table finally came true in version 1.8.1! data.table now allows numeric columns and big numbers (via bit64) in keys! This is quite a big thing to me and I believe to many other R users too. Now I can hardly think of any weakness of data.table. Oh, did I mention it also started to support character columns in the keys (rather than coercing to factor)? For people who are not familiar with but interested in the data.table package, data.table is an enhanced data.frame for…
  • Speed up your R code using a just-in-time (JIT) compiler

    Tal Galili
    10 Apr 2012 | 6:34 pm
    This post is about speeding up your R code using the JIT (just in time) compilation capabilities offered by the new (well, now a year old) {compiler} package. Specifically, dealing with the practical difference between enableJIT and the cmpfun functions. If you do not want to read much, you can just skip to the example part. As always, I welcome any comments to this post, and hope to update it when future JIT solutions will come along. Prelude: what is JIT Just-in-time compilation (JIT): is a method to improve the runtime performance of computer programs. Historically, computer programs had…
  • Do more with dates and times in R with lubridate 1.1.0

    Tal Galili
    16 Mar 2012 | 4:13 am
    This is a guest post by Garrett Grolemund (mentored by Hadley Wickham) Lubridate is an R package that makes it easier to work with dates and times. The newest release of lubridate (v 1.1.0) comes with even more tools and some significant changes over past versions. Below is a concise tour of some of the things lubridate can do for you. At the end of this post, I list some of the differences between lubridate (v 0.2.4) and lubridate (v 1.1.0). If you are an old hand at lubridate, please read this section to avoid surprises! Lubridate was created by Garrett Grolemund and Hadley Wickham. Parsing…
  • Printing nested tables in R – bridging between the {reshape} and {tables} packages

    Tal Galili
    29 Jan 2012 | 4:41 pm
    This post shows how to print a prettier nested pivot table, created using the {reshape} package (similar to what you would get with Microsoft Excel), so you could print it either in the R terminal or as a LaTeX table. This task is done by bridging between the cast_df object produced by the {reshape} package, and the tabular function introduced by the new {tables} package. Here is an example of the type of output we wish to produce in the R terminal:

               ozone        solar.r         wind          temp
    month   mean    sd    mean     sd    mean    sd    mean    sd
    5      23.62 22.22   181.3 115.08  11.623 3.531   65.55 6.855
    6      29.44 18.21…
  • Interactive Graphics with the iplots Package (from “R in Action”)

    Tal Galili
    24 Jan 2012 | 6:29 am
    The following introductory post is intended for new users of R. It deals with interactive visualization using R through the iplots package. This is a guest article by Dr. Robert I. Kabacoff, the founder of (one of) the first online R tutorial websites: Quick-R. Kabacoff has recently published the book “R in Action”, providing a detailed walk-through for the R language based on various examples for illustrating R’s features (data manipulation, statistical methods, graphics, and so on…). In previous guest posts by Kabacoff we introduced data.frame objects in R and…
 

    Politics » Polls

  • Views on Two Romney Policy Proposals Underscore the Candidates’ Challenges

    Damla Ergun
    16 May 2012 | 6:00 am
    Two of Mitt Romney’s key campaign proposals fall short of majority approval, with swing-voting independents especially cool on his plan to repeal health care reform and evenly divided on his offer of a hefty tax cut. Trimming taxes does better overall. Among all Americans, 48 percent express a favorable opinion of Romney’s proposal to reduce federal tax rates by 20 percent, while 39 percent see it unfavorably. His call to repeal the Obama health care law, for its part, gets a 40-40 split. See PDF with full results, charts and tables here. Neither proposal earns majority support in…
  • Obama and Gay Marriage: Opinions Divide, and Sharply

    Damla Ergun
    15 May 2012 | 6:00 am
    Americans divide essentially evenly in their responses to President Obama’s new position on gay marriage, with views more strongly negative than positive and stark divisions across political, ideological and other groups – including a broad gender gap. All told, 46 percent in this ABC News/Washington Post poll express a favorable impression of Obama’s statement in an interview with ABC’s Robin Roberts last week that he personally has come to support gay marriage, while 47 percent respond unfavorably. That includes a 10-point tilt toward “strongly” negative…
  • Mixed Views on Three Key Issues Mark Obama’s Campaign Challenges

    Greg Holyk
    11 May 2012 | 6:00 am
    Americans give Barack Obama mixed marks on three prominent issues he’s touted in his bid for re-election, with no scores above 50 percent on the auto industry bailout, greater regulation of financial institutions or – most basic – the administration’s economic stimulus program. Middling ratings on each of these suggest some of the president’s challenges in the campaign, now officially under way. While he’s substantially more popular personally than the presumptive Republican nominee, Mitt Romney, Obama is vulnerable on key issues. See PDF with full results,…
  • Baseball Leads, B-ball’s Runner-up, While Hockey and NASCAR Lag

    Greg Holyk
    2 May 2012 | 6:00 am
    America’s favorite pastime hits a home run in public popularity, and basketball’s bouncing along nicely. But in the rink and on the raceway, hockey and NASCAR could use a little better buzz. Two-thirds of Americans express a favorable opinion of professional baseball, vs. just 28 percent who see it negatively in this ABC News/Washington Post poll. That gives it the top spot among the four currently in-season professional sports tested. See PDF with full results, charts and tables here. Professional basketball’s not far behind, rated favorably by 58 percent, unfavorably by 37…
  • Michelle Obama, Ann Romney, Hillary Clinton: In Personal Popularity, the Women Rule

    Gary Langer
    24 Apr 2012 | 11:01 pm
    Michelle Obama and Ann Romney outscore their husbands in personal popularity in the latest ABC News/Washington Post poll, while Hillary Clinton, for her part, has hit a new high in favorability data stretching back to her entry on the national stage 20 years ago. Clinton and Obama both are far better known than Romney, helping boost them to much higher popularity ratings overall. All three are rated unfavorably by roughly similar numbers, 24 percent for Obama, 27 percent for Clinton and 30 percent for Romney. All told, Obama is seen favorably by 69 percent of the public, unfavorably by 24…

    Lies, damned lies and statistics

  • The importance of the ‘visual web’ – some stats

    Dirk Singer
    8 May 2012 | 5:19 pm
    Recently, Read Write Web’s Richard MacManus penned a series of articles around ‘The Visual Web’, “meaning that images and video are becoming an increasingly important part of what we consume online.” I couldn’t agree more. Speaking personally, Instagram started replacing Twitter as my social network of choice about a year ago. More to the point, there is now a wealth of statistics to underpin what Richard is talking about. This is especially so when you consider that many of the biggest social media stories of the past year – Tumblr,…
  • Stats that show why you need a mobile first approach now

    Dirk Singer
    2 May 2012 | 3:54 pm
    The other week someone asked me what an agency such as Rabbit (where I work) should be focusing on going forward. My reply was that previously organisations have been talking about a digital first approach. In other words, concentrate on online channels and content first, which then filters into the offline world. However, now the priority has to be mobile first: you cater in the first instance for people who may be consuming your content via their smartphones. Sounds obvious enough, but some recent stats demonstrate the need to move from theory to practice: In February 2012,…
  • If Instagram will be like YouTube, which photo-network will be like Vimeo? Tadaa is one possibility

    Dirk Singer
    29 Apr 2012 | 2:57 pm
    In the wake of Facebook’s $1 billion Instagram buy earlier in the month, one comparison that did the rounds is that Instagram will be to Facebook what YouTube is to Google: a stand-alone (ish) network that is the clear leader in what it does (so for YouTube online video, for Instagram mobile photography). However, while YouTube may be the 800lb gorilla of video sharing, a few networks have managed to carve out their own roles – on positioning as opposed to numbers. Vimeo, for example, is a fraction the size of YouTube, but a lot of creative industry professionals prefer it…
  • How technology has changed childhood – ten stats

    Dirk Singer
    26 Apr 2012 | 3:30 pm
    Over the past eighteen months Internet security company AVG (disclosure – Rabbit client) has been carrying out research to see how technology has changed childhood, beyond recognition for someone who grew up twenty or thirty years ago. With five waves looking at kids from birth across eleven countries, the end result is a fairly extensive piece of research. Ten key stats are as follows: 1 – 81% of children under two currently have some kind of digital dossier or footprint, with images of them posted online. In the US that rises to 92% 2 – Though the average…
  • Less than half of European airports have customer focused social channels

    Dirk Singer
    26 Apr 2012 | 11:34 am
    Anyone who knows a bit about what we do at Rabbit (where I work) will know that a lot of it is in the aviation sector. As a result, I was interested to see this infographic that Shashank Nigam of SimpliFlying shared the other day. Accompanying a report by the Airports Council International (the airport trade body), it shows that 77% of travellers pass through European airports that are active on social media. So far so good, but actually once you delve into the report a little bit further you find that European airports are less progressive than you might first think. First of all,…

    Empirical Legal Studies

  • Baldy Fellowships

    Michael Heise
    15 May 2012 | 11:03 am
    The Baldy Center for Law & Social Policy at SUNY-Buffalo plans to award several post-doc, mid-career, and senior fellowships for the 2012-13 academic year. The Fellowships are geared toward "scholars pursuing important topics in law, legal institutions, and social policy." The Baldy Center invites applications from an array of disciplines, including "law, the humanities, and the social sciences." What I found particularly notable (and attractive) is that "Fellows are expected to participate regularly in Baldy Center events, but otherwise have no obligations beyond…
  • Defunding Political Science at NSF

    Christopher Zorn
    10 May 2012 | 1:23 pm
    Late last night, on a nearly party-line 218-208 vote, the U.S. House passed an amendment (by Rep. Flake, R-AZ) to HR 5326 to "prohibit the use of funds to be used to carry out the functions of the Political Science Program in the Division of Social and Economic Sciences of the Directorate for Social, Behavioral, and Economic Sciences of the National Science Foundation."  The Monkey Cage has some of the relevant links.  Efforts like this have been mounted before -- most recently in 2009, by Sen. Tom Coburn -- but none have gotten this far. The actual debate on the defunding…
  • Call for Papers: AI and the Law

    Christopher Zorn
    8 May 2012 | 1:17 pm
    Robert Richards, proprietor of the Legal Informatics Blog, points out a call for papers for a special issue of Artificial Intelligence and Law focused on "modeling policy making." Of particular note for ELS folks is that the special issue welcomes submissions on "the first three phases of the policy cycle: agenda setting, policy analysis, and lawmaking."
  • Stata Users' Favorite Commands

    Michael Heise
    3 May 2012 | 4:05 pm
    The folks over at The Stata Blog recently polled readers (obviously, a non-random selection of Stata users) on their favorite Stata command. While some may find the results (here) themselves interesting, others might find unfamiliar commands that could prove useful.
  • A History of "p" Levels

    Michael Heise
    25 Apr 2012 | 4:40 pm
    While most ELS Blog readers understand traditional significance ("p") levels, few understand how (or why) "0.05" emerged as the standard for statistical significance. In The Adoption of Significance Tests by the Scientific Community: An Empirical Analysis, David A. Gully (Columbia--Engineering) discusses the adoption of the 0.05 standard. An excerpted abstract follows: "This paper adds to the literature by determining the timing and level of acceptance of common tests of statistical inference. Using the archives of the Royal Society, we examined 574 research studies…
 

    Deviant Square Stats Tutorials

  • Please Vote on the "Top Confusing Stats Terms"

    Jeremy Taylor
    23 Apr 2012 | 7:01 am
    Please Vote on the "Top Confusing Stats Terms", and the results will dictate which terms I will explain in upcoming blog entries! http://www.statsmakemecry.com/confusing-stats-terms/
  • How to Conduct a Repeated Measures MANCOVA in SPSS

    Jeremy Taylor
    20 Apr 2012 | 1:35 pm
    In today's blog entry, I will walk through the basics of conducting a repeated-measures MANCOVA in SPSS. I will focus on the most basic steps of conducting this analysis (I will not address some complex side issues, such as assumptions, power, etc.). If you find yourself with lingering questions after walking through this blog, feel free to leave questions in the "comments" section, or visit the MANCOVA section of my discussion forum to find answers and/or ask questions of your own. Full disclosure: the example data used is from the SPSS sample/help files, and it can be downloaded below.
  • The Worst Mistake Made on a Dissertation Is...

    Jeremy Taylor
    6 Sep 2011 | 12:49 pm
    I have a saying that I like to tell consulting clients, which is easier said than done, but which I think offers words for doctoral candidates to live by: "The only bad dissertation draft is one that isn't turned in." The most common factor that unnecessarily slows progress on a dissertation proposal or defense is a propensity to strive for the perfect draft. As graduate students, we all fantasized of turning in our first draft and having our advisor, being so amazed at its brilliance, insist that we accept our PhD on the spot. Unfortunately, reality inevitably sets in in the form of…
  • Moderating Effects with Seemingly Uncorrelated Variables

    Jeremy Taylor
    13 Jul 2011 | 7:00 am
    I received a great question this week, as a submission to my Ask the Stats Make Me Cry Guy page, which asked: "In order for a moderating relationship to exist, do the predictor IV and dependent variable need to be significantly correlated?" This is a question that I am asked a lot, partly because of the common confusion between mediators and moderators and the commonly held belief that an IV and DV should be related for mediation to be present (see my video blog on Mediators, Moderators, and Suppressors for more info on this topic). However, moderators are a completely different story. In…
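    A toy dataset can show how a moderator may matter even when the IV and DV look unrelated overall. The sketch below is only an illustration under stated assumptions and is not the author's own example: the four data points, the variable names `x` (IV), `m` (moderator) and `y` (DV), and the helper functions are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical balanced design: the moderator m flips the direction of
# the effect of x on y, because y is defined as the product x * m.
data = [(x, m, x * m) for x in (-1, 1) for m in (-1, 1)]

def r_within(m_value):
    """Correlation between x and y among cases with the given m."""
    subset = [(x, y) for x, m, y in data if m == m_value]
    return pearson([x for x, _ in subset], [y for _, y in subset])

overall = pearson([x for x, _, _ in data], [y for _, _, y in data])

print(f"Overall correlation of x and y: {overall:.2f}")
print(f"Correlation when m = +1: {r_within(1):.2f}")
print(f"Correlation when m = -1: {r_within(-1):.2f}")
```

    In this constructed case the overall correlation is zero because the two subgroup relationships cancel out, while within each level of the moderator the relationship is as strong as it can be.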
  • Using Syntax to Assign 'Variable Labels' and 'Value Labels' in SPSS

    Jeremy Taylor
    20 Jun 2011 | 8:09 am
    Preparing a dataset for analysis is an arduous process. Besides recoding and cleaning variables, a diligent data analyst also must assign variable labels and value labels, unless they choose to wait until after their output is exported to Microsoft Word. Unfortunately, that option only leaves additional opportunity for error and confusion, not to mention the inefficiency of editing tables in Microsoft Word. Who among us has not been frustrated while wrestling with Microsoft Word? When used in conjunction with the customizable SPSS table "Looks" function, formatting your variable labels and…

    CoolData blog

  • Emerson’s big data

    kevinmacdonell
    8 May 2012 | 10:35 am
    One day in late March I got on a plane from Toronto (where I attended Annual Fund benchmarking meetings hosted by Target Analytics) to Las Vegas (for the Sungard Higher Education Summit), and picked up the Toronto Globe & Mail. I scanned a section that offered some ephemera, including the startling news that my fellow countryman [...]
  • For agile data mining, start with the basics

    kevinmacdonell
    26 Apr 2012 | 7:56 am
    Lately I’ve been telling people that one of the big hurdles to implementing predictive analytics in higher education advancement is the “project mentality.” We too often think of each data mining initiative as a project, something with a beginning and end. We’d be far better off to think in terms of “process” — something iterative, [...]
  • Data I want to play with

    kevinmacdonell
    24 Apr 2012 | 4:23 am
    Guest post by Marianne M. Pelletier, Director of Advancement Research and Data Support, Cornell University. In my present job, I deal with a whole lot of data – over 2,000 fields of data on gifts, names, addresses, relationships, segmenting codes, dates, attributes, interests, contacts, you name it. Yet getting to play in this playground as a [...]
  • Stepwise, model-foolish?

    kevinmacdonell
    18 Apr 2012 | 7:00 am
    My approach to building predictive models using multiple linear regression might seem plodding to some. I add predictor variables to the regression one by one, instead of using stepwise methods. Even though the number of predictor variables I use has greatly increased, and the time needed to build a model has lengthened, I am even less [...]
  • Are we missing too many alumni with web surveys?

    kevinmacdonell
    28 Mar 2012 | 7:04 am
    Guest post by Peter B. Wylie and John Sammis (Download a printer-friendly PDF version here: Web Surveys Wylie-Sammis) With the advent of the internet and its exponential growth over the last decade and a half, web surveys have gained a strong foothold in society in general, and in higher education advancement in particular. We’re not experts [...]

    The LoveStats Blog

  • 5, 7, and 9 Point Scales: Do You See The Difference? #MRX

    LoveStats
    16 May 2012 | 9:45 am
    It’s a highly debated question with quantitative data to support all sides. Are 5 point, 7 point, or 9 point scales better suited for generating quality data? Sure, the distribution of responses is slightly different in each case and your ability to conduct more complex statistical analyses can be improved. But I have a few very basic arguments all of which lead me to support scales with  fewer items. Scales with more points create differences where differences do not exist. Sure, I understand. You want to measure tiny differences. But do consumers REALLY see the difference between 5…
  • The Failure of the Story Paradigm #MRX

    LoveStats
    14 May 2012 | 9:01 am
    People love storytelling. Once upon a time is a great way to learn to be nice to other children, that selfishness isn’t the best way to run your life, that if you kiss a frog you might get a prince. But in the market research space, this paradigm needs to die a quick death. Sure, telling a research story is heartwarming and gives you goosebumps. To travel through a day in the life of a single person gives you a more meaningful understanding of a brand. But the Story Telling Paradigm is a gross misdirection. It’s like leading someone through a puzzle with blinders on and only showing…
  • Dear Unfit Parent, I Have Your Licence Plate Number

    LoveStats
    11 May 2012 | 6:32 pm
    I live in one of the nicest areas of my city. It’s reasonably well to do and people have everything they need, nice houses with front and back yards, multiple cars, pretty parks, boutique shopping, quaint restaurants. So you have to think people in the area have their heads screwed on pretty well. Today, I waited with arms full of groceries at a stranger’s car for at least 5 minutes. I asked anyone who walked by if it was their car, if they knew who owned the car, but to no avail. I copied down the licence plate while I waited. Finally, when the owner returned, I yelled at him…
  • Talk doesn’t cook rice #MRX

    LoveStats
    11 May 2012 | 9:54 am
    Welcome to Really Simple Surveys (RSS), the younger sibling of Really Simple Statistics. There are lots of places online where you can ponder over the minute details of complicated survey designs but very few places that make survey design quickly understandable to everyone. I won’t explain exceptions to the rule or special cases here. Let’s just get comfortable with the fundamentals. I have always thought the actions of men the best interpreters of their thoughts. – John Locke Never mistake motion for action. – Ernest Hemingway Well done is better than well said.
  • Best of #Esomar Canada: Adam Froman Goes To The Opera #MRX

    LoveStats
    9 May 2012 | 7:14 pm
    This is one of several (almost) live blogs from the ESOMAR Best of Canada event on May 9, 2012. Any errors, omissions, and ridiculous side comments are my own. Adam Froman, Delvinia listening to Finn Rabin, DG of @ESOMAR opening the Toronto Best of 2012 breakfast where I'm sharing the @CanadianOpera story - Adam Froman (@adamfroman) May 09, 2012 Shifting Donor Behaviour Online…
 

    The Analysis Factor

  • Can a Regression Model with a Small R-squared Be Useful?

    Karen Grace-Martin
    14 May 2012 | 3:21 pm
    R² is such a lovely statistic, isn’t it?  Unlike so many of the others, it makes sense–the percentage of variance in Y accounted for by a model. I mean, you can actually understand that.  So can your grandmother.  And the clinical audience you’re writing the report for. A big R² is always good and a small one is always bad, right? Well, maybe. I’ve seen a lot of people get upset about small R² values, or any small effect size, for that matter.  I recently heard a comment that no regression model with an R² smaller than .7 should even be interpreted. Now, there…
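    As a rough illustration of the definition in the excerpt above (the share of variance in Y accounted for by a model), here is a minimal Python sketch. The data points are made up for the example and the helper names `fit_line` and `r_squared` are hypothetical; the point is only that a fitted line can have a clearly non-zero slope while R² stays small.

```python
def fit_line(x, y):
    """Ordinary least squares slope and intercept for a single predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return slope, mean_y - slope * mean_x


def r_squared(y, y_hat):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return 1 - ss_resid / ss_total


# Made-up, noisy data: an upward trend buried in a lot of scatter.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3, 9, 2, 8, 4, 10, 5, 11]

slope, intercept = fit_line(x, y)
predictions = [intercept + slope * xi for xi in x]
r2 = r_squared(y, predictions)

print(f"slope = {slope:.3f}, R^2 = {r2:.3f}")
```

    With these invented numbers the slope is about two thirds of a unit of Y per unit of X, yet the line accounts for under a quarter of the variance in Y.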
  • Sample Size Estimates for Multilevel Randomized Trials

    Karen Grace-Martin
    1 May 2012 | 12:21 pm
    If you learned much about calculating power or sample sizes in your statistics classes, chances are, it was on something very, very simple, like a z-test. But there are many design issues that affect power in a study that go way beyond a z-test.  Like: repeated measures; clustering of individuals; blocking; including covariates in a model. Regular sample size software can accommodate some of these issues, but not all.  And there is just something wonderful about finding a tool that does just what you need it to. Especially when it’s free. Enter Optimal Design Plus Empirical Evidence…
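    For reference, the "very simple" textbook case mentioned above can be sketched in a few lines of Python. This assumes a two-sided one-sample z-test with known standard deviation and uses the usual normal approximation n = ((z_(1-alpha/2) + z_power) * sigma / delta)^2; the function name and the default alpha and power values are illustrative choices, not taken from the post. It deliberately ignores clustering, blocking, repeated measures and covariates, which is exactly where this formula stops being enough.

```python
from math import ceil
from statistics import NormalDist


def z_test_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided one-sample z-test.

    delta: smallest difference in means worth detecting (assumed)
    sigma: known population standard deviation (assumed)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for target power
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)


n = z_test_sample_size(delta=0.5, sigma=1.0)
print(n)  # 32 observations for a half-standard-deviation difference
```

    Halving the detectable difference roughly quadruples the required sample size, since delta enters the formula squared.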
  • Confusing Statistical Term #6: Factor

    Karen
    27 Apr 2012 | 9:37 am
    Factor is confusing much in the same way as hierarchical and beta, because it too has different meanings in different contexts.  Factor might be a little worse, though, because its meanings are related. In both meanings, a factor is a variable.  But a factor has a completely different meaning and implications for use in two different contexts. Factor in Factor Analysis: In factor analysis, a factor is a latent (unmeasured) variable that expresses itself through its relationship with other measured variables. Take for example a variable like leadership. We may want to measure a…
  • Five Extensions of the General Linear Model

    Karen Grace-Martin
    13 Apr 2012 | 11:02 am
    Generalized linear models, linear mixed models, generalized linear mixed models, marginal models, GEE models.  You’ve probably heard of more than one of them and you’ve probably also heard that each one is an extension of our old friend, the general linear model. This is true, and they extend our old friend in different ways, particularly in regard to the measurement level of the dependent variable and the independence of the measurements.  So while the names are similar (and confusing), the distinctions are important. It’s important to note here that I am glossing over many, many…
  • When to leave insignificant effects in a model

    Karen Grace-Martin
    5 Apr 2012 | 2:57 pm
    You may have noticed conflicting advice about whether to leave insignificant effects in a model or take them out in order to simplify the model. One effect of leaving in insignificant predictors is on p-values–they use up precious df in small samples. But if your sample isn’t small, the effect is negligible. The bigger effect is  on interpretation, and really the above cases are about whether it aids interpretation to leave them in. Models do get so cluttered it’s hard to figure out what’s going on, and it makes sense to eliminate effects that aren’t serving a purpose,…

    Sabermetric Research

  • A model for explaining home field advantage between sports

    14 May 2012 | 10:42 am
    It occurred to me that it might be possible to predict, or explain, the difference in home field advantage between different sports, based on their rules and outcomes.  In this post, I'll just talk about the "absolute" home field advantage, in terms of goals or points.  (The translation to winning percentage is easy after that, but I'll save that for a future post.) Take a look, and let me know what you think. Here's the home field advantage (HFA) for three different sports, in terms of goals or points: 0.453 - Premier League Soccer (2010-11); 4.000 - NBA (estimate); 0.783 - NHL (1980-81…
  • Factors influencing home field advantage

    8 May 2012 | 8:12 pm
    The more games in a season, the more likely the best teams will rise to the top, and the worse teams will fall to the bottom.  That's just common sense, and the law of large numbers. Similarly, the more innings in a game, the more likely the best team will win.  If you put the whole season into a single 1,458-inning game, there's no doubt that (for instance) the Yankees would beat the Twins. Home field advantage (HFA) is one of those things that makes teams better.  And so, the longer the game, the more likely the home team's advantage will show up in the results.  HFA for a…
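    The law-of-large-numbers argument above can be made concrete with a small Python sketch. It assumes a deliberately simplified model that is not from the post: the better team wins each game independently with a fixed probability (0.55 here, an arbitrary choice), and the "contest" is decided by who wins the majority of an odd number of games. The function name `series_win_prob` is hypothetical.

```python
from math import comb


def series_win_prob(p, n_games):
    """Chance of winning a majority of n_games (odd) independent games,
    each won with probability p. Exact binomial tail sum."""
    needed = n_games // 2 + 1
    return sum(
        comb(n_games, k) * p**k * (1 - p) ** (n_games - k)
        for k in range(needed, n_games + 1)
    )


# A small per-game edge grows as the contest gets longer.
for n in (1, 7, 101):
    print(n, round(series_win_prob(0.55, n), 3))
```

    Under these assumptions the same small edge that wins a single game 55% of the time wins a much longer contest far more often, which is the sense in which a longer game lets an advantage "show up in the results".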
  • Solution to "Another puzzle"

    7 May 2012 | 10:35 pm
    This is my solution to the puzzle from the last post.  You probably want to read that other post first, or this discussion won't make any sense at all. -------- The puzzle asked for a proof that the rule change does not change the odds of either team winning the game. It seems like that shouldn't be true: the rule change gives the better team more possessions.  It definitely causes many scores to be very, very lopsided in favor of the better team.  How can it improve the score, but not change the odds of winning? In one sentence, the answer is: all those extra points always go to…
  • Another puzzle

    6 May 2012 | 10:34 am
    UPDATE, 4:45pm Sunday:  Oops!  I had to add another condition to make the puzzle work.  See below. (Note: This time, the puzzle has some sports content.) The rules in the Oversimplified Basketball Association (OBA) are as follows: Each team takes turns getting a possession of the ball.  If they score a field goal, they get two points.  There are no three-point shots, rebounds, or fouls.  If there's a turnover or a missed shot, the referee blows the whistle and the other team inbounds the ball to begin their possession. So, all possessions are independent and…
  • Puzzle

    29 Apr 2012 | 10:36 am
    (Non-sports post.) Here's a puzzle that occurred to me a few days ago.  I don't know what to do with it, so I might as well post it here. ------ In a certain country, the "daily numbers" lottery works like this: you buy a $1 ticket with your choice of 3-digit number.  The winning 3-digit number is announced.  If you match exactly, you win $1000. However, the draw isn't random.  Instead, the winning number is always the *least popular* number chosen by ticket-buyers -- that is, the number that is on the fewest tickets. There's one catch: if there is a "tie" for least popular…

    Statistical Sage Blog

  • Great Conference … world wide

    Bonnie
    23 Apr 2012 | 6:21 pm
    Hello All, Looking for a great conference on the teaching of statistics? Look at eCOTS … http://www.causeweb.org/ecots/ It costs $15 to attend this online conference. Bonnie
  • Interesting Data to Use in Applied Statistics

    Bonnie
    19 Apr 2012 | 11:48 am
    There are plenty of ways to use results from a study in the teaching of applied statistics. This study http://chronicle.com/article/45-Years-of-Survey-Data-Show/131149/ from the Chronicle of Higher Ed reviews data that most of us probably knew to be true. (1) Students are more focused on the financial benefit of attending college, & (2) students are reporting more emotional health problems. This data is coming from a 45-year longitudinal study. Truthfully … these kinds of studies annoy me. They give the non-psychological scientist the idea that all you have to do is write a few…
  • “Watching the Ivory Tower Fall?”

    Bonnie
    24 Mar 2012 | 11:46 am
    Could education as we know it … end, because of technology?  An article from the Wall Street Journal, on March 24th, “Watching the Ivory Tower Fall” states that technology is opening up educational opportunities for everyone, and college as we know it (and they even name Harvard and Yale) will cease to exist in its current form. I do see the benefit of free programs like Khan Academy. I encourage my students and my children to use it regularly. But, do we really just need Sal Khan and some computer guys programming the ideal educational program (from Khanacademy.com)? Or is there still…
  • Difficult Concept: Teaching Sampling Error and Sampling Distribution of the Means

    Bonnie
    22 Mar 2012 | 7:18 pm
    I am currently teaching sampling distribution of the means and sampling error to my students. They are difficult concepts to convey to students, and unlike much of my teaching, where lecture comprises a fair portion of my teaching time, I find myself “slowing down” the progress at this point by putting more of the activities in the hands of the students, forcing them to participate in activities during class time, and requiring them to generate ideas in and out of class. There are three activities that I use to help students learn the concept of the sampling distribution of the…
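    One way to make the sampling distribution of the mean tangible is a short simulation. The Python sketch below is not one of the three classroom activities mentioned above (the excerpt does not describe them); it simply assumes a population of fair six-sided die rolls, draws many samples of 25 rolls, and compares the spread of the sample means with the textbook standard error, sigma / sqrt(n). The sample size, number of samples and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 25         # rolls per sample (assumed)
N_SAMPLES = 2000         # number of repeated samples (assumed)


def draw_sample_mean():
    """Mean of one sample of fair die rolls."""
    return statistics.mean(rng.randint(1, 6) for _ in range(SAMPLE_SIZE))


# The sampling distribution of the mean, approximated by simulation.
sample_means = [draw_sample_mean() for _ in range(N_SAMPLES)]

population_sd = statistics.pstdev(range(1, 7))        # sd of a fair die
theoretical_se = population_sd / SAMPLE_SIZE ** 0.5   # sigma / sqrt(n)
observed_se = statistics.stdev(sample_means)          # spread of the means

print(f"mean of sample means: {statistics.mean(sample_means):.3f}")
print(f"theoretical SE: {theoretical_se:.3f}, observed SE: {observed_se:.3f}")
```

    Each sample mean differs a little from the population mean of 3.5; that difference is sampling error, and the standard deviation of the collected means estimates its typical size.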
  • “I’m sexy and I know it … ”

    Bonnie
    22 Mar 2012 | 7:33 am
    If students don’t believe that learning statistics is a worthwhile adventure, will they try? Yet, we all know students are bombarded with messages that statistics are hard, incomprehensible, mysterious, or just plain wrong. Students are well aware of the inaccurate, though oft-stated comment that you can say whatever you want with statistics. My response is: not if you know how to use and interpret statistics. So, how can we counteract the big push against the need for students to learn statistics? Tell students the truth! Applied statistics is sexy! I have compiled a few short articles…
 

    Fishing in the Bay

  • Academia in 60 years

    Chris Lloyd
    30 Apr 2012 | 11:29 pm
    The University of Melbourne was founded in 1885 with five professors teaching 15 students. In 1952, at the start of the post-war tertiary boom, there were around 3,000 Australian academics teaching 30,000 students across eight Universities. There are now some 43,000 academics servicing 1.2 million students (28% of them international) across 41 Australian universities. Based on the past 60 years, you might predict a bright future for academics in 2072. However, I would not be surprised if the number of genuine academic positions returns to 1952 levels. Researching and teaching Academics create…
  • Frijtening fears of data security

    Chris Lloyd
    14 Mar 2012 | 7:55 pm
    Controversial economist Paul Frijters is always a lively and thought-provoking read. Recently at Club Troppo, he has posted on his top five economic reforms that make “good economics in the sense of being in the interest of the long-run welfare of Australia.” One of them involves the ABS…. I have always found ABS phone staff pretty helpful and there is plenty of free stuff on their site that is reasonably easy to search once you get the hang of it. But, as someone who just paid the ABS $450 for 5 years of data on deaths broken down by exact age and gender, I am not presently disposed to…
  • The Melbourne Model

    Chris Lloyd
    22 May 2011 | 8:04 pm
    There has been recent discussion in the MSM and blogosphere about the relative merits of the Melbourne model compared to more traditional alternatives. There has been a provocative article by Steven King saying thanks very much Melbourne for sending us so many of your best and brightest. There was a rebuttal by Glyn Davis saying that what Melbourne lose in under-graduate enrolments they more than make up for in specialist masters students, and that this was always the intention. There is a placatory article by Monash VC Ed Byrne. I added a comment to his article which got blown up into an…
  • Congratulations to Annals of Statistics

    Chris Lloyd
    8 Dec 2010 | 8:15 pm
    Here is a good news story about an academic journal that is prepared to set the record straight. The October issue of Annals of Statistics had a paper by Weishen Wang about smallest confidence limits. The main Theorem 4 gives a formula for a largest possible lower limit for a scalar parameter in an arbitrary discrete distribution. Unfortunately, this construction was first given by Robert Buehler (JASA,1957). The proof of its optimality was given by Jobe and David (JASA,1992) under a non-trivial restriction. The proof was generalised to remove this restriction by Lloyd & Kabaila…
  • ARC reforms: gender bias ignored

    Chris Lloyd
    30 Nov 2010 | 6:27 pm
    The ARC spend around $300 million per year, receive 4000 applications and fund around 1000 of them for an average of $300,000 per year each. The success rate is around 23%. On Nov 3 this year, they posted a “consultation document” (HERE) outlining what appear to be some pretty major changes to the Discovery scheme. If my understanding of this document is correct, the proposed changes are ill-conceived. They divert money to poorer projects, create perverse incentives and manifestly fail to solve the main problem that the ARC claim to be worried about. Let’s get to the nuts and bolts then.

    Featured Blog Posts - AnalyticBridge

  • Understanding the Reality of Real-Time Analytics

    Vincent Granville
    15 May 2012 | 2:30 pm
    Webinar: Understanding the Reality of Real-Time Analytics. Join us for a Webinar on May 31. Space is limited. Reserve your Webinar seat now at: https://www4.gotomeeting.com/register/245307551 Today's businesses are challenged to analyze massive data volumes with great speed and efficiency with the hope of finding competitive advantage, strategic imperatives, or a random nugget of business gold. As a result, the interest in “real-time” analytics, or what people think of as “real-time” analytics, is at an all-time high. “I need to know now,” is the big data challenge we are all…
  • Machine Learning in Python has never been easier

    Jos Verwoerd
    15 May 2012 | 4:20 am
    At BigML we believe that over the next few years automated, data-driven decisions and data-driven applications are going to change the world.  In fact, we think it will be the biggest shift in business efficiency since the dawn of the office calculator, when individuals had “Computer” listed as the title on their business card.  We want to help people rapidly and easily create predictive models using their datasets, no matter what size they are. Our easy-to-use, public API is a great step in that direction but a few bindings for popular languages are obviously a big bonus. Thus, we are…
  • BiG DaTa & Vectorization

    Manish Bhoge
    14 May 2012 | 12:50 am
    It has been a while since Big Data entered the market and set the analytics world buzzing. Nowadays all analytics leaders are talking about Big Data applications. Since I started working with Hadoop technologies and with machine learning, one question has been bugging me: which is the greater innovation, Big Data or Machine Learning & Vectorization? When it comes to analytics, vectorization and machine learning seem more innovative. Wait a minute, I don't want to be biased and I am not drawing a conclusion here. But I would like to show more about the direction we take when we pull out data for the analytics…
  • SAS Global Forum: Here’s the Wrap Up!

    Tricia Aanderud
    14 May 2012 | 7:46 am
    SAS Global Forum 2012 was a success! After a whirlwind week of activities followed by a vacation and week of rest – I’m ready to give you some highlights.  It was a lot of fun! Tip: Click on any picture to enlarge it. Day 1 – Saturday Ready for the Tweet-Up The biggest drama was at the airport – our flight was delayed due to mechanical failure so I decided it might be better to take a later flight. Met @Steve0verton at the airport and @PhilipB who were both headed to Orlando.  As a result of the later flight we were late to the Tweet-Up so we missed the first round of drinks.  It…
  • Email marketing: analytic tips to boost performance by 300% - case study

    Vincent Granville
    13 May 2012 | 2:00 pm
    This post is part of our blog post series on data science case studies and success stories. AnalyticBridge improved open rates by 300%, and dramatically improved total clicks and click-through rates using the following strategies: 1. Remove subscribers who did not open the newsletter during the last 8 deployments. This produced a spectacular increase in open rate, and also significantly improved our "spam score", as our newsletter's chances of ending up in a spam box or a spam trap are reduced to almost 0. 2. Segmentation of subscriber base to better target members, for instance to send a UK…

    Beyond the Box Score

  • Derek Lowe, Pitching to Contact?

    Justin Bopp
    16 May 2012 | 12:34 pm
    1. Derek Lowe is the 2012 MLB leader in ERA. 2. Derek Lowe pitched a shutout last night, May 15th, 2012. 3. Nobody expected either of the above. What gives? Our friend Harry Pavlidis of The Hardball Times takes a look at Lowe's pitch types vs. contact rates and concludes that Lowe is "pitching to contact" (shudder). But there are several quick lessons that can help us answer the question posed above. First, we know ERA is suspect because it fails to include (exclude) defense, and a closer look shows that he's benefiting from his defense this season while actually pitching much the same as…
  • Bryce Harper, Mike Trout, and the Race for History

    Julian Levine
    15 May 2012 | 11:00 am
    There are currently two position players in baseball that are younger than 21: one of them, Mike Trout, is 20 years old; the other, Bryce Harper, is 19. Harper and Trout came in at #1 and #3, respectively, on Baseball America's top 100 prospects for 2012, as well as Minor League Ball's top 120 prospects. I need not explain that these are two extremely talented players who -- if they live up to the hype -- have marvelous careers ahead of them. But it's worth noting that they've already accomplished a lot, just being in the majors already. In fact, if they can continue to be moderately…
  • A Graphic Look at Jake Peavy: Actual vs Projected (Surprise, He's Overachieving!)

    David Fung
    15 May 2012 | 8:01 am
    It certainly has been surprising that Jake Peavy seems to have found his 2007 self, the one who pitched 223.1 innings with a career-best 2.24 FIP (not to mention, this was his Cy Young-winning season). Last Thursday, David Schoenfield proclaimed Peavy the best pitcher in baseball, and while only 7 starts into the season he's laying claim to the title of the best pitcher in the AL. It's always hard to judge this early in the season: is this just a hot start, or is he set to drop off sometime soon? Peavy has officially won the Pitcher of the Month award for the AL, so others are recognizing his…
  • A PITCHf/x Look At Eight Rookie Starters

    Nathaniel Stoltz
    14 May 2012 | 11:00 am
    MLB debuts are awesome to see, for a ton of reasons. One of the more obscure reasons for pitchers is that we get to see them under the PITCHf/x microscope and dissect their arsenal scientifically rather than relying on secondhand information. In many cases, the consensus doesn’t match up well with the PITCHf/x information. To give just one example, let’s take a look at what Baseball America said about Zach Stewart in its past few Prospect Handbooks: 2009: "[His] 93-96 mph fastball and 82-85 mph slider give him a pair of potential out pitches." 2010: "Stewart’s bread and…
  • Their Iconic Season was not their Most Valuable

    adarowski
    14 May 2012 | 8:01 am
    When you think of Carl Yastrzemski, you think of 1967. When you think of Steve Carlton, you think of 1972. You should—those were by far their best seasons. In 1967, Yaz was worth 12.0 WAR as he captured the AL's Triple Crown. He hit .326/.418/.622 with 44 homers and 121 RBI in an offensively depressed era. He was even worth 23 runs in the field (which looks questionable until you see he had 22 the year before and 25 the year after). In 1967, Yaz was simply legit. His next-best season was 1968 with 10.0 WAR. Interestingly, that's the year he won the batting title with a .301 average. Now…
 

    Mathematics and Statistics at Williams

  • Williams Wins Regional Math Competition

    Frank Morgan
    14 May 2012 | 2:28 pm
    The Williams College team, consisting of Weng-Him Cheung ’15, Benjamin Demeo ’15, and Liyang Zhang ’12, won the Regional Undergraduate Mathematics competition at Central Connecticut State University Saturday, April 28, 2012. Liyang got high score and won first prize in the competition. Congratulations to the team and Coach Stoiciu. Last year Williams had the winning team and the top three individuals: Nick Arnosti, David Thompson, and Liyang Zhang.
  • Math ties English and Psychology for majors

    Frank Morgan
    11 May 2012 | 1:24 pm
    For the first time, next year, there will be as many math majors as English majors and as Psychology majors, 132, about 12% of all juniors and seniors, compared to the national average of about 1%. The only departments with more are Economics (155) and History (146).
  • Prizes announced at Majors’ Dinner

    Frank Morgan
    10 May 2012 | 6:54 pm
    Professor and Chair Cesar Silva announced this year’s Math/Stats prizes at the annual gala Majors’ Dinner at the Williams Inn Thursday evening, May 10, 2012. ROSENBURG PRIZE for best senior: Liyang Zhang. GOLDBERG AWARD for best colloquium: Erik Levinsohn, “Dude, Where’s my Convex Hull?”, and Niralee Shah, “Hearing the Shape of a Drum”; Honorable Mention: Patrick Aquino, Carolyn Geller, David Gold, Stephanie Jensen, Andrew Nguyen, Sidney Luc Robinson, Matthew Staiger, Tarjinder Singh. WYSKIEL AWARD in Teaching: Connor Stern. MORGAN PRIZE in Applied Math:…
  • De Veaux interviewed by MAA

    Frank Morgan
    28 Apr 2012 | 4:24 am
    On the occasion of his recent featured address at the headquarters of the Mathematical Association of America (MAA), described in his blog post, Prof. Richard De Veaux was interviewed by Ivars Peterson, Director of Communications for the MAA.
  • Math Awareness Month — Mathematics, Statistics, and the Data Deluge.

    Richard DeVeaux
    27 Apr 2012 | 2:43 pm
    In case you haven’t heard, it’s Math Awareness Month and this year the Mathematical Association of America (MAA) has chosen as the 2012 theme the Data Deluge that surrounds us all.  Because this Big Data theme obviously involves statistics, the MAA has teamed with the American Statistical Association (ASA) for many of their events. One of those was the monthly lecture at the MAA’s Carriage House in Washington D.C. (http://www.maa.org/dist-lecture/past-lectures.html). I was invited to talk about data mining, the area that’s been my focus for quite some time…

    FlowingData

  • The Descriptive Camera

    Kim Rees
    16 May 2012 | 11:42 am
    The unassuming little Descriptive Camera made me rethink data. This project by Matt Richardson was on display at the ITP Spring Show. The basic premise is that you take a photo and the camera spits out a textual description of what it sees. The results are remarkably accurate, detailed, and humorous. Here's what my photo said: A woman wearing a seriously awesome jacket that is printed with yellow, blue, and grey circles looks at her ipad rather than making eye contact with Matt Richardson. I mean, my jacket *IS* seriously awesome! So it not only described what it saw, but it also has great…
  • What is missing?

    Kim Rees
    16 May 2012 | 2:10 am
    What is Missing? by Maya Lin seeks to raise awareness about the mass extinction of species. It has a beautiful interface. The world map is black on a sea of black. Your mouse acts as a sort of flashlight layered between land and water, showing you glimpses of familiar coastlines and allowing you to select dots that tell the stories of extinction. We are experiencing the sixth mass extinction in the planet's history, and the only one to be caused not by a catastrophic event, but by the actions of a single species - mankind. On average, every 20 minutes a distinct living species of plant or…
  • How to Visualize and Compare Distributions

    Nathan Yau
    16 May 2012 | 12:47 am
    There are a lot of ways to show distributions, but for the purposes of this tutorial, I'm only going to cover the more traditional plot types like histograms and box plots. Otherwise, we could be here all night. Plus the basic distribution plots aren't exactly well-used as it is. Before you get into plotting in R though, you should know what I mean by distribution. It's basically the spread of a dataset. For example, the median of a dataset is the half-way point. Half of the values are less than the median, and the other half are greater than. That's only part of the picture. What happens in…
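    The tutorial itself works in R; purely to make the median idea above concrete, here is a small Python sketch using only the standard library. The values are invented for illustration. It checks that as many values fall below the median as above it, and adds the quartiles, which are the numbers a box plot draws.

```python
import statistics

# Invented example data with one large value on the right.
values = [2, 3, 3, 5, 7, 8, 9, 12, 40]

median = statistics.median(values)
below = sum(v < median for v in values)   # values less than the median
above = sum(v > median for v in values)   # values greater than the median

# Quartiles split the sorted data into four parts (Python 3.8+).
q1, q2, q3 = statistics.quantiles(values, n=4)

print(f"median = {median}, below = {below}, above = {above}")
print(f"quartiles: {q1}, {q2}, {q3}; mean = {statistics.mean(values):.2f}")
```

    The median only marks the half-way point. Here the single large value pulls the mean well above the median, which is the kind of detail that a histogram or box plot makes visible and a single summary number hides.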
  • ITP Spring Show: Iraq war and diabetes visualizations

    Kim Rees
    15 May 2012 | 5:15 am
    Yesterday I visited the ever popular NYU ITP bi-annual show which is a showcase of the students' experimental and ingenious interactive work. I stopped to talk to data visualization student and self-tracker, Doug Kanter, about his work. His first and smaller piece was about the war in Iraq. The image above depicts the number of wounded US soldiers by state (and territory) using the red stripes. The stars show the number of soldiers killed. I'm sure we could quibble about labels and where the bar chart starts, but to me, the tattered appearance of the flag created by data about war is very…
  • Welcome Kim Rees

    Nathan Yau
    15 May 2012 | 4:16 am
    I'm going to be away for a couple of weeks, with little to no Internet access most of the time, so I've asked Kim Rees to step in while I'm gone. She's the co-founder of Periscopic, one of my favorite information visualization firms, and she was the technical editor for Visualize This. You're in good hands. You can follow her at @krees. Be good, and see you all when I get back. She's all yours, Kim.

    Byte Mining

  • SIAM Data Mining 2012 Conference

    Ryan
    15 May 2012 | 1:00 pm
    Note: This would have been up a lot sooner but I have been dealing with a bug on and off for pretty much the past month! From April 26-28 I had the pleasure to attend the SIAM Data Mining conference in Anaheim on the Disneyland Resort grounds. Aside from KDD2011, most of my recent conferences had been more “big data” and “data science” oriented, and I wanted to step away from the hype and just listen to talks that had more substance. Attending a conference on Disneyland property was quite a bizarre experience. I wanted to get everything I could out of the conference,…
  • My Interview about the Statistics Major

    Ryan
    16 Mar 2012 | 3:23 pm
    Recently, I participated in an email interview about what being a Statistics major entailed, how I got interested in the field and the future of Statistics. I figured this might be of interest to those who are contemplating majoring in Statistics, or considering a career in Data Science. Q1: Why did you decide to pursue a major in statistics in college? A: “When I was a kid, I really enjoyed looking at graphs, plots and maps. My parents and I could not make out what was behind the interest. At the same time, I was also heavily interested in education. My mother was a teacher and the…
  • “Hold Only That Pair of 2s?” Studying a Video Poker Hand with R

    Ryan
    8 Jan 2012 | 3:32 am
    Whenever I tell people in my family that I study Statistics, one of the first questions I get from laypeople is “do you count cards?” A blank look comes over their face when I say “no.” Look, if I am at a casino, I am well aware that the odds are against me, so why even try to think that I can use statistics to make money in this way? Although I love numbers and math, the stuff flows through my brain all day long (and night long), every day. If the goal is to enjoy and have fun, I do not want to sit there crunching probability formulas in my head (yes that’s fun,…
  • Merry Christmas 2011 From Byte Mining!

    Ryan
    24 Dec 2011 | 1:28 pm
    To all of my readers and followers, I wish you a very Merry Christmas and a very joyous and safe Happy New Year! This year, I am thankful for the community that has sprung up around Data Science and open-source data collection and processing. This blog is almost two years old, and like with Twitter, I have been able to communicate with many data scientists, enthusiasts and some of the most prolific contributors to the data science software community. I am thankful for all of the wonderful people I have met and have yet to meet, and for your comments and reading.
  • Parsing Wikipedia Articles: Wikipedia Extractor and Cloud9

    Ryan
    28 Nov 2011 | 1:00 pm
    Lately I have been doing a lot of work with the Wikipedia XML dump as a corpus. Wikipedia provides a wealth of information to researchers in easy-to-access formats including XML, SQL and HTML dumps for all language properties. Some of the data freely available from the Wikimedia Foundation include: article content and template pages; article content with revision history (huge files); article content including user pages and talk pages; the redirect graph; page-to-page link lists (redirects, categories, image links, page links, interwiki, etc.); image metadata; and site statistics. The above resources are available…
 

    information aesthetics

  • Venngage: And Yet Another Online Infographics Editor

    15 May 2012 | 4:11 pm
    After 2 very similar posts in a very small timeframe, featuring Easel.ly and infogr.am respectively, I seem not to be able to follow the 'automatic infographics editing' scene fast enough. Automatic resume infographics creator visualize.me has just launched Venngage [venngage.com], which aims to empower people to create beautiful infographics in minutes, so that "creating infographics [becomes] as easy as creating a Powerpoint presentation". As a unique feature, Venngage's visual elements are displayed as pure HTML elements, which should positively influence SEO stats, page ranks and back…
  • infogr.am: Another Online Editor of Interactive Infographics

    15 May 2012 | 3:28 pm
    A few days ago, we posted the website Easel.ly, a new web-based service that aims to empower lay users to design infographic-like illustrations within the browser. Unfortunately, Easel.ly seems better suited to combining infographic-like images on a canvas than to linking real numerical data to a graphical form. So here comes Infogr.am [infogr.am], another competitor in semi-automatic, web-based infographics editing. Developed by a start-up that was founded in Riga (Latvia) and is now based in London, the online service offers a collection of infographic themes as well as different interactive chart…
  • The Historical Evolution of Europe's Borders

    15 May 2012 | 2:36 pm
    The movie "Epic time-lapse map of Europe" fast forwards a map from the year 1000 AD until 2003 to reveal the dynamic nature of Europe's borders, alliances, unions, territories, and occupied lands. An alternative movie takes a bit longer, but contains useful textual annotations such as the actual year that is shown and the events that occurred. The movie was made with "Centennia Historical Atlas" by Centennia Software. Watch the movies below. Via @tillnm.
  • FatFonts: New Font Links Value of a Number to Amount of Pixels Shown

    11 May 2012 | 8:47 am
    FatFonts [fatfonts.org] is a novel numeric typeface for data visualization purposes. The design of FatFonts is based on Arabic numerals, but the amount of ink (i.e. dark pixels) used for each digit is proportional to its quantitative value. This font enables the reading of numerical data while still preserving an overall visual context. The typeface was developed by Miguel Nacenta, Uta Hinrichs and Sheelagh Carpendale at the University of Calgary. In the online gallery, several case studies document how this font can be put to good use. More detailed information about these…
  • Easel.ly Debuts Online Editor of Infographics

    10 May 2012 | 2:02 pm
    San Diego-based start-up Easel.ly [easel.ly] is offering a beta service that allows lay people to design and implement their own "infographics" via an online editor. The user-based customization of infographics seems to be the next phase after the automatic generation of infographics, and has already been promised by community websites like visual.ly. For now, easel.ly allows users to drag and drop predefined and uploaded vector images on pre-designed canvases and themes for easy creation and customization of infographics. According to the founders Patrick Alcoke and Neil Harris, all themes…