Interviewee: Jim Kent.
With so much information being generated, both the public and private consortia had to store and process the data in new ways. Here, Jim Kent, who wrote the assembly program for the public sequence, talks about dealing with that amount of data.
To start out with you had about four hundred thousand pieces and the size of the whole thing is pretty overwhelming, it's about four billion bases and, well, it was about four billion bases before the assembly. There was a lot of overlap between the pieces, so that when we finally put it together it was only, I think it was about 2.7 billion bases. And so just dealing with data on that scale, I mean if you want to copy that data from one place to another, you know, it can take an hour or two just to make the copy. We had to get it to run on actually a whole farm of computers. We had a hundred computers to do this. And this was actually, it was kind of an interesting farm. It was one we had borrowed. They were machines that had arrived a little bit early for use in the instructional labs at UCSC [The University of California, Santa Cruz] so we absconded with them for about three months to work on the Human Genome Project instead, because they weren't needed till the next quarter.
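Why do roughly four billion bases of pieces shrink to about 2.7 billion once assembled? Overlapping stretches are shared between neighbouring pieces, so they count only once in the finished sequence. The toy sketch below shows this effect on three short made-up fragments. It uses a simple greedy suffix/prefix merge chosen purely for illustration. This is not the assembly program described in the interview. The function names `overlap` and `greedy_assemble` and the `min_len` parameter are hypothetical. The sketch also ignores sequencing errors, repeats, reverse strands, and fragments contained inside other fragments.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of a that equals a prefix of b.

    Overlaps shorter than min_len (a hypothetical threshold) are ignored and give 0.
    """
    start = 0
    while True:
        # Look for b's first min_len bases somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # If everything from that point to the end of a is a prefix of b, it's an overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembly: repeatedly merge the pair with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no remaining overlaps; leave separate contigs
        # The shared overlap is kept only once in the merged sequence.
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["ATGGCGTAC", "GTACCTTAG", "TTAGGCAAT"]
    contigs = greedy_assemble(reads)
    print(sum(len(r) for r in reads))  # 27 bases of input
    print(contigs, len(contigs[0]))    # ['ATGGCGTACCTTAGGCAAT'] 19
```

In this example, 27 bases of input collapse to a 19-base sequence because two 4-base overlaps are each counted only once. The same bookkeeping, at a vastly larger scale, is why the assembled genome came out smaller than the sum of its pieces.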
For the first draft of the genome sequence, both teams were working to identify the number of human genes. Here, Ewan Birney, a "numbers man" from the public genome project, explains how genes can be recognized and how the data from the genome project can be used.