Name: Building your big data search stack with Apache Nutch 2.x
Start: 2014-04-09T11:15:00-0700
End: 2014-04-09T12:05:00-0700

ApacheCon North America 2014 - April 7-9 in Denver, CO.

Building your big data search stack with Apache Nutch 2.x

Lewis John McGibbney - In this tutorial, Lewis invites you to join him in building your own customized search stack capable of handling enormous data volumes. Although the tutorial focuses on Apache Nutch 2.x, we will also use source code from Apache Gora, an open source framework that provides an in-memory data model and persistence for big data and acts as an object-to-datastore mapping framework for crawl data objects (WebPage or Host). Apache Nutch 2.x differs from the Nutch 1.x branch in one key area: storage is abstracted away from any specific underlying data store by using Apache Gora to handle object-to-persistence mappings. This means we can implement an extremely flexible model/stack for storing everything (fetch time, status, content, parsed text, outlinks, inlinks, etc.) in a number of NoSQL storage solutions.
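
The storage abstraction described above can be illustrated with a minimal Python sketch. It is not the Apache Gora or Nutch API: the names DataStore, InMemoryStore, WebPage and record_fetch are hypothetical, and the WebPage fields are taken from the list in the abstract only. The point is that crawl code talks to an interface, so the backing store can be swapped without touching it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WebPage:
    """Hypothetical crawl record; fields mirror the abstract, not Nutch's real schema."""
    url: str
    fetch_time: int = 0
    status: str = "unfetched"
    content: bytes = b""
    parsed_text: str = ""
    outlinks: List[str] = field(default_factory=list)
    inlinks: List[str] = field(default_factory=list)


class DataStore(ABC):
    """Hypothetical storage interface: crawl code depends only on this."""

    @abstractmethod
    def put(self, key: str, page: WebPage) -> None:
        ...

    @abstractmethod
    def get(self, key: str) -> Optional[WebPage]:
        ...


class InMemoryStore(DataStore):
    """Stand-in backend; a NoSQL-backed class would implement the same two methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, WebPage] = {}

    def put(self, key: str, page: WebPage) -> None:
        self._rows[key] = page

    def get(self, key: str) -> Optional[WebPage]:
        return self._rows.get(key)


def record_fetch(store: DataStore, url: str, fetch_time: int, content: bytes) -> WebPage:
    """Update (or create) the record for a URL without knowing which backend is used."""
    page = store.get(url) or WebPage(url=url)
    page.fetch_time = fetch_time
    page.status = "fetched"
    page.content = content
    store.put(url, page)
    return page


if __name__ == "__main__":
    store = InMemoryStore()
    record_fetch(store, "http://example.org/", 1397067300, b"<html></html>")
    print(store.get("http://example.org/").status)
```

Because record_fetch only sees the DataStore interface, replacing InMemoryStore with another backend class would not require changes to the crawl logic, which is the flexibility the abstract attributes to the Gora-based design.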

Speakers

Lewis McGibbney

Enterprise Search Technologist III, Jet Propulsion Laboratory

Wednesday April 9, 2014 11:15am - 12:05pm PDT
Confluence B

Lucene & Friends

ApacheCon North America 2014
