Tuesday, June 15, 2010

Anybody have a few million lines of Apache log files they can share?

Anybody have a few million lines of Apache log files they can share? I am working on a lab/demo on InfiniDB and need a few million lines of an Apache HTTPD log file. I no longer run any large websites, and I would rather use real data than generate something artificial. I will sanitize your URL so you remain anonymous: www.YourCompanyHere.com will end up as www.abc123.com or something similar.

The demo/lab will show how to load data into the columnar InfiniDB storage engine and run some analytics against the data. Please let me know if you can help.

4 comments:

Justin Swanhart said...

Hi Dave,

You might take a look at the Apache Incubator project 'Olio'. It simulates a Web 2.0-type workload (Ajax, etc.) and can quickly generate real logs from benchmark workflow activity.

Jeremy Cole said...

Hey Dave,

If you need a big bundle of easy-to-use data, I can recommend my FlightStats database. It is quite large: the currently loaded dataset has ~47M on-time records, plus a whole slew of supporting data. See: http://flightstats.us/

If you're interested in using it, I'd be happy to help out. Email me at jeremy@jcole.us.

Anonymous said...

Contact Mozilla; they do a lot of log analysis:

http://blog.mozilla.com/metrics/

Dave Stokes said...

Thanks, Swany, Jeremy & Roland,

I was looking for Apache logs as they tend to be large and fairly ubiquitous. I will check out your recommendations. Thanks!