Tuesday, June 15, 2010

Anybody have a few million lines of Apache log files they can share?

Anybody have a few million lines of Apache log files they can share? I am working on a lab/demo for InfiniDB and need a few million lines of Apache HTTPD logs. I no longer run any large websites, and I would rather use real data than generate something synthetic. I will sanitize your URLs so you stay anonymous; for example, www.YourCompanyHere.com will become www.abc123.com or something similar.

The demo/lab will show how to load data into the columnar InfiniDB storage engine and run some analytics against the data. Please let me know if you can help.


Swany said...

Hi Dave,

You might give the Apache Incubator project 'Olio' a look. It simulates a Web 2.0-style workload (AJAX, etc.) and can quickly generate realistic logs from its benchmark workflow activity.

Jeremy Cole said...

Hey Dave,

If you need a big bundle of easy-to-use data, I can recommend my FlightStats database. It is quite large: the currently loaded dataset has ~47M on-time records, plus a whole slew of supporting data. See: http://flightstats.us/

If you're interested in using it, I'd be happy to help out. Email me at jeremy@jcole.us.

rpbouman said...

Contact Mozilla; they do a lot of log analysis:


Dave Stokes said...

Thanks, Swany, Jeremy & Roland,

I was looking for Apache logs as they tend to be large and fairly ubiquitous. I will check out your recommendations. Thanks!