Anybody have a few million lines of Apache log files they can share?
I am working on a lab/demo for InfiniDB and need a few million lines of an Apache HTTPD log file. I no longer run any large websites and would rather use real data than create something artificial. I will sanitize your URLs so you stay anonymous: www.YourCompanyHere.com will end up as www.abc123.com or something similar.
The demo/lab will show how to load data into the columnar InfiniDB storage engine and run some analytics against the data. Please let me know if you can help.
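For anyone wondering what the sanitizing step might look like, here is a minimal Python sketch. It is only an illustration, not the script I will actually use: the names `HOST_RE`, `pseudonym`, and `sanitize_line` are made up for this example, and it assumes the hostnames to hide appear after `http://` or `https://` (for instance in the referer field of a combined-format log line). Client IP addresses and hostnames in other fields would need separate handling.

```python
import hashlib
import re

# Assumption: hostnames to hide appear right after a URL scheme,
# e.g. in the referer field of a combined-format log line.
HOST_RE = re.compile(r"(https?://)([A-Za-z0-9.-]+)")


def pseudonym(host: str) -> str:
    """Map a hostname to a stable placeholder such as www.3fa1b2c4.com."""
    digest = hashlib.sha256(host.lower().encode("utf-8")).hexdigest()[:8]
    return f"www.{digest}.com"


def sanitize_line(line: str) -> str:
    """Replace every scheme-prefixed hostname in one log line."""
    return HOST_RE.sub(lambda m: m.group(1) + pseudonym(m.group(2)), line)


if __name__ == "__main__":
    sample = (
        '1.2.3.4 - - [10/Oct/2009:13:55:36 -0700] "GET /index.html HTTP/1.0" '
        '200 2326 "http://www.YourCompanyHere.com/start.html" "Mozilla/4.08"'
    )
    print(sanitize_line(sample))
```

Because the placeholder is derived from a hash, the same site always maps to the same fake name, so per-site analytics still work on the sanitized data. A plain hash can be reversed by guessing well-known hostnames, so mixing in a secret value before hashing would be a sensible addition.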
4 comments:
Hi Dave,
You might give the Apache Incubator project 'Olio' a look. It simulates a Web 2.0-type workload (AJAX, etc.) and can quickly create real logs from benchmark workflow activity.
Hey Dave,
If you need a big bundle of easy-to-use data, I can recommend my FlightStats database. It is quite large: the currently loaded dataset is ~47M on-time records, plus a whole slew of supporting data. See: http://flightstats.us/
If you're interested in using it, I'd be happy to help out. Email me at jeremy@jcole.us.
Contact Mozilla; they do a lot of log analysis:
http://blog.mozilla.com/metrics/
Thanks Swany, Jeremy & Roland,
I was looking for Apache logs as they tend to be large and fairly ubiquitous. I will check out your recommendations. Thanks!