I figured out why the Heritrix crawler was running at only one page per second.
It was configured to run with the default maximum Java heap size of 256m.
cat /etc/init.d/heritrix.sh
#!/bin/bash
/opt/heritrix/bin/heritrix --bind=yowb3 --admin=admin:admin
I changed this to 2048m by exporting JAVA_OPTS in the init script, and the crawler now seems to run about 10x faster:
cat /etc/init.d/heritrix.sh
#!/bin/bash
export JAVA_OPTS=" -Xmx2048m"
/opt/heritrix/bin/heritrix --bind=yowb3 --admin=admin:admin
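To double-check which heap limit a given JAVA_OPTS string requests before restarting the crawler, a small helper like the one below can be handy. This is only a sketch: the function name heap_from_opts is hypothetical (not part of Heritrix), and it assumes the options are separated by whitespace and that the last -Xmx token is the one that matters.

```shell
#!/bin/bash
# Hypothetical helper (not part of Heritrix): print the value of the last
# -Xmx option found in a JAVA_OPTS-style string, or "unset" if none is present.
heap_from_opts() {
  local heap=""
  local tok
  # Intentionally unquoted so the string splits on whitespace.
  for tok in $1; do
    case "$tok" in
      -Xmx*) heap="${tok#-Xmx}" ;;   # strip the "-Xmx" prefix, keep e.g. "2048m"
    esac
  done
  if [ -n "$heap" ]; then
    echo "$heap"
  else
    echo "unset"
    return 1
  fi
}
```

With the export line from the script above in effect, heap_from_opts "$JAVA_OPTS" prints 2048m; with an empty JAVA_OPTS it prints unset, which is the case where the launcher's own default applies.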
-----------------
Rates
9.55 URIs/sec (16.1 avg)
246 KB/sec (389 avg)
Load
6 active of 50 threads
1 congestion ratio
Thursday, January 21, 2010
Thursday, January 7, 2010
Lucene index writes per minute slow down
Sunday, January 3, 2010
Drupal/LAMP installation on Ubuntu
Install XAMPP (LAMP) and DRUPAL on Ubuntu
Old notes below:
1. Install LAMP
XAMPP install made easy: use the instructions on this site to install the LAMP stack
2. Install DRUPAL
Reset the MySQL root password if necessary.
http://en.kioskea.net/faq/sujet-630-reinitializing-the-root-password-of-mysql
Install DRUPAL on Ubuntu
Alternate installation instructions with notes on security and important files