This is Interesting: Free Magazines for Graphics designers and webmasters  


Home > Archive > Webmaster forum > November 2007 > Fastest programming options





You are viewing an archived Text-only version of the thread. To view this thread in it's original format and/or if you want to reply to this thread please [click here]

Author Fastest programming options
Jason Carlton

2007-11-21, 6:17 pm

I have a site with a peak traffic load in the neighborhood of 200
accesses per second (roughly 45,000 unique visitors, and a million
page views). Most of the features of the site are driven by MySQL, and
the number of queries has caused a significant load on the server,
resulting in VERY slow pages.

So I'm curious if any of you have experimented with bench times on
queries, comparing multiple languages?

For example, I have a script in PERL that queries a table with 25,000
rows. This script queries the database twice each time it's loaded,
and it's not uncommon for there to be up to 100 queries at once. Which
would finish the query faster, PHP or Perl? Or other?

Similarly, I use a consistent header file (PHP) throughout the site
that I import to each page. This lets me update the entire site by
changing the one page. I originally used SSI to import it on the
static pages, but this was slow so I moved over to the PHP "include"
command. This, too, has become too slow. Is there another option that
I'm overlooking?

TIA,

Jason
George L. Sexton

2007-11-21, 10:17 pm

On Wed, 21 Nov 2007 14:36:48 -0800, Jason Carlton wrote:

> I have a site with a peak traffic load in the neighborhood of 200
> accesses per second (roughly 45,000 unique visitors, and a million page
> views). Most of the features of the site are driven by MySQL, and the
> number of queries has caused a significant load on the server, resulting
> in VERY slow pages.


One thing to do is to look at the queries, and the execution plans and
make sure that indexes are being used if possible, rather than slower
table scans.

You should also look at your queries and make sure you're not bring back
fields you don't use, or records you filter out in other code. These would
all add overhead.

> So I'm curious if any of you have experimented with bench times on
> queries, comparing multiple languages?


I'm a Java bigot. Written well, it's very scalable. We run 250-300 virtual
hosts/web calendar applications for our customers on uni-processor P4
machines with 2GB of RAM. That's pretty good.

>
> For example, I have a script in PERL that queries a table with 25,000
> rows. This script queries the database twice each time it's loaded, and
> it's not uncommon for there to be up to 100 queries at once. Which would
> finish the query faster, PHP or Perl? Or other?


If you're using CGI, and not mod_perl where the web browser has to fire
off a zillion copies of PERL at the same time to run the CGI script, then
PHP is probably faster.

If PERL isn't using a connection pool for database connections, that would
be a significant slowdown as well.


> Similarly, I use a consistent header file (PHP) throughout the site that
> I import to each page. This lets me update the entire site by changing
> the one page. I originally used SSI to import it on the static pages,
> but this was slow so I moved over to the PHP "include" command. This,
> too, has become too slow. Is there another option that I'm overlooking?



You seem to have a lot of ideas on what's slow and what's not. Are these
things you've learned from testing, or just "everybody knows". Everybody
is usually wrong. I see the Tomcat people work on native connectors to get
the last tiny bit of performance out, but then people run trash like JSP
and tag libraries that just generate inordinate amounts of code. In my own
app, I found that one page would run in 8ms, while another that had
database stuff took 100ms. The moral of the story is that most of the
effort in real web apps in the database, and spending a lot of time
optimizing the wrong things is pointless.

Remember, if a certain part of your web app is taking 10% of the execution
time, then even if you can PERFECTLY optimize it (reduce the time to
zero), you're only going to see a 10% improvement.

I would recommend downloading something like JMeter, and actually testing
your app. You'll be surprised.

"When you can measure what you are speaking about and express it in
numbers, you know something about it."
Lecture to the Institution of Civil Engineers,
3 May 1883
~ Lord Kelvin ~



> TIA,
>
> Jason




--
George Sexton
MH Software, Inc. - Home of Connect Daily Web Calendar
http://www.mhsoftware.com/connectdaily.htm
Jason Carlton

2007-11-22, 3:16 am

> One thing to do is to look at the queries, and the execution plans and
> make sure that indexes are being used if possible, rather than slower
> table scans.


I'm still relatively new to DBI on any real level, so I can't pretend
that my application is perfect. I did create an index, though, and I'm
assuming that the query automatically uses it since MySQL
documentation never said anything like "type this to access the index
specifically."

I could cut the queries on one page in half if there was a better way
to return the number of rows in a table. Right now in the PERL script
(my most heavy application), I'm using these 2 queries separately:

# Get Info to Show
my $sth = $dbh->prepare("SELECT id, lastmodified, subject, firstname,
lastname, size, shortdesc FROM $forum_subjects ORDER BY lastmodified
DESC LIMIT 20 OFFSET $start");
$sth->execute;

# Get Number of Rows
my ($length_filenames) = $dbh->selectrow_array("SELECT COUNT(*) FROM
$forum_subjects");


> You should also look at your queries and make sure you're not bring back
> fields you don't use, or records you filter out in other code. These would
> all add overhead.


Unfortunately, I use all of the fields in the query above. I do have
other queries (like showing a member's profile) that I can just pull
out one or two fields at a time, but those queries don't have the
overhead that the one above has.


> I'm a Java bigot. Written well, it's very scalable. We run 250-300 virtual
> hosts/web calendar applications for our customers on uni-processor P4
> machines with 2GB of RAM. That's pretty good.


I've used Java a lot for software applications, but I've been hesitant
about using it for web apps. I have a chat room on my site that's in
Java, and at least once a day someone posts that it doesn't work (and
it always leads back to a problem on their end).

Are the apps you're using targeted to the general public, or a
specific group that you can ensure has the right equipment? If you're
targeting the public, then running 250-300 hosts on that machine is
great! Mine is a dual Xeon machine with 4G of RAM, and running half
that load but still having problems.


> If you're using CGI, and not mod_perl where the web browser has to fire
> off a zillion copies of PERL at the same time to run the CGI script, then
> PHP is probably faster.


You're right, I'm using CGI, not mod_perl. I've never had a machine
that had mod_perl installed, although I've been thinking about setting
it up and modifying it. Do you know, can I install it on an existing
machine without having an impact on any of the existing applications?


> If PERL isn't using a connection pool for database connections, that would
> be a significant slowdown as well.


It's not, you're right.


>
> You seem to have a lot of ideas on what's slow and what's not. Are these
> things you've learned from testing, or just "everybody knows". Everybody
> is usually wrong.


Oh, no, when I say a particular thing is slow, I mean SLOW!

When I first built this site in 2001, I had never used PHP so
everything was either HTML, Perl, or SSI. I was able to import the
header file with a simple SSI include or, in the PERL apps, a simple
OPEN FILE command, but as the traffic grew I would see times that a
page would take up to 2 minutes to load!

I changed the static pages to PHP instead of HTML, and used a simple
include() statement, and in the PERL scripts started using
LWP::Simple. This solved the problem for awhile, but as of about a
year ago, the traffic had quadrupled and I started seeing the same
delay.

I went through the entire site and hard-coded the header page onto
each page, so now the header page isn't used at all. This seemed to
resolve THAT problem, so I know that the delay was coming from a lot
of accesses to the same header file. But, if I add something to the
navigation, I have to do it manually on every page, which is pretty
inconvenient.


> I see the Tomcat people work on native connectors to get
> the last tiny bit of performance out, but then people run trash like JSP
> and tag libraries that just generate inordinate amounts of code. In my own
> app, I found that one page would run in 8ms, while another that had
> database stuff took 100ms. The moral of the story is that most of the
> effort in real web apps in the database, and spending a lot of time
> optimizing the wrong things is pointless.


I've always felt the same way, and I've never had a problem like this
with any other site. The bottleneck has to be the database, but I
simply don't see how it could be made to be any more efficient.

I've also discovered recently that mysql has a maximum of 100 queries
at once. I didn't know this, and if someone goes to my homepage at the
exact same time that someone else goes to the script that I mentioned
earlier then those 2 people alone account for 9 queries! So, only
having 22 people on the site at the exact same time will max out the
database... but I might have 200 at one time visiting!


> I would recommend downloading something like JMeter, and actually testing
> your app. You'll be surprised.


Thanks, I'm going to check that out right now.

Jason
Jeff

2007-11-22, 6:16 am

Jason Carlton wrote:
>
>
> I'm still relatively new to DBI on any real level, so I can't pretend
> that my application is perfect. I did create an index, though, and I'm
> assuming that the query automatically uses it since MySQL
> documentation never said anything like "type this to access the index
> specifically."
>
> I could cut the queries on one page in half if there was a better way
> to return the number of rows in a table. Right now in the PERL script
> (my most heavy application), I'm using these 2 queries separately:
>
> # Get Info to Show
> my $sth = $dbh->prepare("SELECT id, lastmodified, subject, firstname,
> lastname, size, shortdesc FROM $forum_subjects ORDER BY lastmodified
> DESC LIMIT 20 OFFSET $start");
> $sth->execute;


How big are the tables and what does the index look like? I assume
id is your primary key and you have a seperate index just on lastmodified?
>
> # Get Number of Rows
> my ($length_filenames) = $dbh->selectrow_array("SELECT COUNT(*) FROM
> $forum_subjects");


I'm not so sure that these are what's causing your site to be slow.
Is it this pages that runs slow? Or is it a less used page that is
causing the delays?

You can convert some of these pages to static. Just recreate them
when you have a change in the data. For that matter you can store your
table row counts in an external file and just update them when someone
adds/deletes to that table. I do that for data that is accessed a lot
more often than it changes, just stuff it in a config file and import
that in all the scripts.

You could do something like:

do $forum_subjects . 'row_count.cfg'; # that file could look like:

our $length_filenames=420;
>
>
>
>
>
> Unfortunately, I use all of the fields in the query above. I do have
> other queries (like showing a member's profile) that I can just pull
> out one or two fields at a time, but those queries don't have the
> overhead that the one above has.
>

<snip>
>
> When I first built this site in 2001, I had never used PHP so
> everything was either HTML, Perl, or SSI. I was able to import the
> header file with a simple SSI include or, in the PERL apps, a simple
> OPEN FILE command, but as the traffic grew I would see times that a
> page would take up to 2 minutes to load!
>
> I changed the static pages to PHP instead of HTML, and used a simple
> include() statement, and in the PERL scripts started using
> LWP::Simple. This solved the problem for awhile, but as of about a
> year ago, the traffic had quadrupled and I started seeing the same
> delay.
>
> I went through the entire site and hard-coded the header page onto
> each page, so now the header page isn't used at all. This seemed to
> resolve THAT problem, so I know that the delay was coming from a lot
> of accesses to the same header file. But, if I add something to the
> navigation, I have to do it manually on every page, which is pretty
> inconvenient.


You can do something as simple as just marking where the navigation is
on the page and then spidering the site and using a simple regex to
replace the navigation.

Also, you may wish to consider moving the database to a seperate
server. I have a feeling though that you have a more easily solvable
problem.

Jeff
>
>
>
>
>
> I've always felt the same way, and I've never had a problem like this
> with any other site. The bottleneck has to be the database, but I
> simply don't see how it could be made to be any more efficient.
>
> I've also discovered recently that mysql has a maximum of 100 queries
> at once. I didn't know this, and if someone goes to my homepage at the
> exact same time that someone else goes to the script that I mentioned
> earlier then those 2 people alone account for 9 queries! So, only
> having 22 people on the site at the exact same time will max out the
> database... but I might have 200 at one time visiting!
>
>
>
>
>
> Thanks, I'm going to check that out right now.
>
> Jason

Andy Dingley

2007-11-22, 6:16 am

On 21 Nov, 22:36, Jason Carlton <jwcarl...@XXXXXXXXXX> wrote:

> So I'm curious if any of you have experimented with bench times on
> queries, comparing multiple languages?


Language isn't usually the issue.

Unless you're inventing a radical new _form_ of database, the typical
website is just a thin layer of site-specific scripting laid on top of
pre-exisiting products for language engine, web server and database
engine. The scripting layer isn't enough of a slow-down to make a
complex site significantly quicker or slower.

If a "typical" site is slow, then it's caused by either _gross_ bad-
coding of the site layer (to a level that's probably structurally
ugly, not just inefficient), a language engine that's unusually slow
(some late '90s "easy builder" products that are now obsolete) or more
usually, inefficiencies in the way on which these pre-constructed
tools are being used: by which I obviously mean bad data modelling in
the database, or inefficient queries upon it.

If a "typical" web site is slow, it's almost always because the
database sucks. How the database engine is _used_, not the choice of
database engine.

Probably the favourite bad technique here is doing "database" work in
the script layer, not in queries on the database itself. Don't use
loops in the script language and repeated sub-queries on the DB when
you can use a single more complex query to retrieve eveything through
one single connection.

As for MySQL, then it's not my favourite platform because it's too
restrictive. However it is fast though. Bearing in mind that it's
still a hierarchical database engine underneath, not relational, then
provided that you haven't coded some smartarse Darth Codd relational
trick in the views, it ought to deliver for you.
rf

2007-11-23, 3:28 am


"Andy Dingley" <dingbat@codesmiths.com> wrote in message
news:b6fd6917-6adb-482e-b9f8-e48c82c517f4@g21g2000hsh.googlegroups.com...

> As for MySQL, then it's not my favourite platform because it's too
> restrictive. However it is fast though. Bearing in mind that it's
> still a hierarchical database engine underneath, not relational


Hmmm. Care to elaborate on that last point?

Seems rather relational to me. Disconnected tables joinable at will in
whatever way the user chooses. Indexes available of course to assist in
access/joining, but not actually necessary. Just what a relational database
is defined to be.

Compared to (IBM mainframe) IMS, for example which is truly hierarchical.
Well, part of it anyway.

--
Richard.


Jerry Stuckle

2007-11-23, 3:28 am

Jason Carlton wrote:
>
> I'm still relatively new to DBI on any real level, so I can't pretend
> that my application is perfect. I did create an index, though, and I'm
> assuming that the query automatically uses it since MySQL
> documentation never said anything like "type this to access the index
> specifically."
>


Maybe, maybe not. It depends on the query and what indexes you have
defined. Look up EXPLAIN in the mysql doc - it will tell you what's
going on with the query.

> I could cut the queries on one page in half if there was a better way
> to return the number of rows in a table. Right now in the PERL script
> (my most heavy application), I'm using these 2 queries separately:
>
> # Get Info to Show
> my $sth = $dbh->prepare("SELECT id, lastmodified, subject, firstname,
> lastname, size, shortdesc FROM $forum_subjects ORDER BY lastmodified
> DESC LIMIT 20 OFFSET $start");
> $sth->execute;
>
> # Get Number of Rows
> my ($length_filenames) = $dbh->selectrow_array("SELECT COUNT(*) FROM
> $forum_subjects");
>
>


If you use the mysql or mysqli interface, you can get the number of rows
returned directly (i.e. mysql_num_rows()).. But I don't use DBI, so I
don't know if there's a way through it.

>
> Unfortunately, I use all of the fields in the query above. I do have
> other queries (like showing a member's profile) that I can just pull
> out one or two fields at a time, but those queries don't have the
> overhead that the one above has.
>


The number of columns returned has minimal impact on query speed (unless
you're returning something like a 2Mb BLOB column).

>
>
> I've used Java a lot for software applications, but I've been hesitant
> about using it for web apps. I have a chat room on my site that's in
> Java, and at least once a day someone posts that it doesn't work (and
> it always leads back to a problem on their end).
>
> Are the apps you're using targeted to the general public, or a
> specific group that you can ensure has the right equipment? If you're
> targeting the public, then running 250-300 hosts on that machine is
> great! Mine is a dual Xeon machine with 4G of RAM, and running half
> that load but still having problems.
>
>
>
> You're right, I'm using CGI, not mod_perl. I've never had a machine
> that had mod_perl installed, although I've been thinking about setting
> it up and modifying it. Do you know, can I install it on an existing
> machine without having an impact on any of the existing applications?
>
>
>
> It's not, you're right.
>


Actually a connection pool (persistent connections in MySQL) is BAD for
web apps on MySQL. When you have a persistent connections, the machine
has to allocate resources for the maximum number of connections you
could ever possibly need all of the time, whether they are in use or
not. This can slow the machine down by using resources which are not
being used. With non-persistent connections, resources are allocated
only as required. Connections take a little longer, but use fewer
resources.

>
>
> Oh, no, when I say a particular thing is slow, I mean SLOW!
>
> When I first built this site in 2001, I had never used PHP so
> everything was either HTML, Perl, or SSI. I was able to import the
> header file with a simple SSI include or, in the PERL apps, a simple
> OPEN FILE command, but as the traffic grew I would see times that a
> page would take up to 2 minutes to load!
>
> I changed the static pages to PHP instead of HTML, and used a simple
> include() statement, and in the PERL scripts started using
> LWP::Simple. This solved the problem for awhile, but as of about a
> year ago, the traffic had quadrupled and I started seeing the same
> delay.
>
> I went through the entire site and hard-coded the header page onto
> each page, so now the header page isn't used at all. This seemed to
> resolve THAT problem, so I know that the delay was coming from a lot
> of accesses to the same header file. But, if I add something to the
> navigation, I have to do it manually on every page, which is pretty
> inconvenient.
>


That shouldn't have caused a significant slowdown. A simple
include(filename) or require(filename) is pretty quick in PHP, and if
the file were being used regularly, it should have been cached by the
OS. OTOH, if you had include('http://uri') it would be quite slow.

>
>
> I've always felt the same way, and I've never had a problem like this
> with any other site. The bottleneck has to be the database, but I
> simply don't see how it could be made to be any more efficient.
>


EXPLAIN your queries and see what's happening.

> I've also discovered recently that mysql has a maximum of 100 queries
> at once. I didn't know this, and if someone goes to my homepage at the
> exact same time that someone else goes to the script that I mentioned
> earlier then those 2 people alone account for 9 queries! So, only
> having 22 people on the site at the exact same time will max out the
> database... but I might have 200 at one time visiting!
>


This is a configuration parameter. But one person will only account for
one query at a time, and only while that query is being executed. If
your queries are efficient, you should be able to have several thousand
people connected at the same time without a problem.

>
>
> Thanks, I'm going to check that out right now.
>
> Jason
>


Look at comp.databases.mysql. A lot of mysql experts there will help
you out.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstucklex@attglobal.net
==================

Beauregard T. Shagnasty

2007-11-23, 3:28 am

Andy Dingley wrote:

> .. then provided that you haven't coded some smartarse Darth Codd
> relational trick in the views,


Ooh! I liked that one! Darth Codd. ;-)

--
-bts
-Motorcycles defy gravity; cars just suck
Andy Dingley

2007-11-23, 3:28 am

On 22 Nov, 11:20, "rf" <r...@invalid.com> wrote:

>
> Hmmm. Care to elaborate on that last point?


Not really, it would take days!

> Seems rather relational to me.


Of course it is. _Rather_ relational. They all are: according to old
Codd none of them (?) ever did manage to get it perfectly right as
regards his idealised relational model. However mysql is less close
than the big-player competition. Look at its history in particular,
and the severe restrictions mysql imposed upon you a few years back.
Even today though, you can construct models and queries that are far
from the hierarchical model, perfectly reasonable in a relational
sense, and run like a drunken pig on MySQL.

If you want mysql to perform for you, think hierarchically. If you
strongly foreign key things onto simple primary keys, then join
performance over this keyspace will be adequate (remember that a join
isn't the same thing as a foreign key). If you start asking for
arbitrary joins all over the place, it dies on its arse.
George L. Sexton

2007-11-23, 3:28 am

On Wed, 21 Nov 2007 20:48:43 -0800, Jason Carlton wrote:
> I could cut the queries on one page in half if there was a better way
> to return the number of rows in a table. Right now in the PERL script
> (my most heavy application), I'm using these 2 queries separately:
>
> # Get Info to Show
> my $sth = $dbh->prepare("SELECT id, lastmodified, subject, firstname,
> lastname, size, shortdesc FROM $forum_subjects ORDER BY lastmodified
> DESC LIMIT 20 OFFSET $start");
> $sth->execute;


I'm not a mysql expert (actually, I think it's worse than MS Access), but
from looking at this query, it appears to me that the plan might require
bringing ALL records into a temporary table/workspace, sorting them, and
then returning the set you're looking for. This is probably what's killing
you.

>
> # Get Number of Rows
> my ($length_filenames) = $dbh->selectrow_array("SELECT COUNT(*) FROM
> $forum_subjects");


This depends on the DB. I've seen some that require a table scan, and some
that can just scan the primary key index to return the result. Use the
EXPLAIN command to find out what's happening for you.


>
> I've used Java a lot for software applications, but I've been hesitant
> about using it for web apps. I have a chat room on my site that's in
> Java, and at least once a day someone posts that it doesn't work (and it
> always leads back to a problem on their end).


I'm talking about server side Java servlets using Apache Tomcat (or
IBM Websphere or BEA WebLogic. If you look at big sites, like the Weather
Channel, Home Depot, etc, this is the technology of choice.


> Are the apps you're using targeted to the general public, or a specific
> group that you can ensure has the right equipment? If you're targeting
> the public, then running 250-300 hosts on that machine is great! Mine is
> a dual Xeon machine with 4G of RAM, and running half that load but still
> having problems.


We run calendars for schools, churches, government agencies, businesses,
etc. It's a pretty broad spectrum. But, we're just running our calendar.

>
>
>
> You're right, I'm using CGI, not mod_perl. I've never had a machine that
> had mod_perl installed, although I've been thinking about setting it up
> and modifying it. Do you know, can I install it on an existing machine
> without having an impact on any of the existing applications?


I don't know anything about CGI or Perl. I needed a PERL program done, and
I had to hire someone to do it.


--
George Sexton
MH Software, Inc. - Home of Connect Daily Web Calendar
http://www.mhsoftware.com/connectdaily.htm
mynameisnobodyodyssea@googlemail.com

2007-11-23, 3:29 am

On Nov 21, 10:36 pm, Jason Carlton <jwcarl...@XXXXXXXXXX> wrote:
> I have a site with a peak traffic load in the neighborhood of 200
> accesses per second (roughly 45,000 unique visitors, and a million
> page views). Most of the features of the site are driven by MySQL, and
> the number of queries has caused a significant load on the server,
> resulting in VERY slow pages.


Besides optimizing your code for speed,
did you check your server access logs, awstats etc.
in case bots crawl unnecessarily some pages,
maybe you could improve your robots.txt file...
(just a suggestion)




Toby A Inkster

2007-11-23, 6:19 pm

George L. Sexton wrote:
> On Wed, 21 Nov 2007 20:48:43 -0800, Jason Carlton wrote:
>
>
> I'm not a mysql expert (actually, I think it's worse than MS Access),


Oh, come on... it's bad, but it's not *that* bad! It is fast though, which
appears to be the OP's main concern.

> but from looking at this query, it appears to me that the plan might
> require bringing ALL records into a temporary table/workspace, sorting
> them, and then returning the set you're looking for. This is probably
> what's killing you.


It shouldn't do. For this query, the OP should ensure that there's an
index on the "lastmodified" field.

>
> This depends on the DB. I've seen some that require a table scan, and
> some that can just scan the primary key index to return the result. Use
> the EXPLAIN command to find out what's happening for you.


Assuming that lastmodified is indexed, and not null, then perhaps

SELECT COUNT(lastmodified) ...

>
> I don't know anything about CGI or Perl. I needed a PERL program done,
> and I had to hire someone to do it.


mod_perl should give you a *big* speed up.

Assuming that your OS's package manager can take care of the installation
your you, it shouldn't take more than 20 minutes to figure out how to tell
Apache to process PERL files through mod_perl instead of CGI. And mod_perl
can run most CGI PERL scripts without modification. (The reverse is not
necessarily true!)

--
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 16:49.]

It'll be in the Last Place You Look
http://tobyinkster.co.uk/blog/2007/11/21/no2id/
Sponsored Links


Copyright 2003 - 2008 forum4designers.com  Software forum  Computer Hardware reviews