Archive for the 'The Web' Category

Scaling with YouTube’s oracle algorithm for speeding up writes

Friday, May 4th, 2007

Paul Tuckfield from YouTube presented at the MySQL conference. He devised the oracle algorithm to speed up MySQL replication.

The short of it is pretty simple. Scan the replication log then create/execute SQL queries to ensure to-be-altered data is already in memory on the slave.
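The post above only says to scan the log and issue queries that warm the slave's memory. A minimal sketch of one way that could look, assuming each write is rewritten into a SELECT with the same WHERE clause (the function names and the regex-based parsing are illustrative assumptions, not Tuckfield's actual code):

```javascript
// Hypothetical helper: turn a write statement from the replication log into
// a read that touches the same rows, so the slave loads them into memory
// before the real write is applied.
function toPrefetchSelect(statement) {
  var sql = statement.trim().replace(/;$/, "");
  var m;
  // UPDATE <table> SET ... WHERE <cond>  ->  SELECT * FROM <table> WHERE <cond>
  m = /^UPDATE\s+(\S+)\s+SET\s+[\s\S]+?\s+WHERE\s+([\s\S]+)$/i.exec(sql);
  if (m) return "SELECT * FROM " + m[1] + " WHERE " + m[2];
  // DELETE FROM <table> WHERE <cond>  ->  SELECT * FROM <table> WHERE <cond>
  m = /^DELETE\s+FROM\s+(\S+)\s+WHERE\s+([\s\S]+)$/i.exec(sql);
  if (m) return "SELECT * FROM " + m[1] + " WHERE " + m[2];
  // Inserts and anything this naive parser does not recognize are skipped.
  return null;
}

// Build the list of warm-up queries for a batch of upcoming log statements.
function buildPrefetchQueries(logStatements) {
  var queries = [];
  logStatements.forEach(function (statement) {
    var q = toPrefetchSelect(statement);
    if (q) queries.push(q);
  });
  return queries;
}
```

A real implementation would read the statements from the slave's relay log and run the resulting SELECTs against the slave; the regexes here are deliberately naive and will not handle every statement shape.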

JQuery and Fisheye how cool is that

Monday, April 23rd, 2007

I’ve been hearing good things about jQuery. After the YUI group posted a presentation by John Resig, the creator of jQuery, I decided to take a look.

Right away I found a cool plugin called Fisheye. I whipped up a version and posted it here.

http://caldergroup.com/fisheye.html

Amazon Prime 4% penetration

Thursday, March 15th, 2007

According to this Forrester blog post, Amazon Prime is only used by 4% of Amazon customers. About half of those customers order a lot; the other half don’t order that much (and they’ve forgotten to turn it off).

A 4% conversion is pretty bad for a loyalty program. If I started a loyalty program I would want at least 20% conversion.

I’d like to know what Peet’s Coffee & Tea has for adoption of their debit card. I checked their 10-Ks, but didn’t find anything. Top secret, I guess.

Google Maps is Easy

Saturday, March 3rd, 2007

Turns out that Google Maps is really easy to use. First you need a key; just sign up and get one. Then you can start showing maps.


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=!!MYKEY!!" type="text/javascript"></script>
<script type="text/javascript">
//<![CDATA[
// Draw the map centered on geocode, or on a default location if none is given.
function showmap(geocode) {
  if (GBrowserIsCompatible()) {
    if ( null == geocode )
      geocode = new GLatLng(37.4419, -122.1419);
    var map = new GMap2(document.getElementById("map"));
    map.setCenter(geocode, 13);
  }
}
//]]>
</script>
</head>
<body onload="showmap()" onunload="GUnload()">
<div id="map" style="width: 500px; height: 300px"></div>
</body>
</html>

Now if you like, you can add two more JS functions to look up new addresses, get their lat & lng and show the new map. The function getLatLng will execute the callback function workongeocode with a geocode as the arg.


// Look up an address; getLatLng calls workongeocode with the result.
function lookupaddr(address) {
  var geocoder = new GClientGeocoder();
  geocoder.getLatLng(address, workongeocode);
}
// Callback: append the lat/lng to the list and re-center the map.
function workongeocode(geocode) {
  if ( null == geocode )
    return; // no geocode came back, nothing to show
  var divReplace = document.getElementById("latlng");
  var newLi = document.createElement("li");
  newLi.innerHTML = geocode.lat() + " " + geocode.lng();
  // Find the <ul> directly; childNodes indexes shift with whitespace text nodes.
  divReplace.getElementsByTagName("ul")[0].appendChild(newLi);
  showmap(geocode);
}

Now just add this html/js after the <div id=map …. >


<div id="myaddr">
<input type="text" name="thisaddr" size="50" />
</div>
<button onclick="lookupaddr(document.getElementById('myaddr').getElementsByTagName('input')[0].value)">Get Lat/Lng</button>
<div id="latlng">
<ul>
<li> first!</li>
</ul>
</div>

There you go, easy right?

Web App Architectures Part 2

Sunday, August 13th, 2006

Multi-service web platforms represent architectures which have several distinct functions. Sometimes these functions are tightly integrated together, sometimes they are loosely coupled services and sometimes they are both.

I really dislike the term N-Tier architecture. An N-Tier architecture is a tightly integrated set of distinct functions which are usually separated onto different tiers of machines. N-Tier implies more than 3 tiers. To me it hearkens back to the dark ages of build-it-all-yourself and over-engineering. I still get asked how much experience I have with N-Tier architectures, so I suppose someone cares about them. I always hope it’s a trick question, and my response is “not much.”

Diagram of N-Tier
NTier Application Stack

Service Oriented Architectures (SOA) mix it up a little bit, by taking a defined business or technical area and creating a separate application space. Many SOAs can live in the same container, or a single SOA can live on its own tier of machines.

Diagram of SOA Web Stack
SOA Web App Stack

In a tightly integrated stack you might see the following:

Data and Query Layer - go get data from multiple sources, like your stock portfolio information from the user database, current stock price from the stock database, and a real time query to the clearing-house mainframe to check the status of your trades.

Data Object Layer - The different queries are marshaled into objects; often the object is implicitly tied to its data cache.

Business Logic Layer - Special rules but distinct from the data logic in the data layer. In the worst case there may be several distinct layers of business logic.

Presentation Layer - special controls for surfacing different workflows and different look and feel.

In a loosely integrated stack, you are more likely to see the layers combined. So the Data Object Layer will not be separated from the Data Queries, and it may not even be separated from the Business Logic. With a loosely coupled system, lots of small vertical stacks are created, which isolates changes and promotes common APIs.

Without the need for tight integration, the need for a complex architecture disappears. The simplicity of a loosely coupled system is the result of pushing the dependencies onto the consumer of the service. Most consumers have simple needs, especially in the beginning. With a tightly integrated stack the dependencies are built into the system, and consumers are given a complex system in an attempt to encapsulate their needs and make the dependencies invisible.

Web App Architectures Part 1

Thursday, August 10th, 2006

I enjoy blogging, but I often talk about little tech and process tidbits that interest me. This week I’m in Seattle, and I’ve had the opportunity to talk to a lot of folks starting online businesses, building online businesses and running online businesses.

The interesting thing in Seattle is it’s a Microsoft town, yet almost all of the business managers and founders of new online companies want tech people skilled in Open Source and Linux. This desire is driven by the feeling that open source is faster, more flexible, and more on the cutting edge. I agree with that, but I also see a lot of bloat, and too many tech choices to make. Just think of how many Java XML parsers there are (Jibx, PPP, JBO).

The problem is lots of opinions about better technology are thrown out there, with no backdrop for comparison. Often technologies are discussed and evaluated for their unique or expressed purpose, but they need to be evaluated for how they fit into the overall stack of software.

At a 10,000 foot view a basic web app has three parts
Data - some structured information
Presentation - some web application to display the data
Manipulation - some apps to manipulate or personalize the data

Basic Web Architecture

This is a basic web app, which should cover more than 90% of all the sites out there. This breakdown isn’t going to describe everything, but let’s save something for parts 2 & 3.

The simplest representation of this is a flat HTML page.
Presentation - the HTML page and an Apache server
Manipulation - the tool would be a text editor like vi
Data - the file system.

A more complex system would have capabilities like user-generated content, search, administration tools, metadata, co-branded pages, and ad delivery. I’ll admit not every app falls cleanly into my three overgeneralized buckets, but this is just a basic web architecture which should describe almost all the sites out there.

Presentation
——————-
Search
Co-brands
Feeds
Google Maps
Ad Delivery
Reading Blogs

Manipulation
——————–
Editorial Tools
User Facing Tools
Writing Blogs

Data
———
Databases
File systems
In Memory Caches

Things that don’t fit in well are integrated APIs, where reading and writing are done via the same interface. The same application may handle both, and separating the two functions would be silly.

Messaging platforms (e.g. JMS) don’t really fit in either.

Like I said before, this isn’t an attempt to describe all web architectures, just the most basic variety.

The Best Open Source Searching Platform

Monday, July 10th, 2006

Erik Hatcher uses Solr. Should that be good enough? He uses Ruby on Rails and Solr in conjunction to support http://www.nines.org/.

So what is Solr?

In a nutshell, Solr is a wrapper around Lucene which provides all of Lucene’s functionality as a web service.

Here is the description from ApacheCon:
Apache Solr, a Lucene based full-text search server, with XML/HTTP interfaces, declarative specification of data types and text analysis with a schema, extensive caching, index replication, and a web admin interface. Solr is optimized for high volume low latency web traffic and has support for faceted browsing and dynamic results grouping.

What’s so great about Solr?

  • Replication: Solr can copy itself and still guarantee read access. Very nice for high availability and scalability.
  • Language Independent: Solr makes searching with Lucene a web service, so now you can access a Lucene collection from any language.
  • Support for Faceted Searching: Solr provides the infrastructure for faceted search with a plug-in module and open bit-sets.

Where does Solr shine?

Solr works well with highly structured schemas, where the stored data is well known. Solr works great as a full text search engine with an update capability. It doesn’t work well as a front end database; managing the system for rapid updates is just too complex.

Nutch is another application which uses Lucene. IMHO it looks like a great tool to support unstructured search when you need to scan through and index a lot of different documents.

Red-Piranha is another project which may interest some, but I’m not sure it’s active any more :(

Optimize Your Site

Tuesday, June 27th, 2006

I was surprised to find people actually reading my blog. Especially a poorly written post on Offermatica’s dominance.

So here is another post on some basic tenets of running tests. The goal: provide users with a consistent view of the site while exposing different variants of the site to different people.

Users Must Have A Consistent Experience

Within a session, a user must have the same experience. If you expose a user to two or more variants within a single session you will ruin the test. There will be no way of separating activity by variant. You may also create an unworkable navigation.

For example, let’s say you decide to make a brighter background on some Google AdWords ads. If the users are split into two groups, you can simply compare the conversion rates of each group. In addition, users will consistently see the same background and get the same “message”. If you randomly serve up different backgrounds, users will see the ads in two different presentations.

If a user sees both “messages” it can be very difficult to figure out which “message” should be credited with the conversion.
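One way to keep a user in the same group for the whole session is to derive the group from a stable session identifier rather than picking randomly on each request. A minimal sketch, assuming a session id is available (for example from a cookie); the function names, the hash, and the variant labels are all illustrative:

```javascript
// Deterministic, non-cryptographic string hash (djb2-style).
function hashString(str) {
  var h = 5381;
  for (var i = 0; i < str.length; i++) {
    h = ((h * 33) ^ str.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h;
}

// sessionId and the variant names are hypothetical; any stable per-session
// identifier would do. The same id always maps to the same variant.
function assignVariant(sessionId, variants) {
  return variants[hashString(sessionId) % variants.length];
}

var variants = ["plain-background", "bright-background"];
var first = assignVariant("session-1234", variants);
var again = assignVariant("session-1234", variants);
// first and again are the same variant, so the user sees one "message".
```

Because the assignment is a pure function of the session id, conversions can later be credited to a variant by recomputing it from the id in the logs.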

Good SEO Means a Single Variant

Remember all that hard work you put into optimizing your site for search engines? With some poorly planned multivariate testing it can all go down the drain.

Ya see, if search engines find more than one URL with almost the same content, those URLs are viewed as a link farm and the page rank plummets. If your testing creates different URLs for different variants, try using robots.txt to block search engines from seeing your “experimental” pages.

The other problem is rapid changes. To the best of my knowledge, if the agents from a search engine see multiple versions of the same page, they will only take one version. With each new version no one is sure what happens to the page’s rank; it may go up, it may go down. Sometimes these changes take 2-3 weeks to digest, and problems show up after testing is complete.

In this case filtering by user agent is a good idea. Send agents with “bot” in their UA string to a nice stable site. In addition, at this point search agents don’t look at JavaScript. With DOM manipulation via JavaScript your site is safe too.
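The user agent filter can be sketched in a few lines. This assumes the check is nothing more than a case-insensitive search for “bot” in the UA string, as described above; the function names and the "stable" label are made up for illustration:

```javascript
// True when the user agent string contains "bot" (case-insensitive).
// A missing user agent is treated as a regular visitor.
function isSearchBot(userAgent) {
  return /bot/i.test(userAgent || "");
}

// Hypothetical routing helper: crawlers always get the stable site,
// everyone else gets whatever test variant they were assigned.
function chooseSite(userAgent, assignedVariant) {
  return isSearchBot(userAgent) ? "stable" : assignedVariant;
}
```

A substring match is crude: it will miss crawlers that do not advertise themselves with “bot” and could catch an unrelated UA that happens to contain it, so treat it as a starting point.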

Offermatica Needs a Competitor

Wednesday, June 21st, 2006

Offermatica (http://www.offermatica.com/) is a great piece of software. Their product does multivariate testing as a web service. For example, you could pick 5 different background colors for your text ads and find out which performs the best.

Little changes like this make a huge difference. I’ve seen click-through rates go up 8 times with a different background color. I’ve seen SMALLER buttons with new spicy graphics improve click-through by 20%.

Problem number 1
Offermatica does have its problems. Let’s start with the technology. I have no problem with the overall architecture. In fact, I would pretty much build the same thing if I had the technology reins. No, the problem is scalability. Offermatica doesn’t scale for large websites. Don’t send Offermatica more than a few million hits a day. Keep it around 2-4 million. Their platform is new technology, so it shouldn’t be that hard to scale, but Offermatica is more interested in sales than performance.

Problem number 2
There are many types of conversion, but just about every website will be interested in following a multi-step, multi-page process. Offermatica doesn’t do a good job of tracking changes through multiple user interactions. Offermatica does work well with multivariate testing on a single page. So if you want to figure out if the “blue” background makes shoppers buy more widgets, you have quite a bit of work cut out for you. This too could be improved, but I’m not seeing any real changes in the technology as of late.

How Does it Work
Well, Offermatica asks you to insert a special div tag on your page, with default HTML inside of that div tag. You then insert a URL call which loads JavaScript functions from Offermatica.

At page load time, the user is issued an Offermatica cookie, and Offermatica changes the HTML inside of the div tag based on a hash of the user cookie.

Double AB-Testing
Since Offermatica can’t handle huge volumes of traffic, large volume sites end up splitting their traffic and sending Offermatica a small, yet significant fraction. This amounts to the site running its own A-B test on top of Offermatica’s own tests.
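The site-side split can be sketched as a percentage gate in front of the testing service. This is only an illustration of the idea, assuming each visitor has a stable id; the function names, the hash, and the bucket scheme are assumptions and have nothing to do with Offermatica’s actual code:

```javascript
// Map a visitor id to a stable bucket from 0 to 99 with a simple string hash.
function bucketOf(visitorId) {
  var h = 0;
  for (var i = 0; i < visitorId.length; i++) {
    h = (h * 31 + visitorId.charCodeAt(i)) >>> 0; // unsigned 32-bit
  }
  return h % 100;
}

// Hypothetical gate: only visitors whose bucket falls below `percent` are
// sent on to the external testing service; everyone else sees the default page.
function sendToTestingService(visitorId, percent) {
  return bucketOf(visitorId) < percent;
}
```

Since the bucket depends only on the visitor id, the same visitor is consistently inside or outside the sample, which keeps the outer split from muddying the inner test.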

Martin Fowler and I agree

Monday, May 22nd, 2006

In a recent post Mr Fowler and I agree.

He breaks down code ownership, one of my favorite topics, into three segments, with Strong Ownership being his least favorite.

  • Strong Ownership
  • Weak Ownership
  • Community Ownership

I agree, Strong Ownership doesn’t work. Sure, people need to be held accountable for the code they write, but having an individual responsible for a section of code makes no sense. It doesn’t even pass the hit-by-a-bus test.

Community Ownership has always worked well for the projects and teams that I’ve managed. The team really comes together to put out the right product, and they make some high quality code. There is something to be said for peer pressure; you just don’t want to let down your team.