Archive for the 'Software Development' Category

Peer code review tools for Java

Tuesday, May 8th, 2007

Everyone says code reviews are great, but I’ve always had trouble getting them off the ground. I spent some time looking for tools to assist in peer reviews. Usually I’m the first to say buy it, but with commercial tools costing up to $400 a seat, who wants to buy a tool to manage a process which may not even exist?

Trac has a plug-in to do peer code reviews. That’s pretty cool, as Trac is widely used.

I would love to get my hands on Mondrian, but that’s not going to happen with all that proprietary Google code.

Codestriker looks like a decent peer review tool. It’s very active, with the last release in March 2007.

Finally, in a different category is Quilt, a code coverage tool. It says it is optimized to work with Ant, Maven, and JUnit, so I’ll have to check it out.

Update: I found one more code coverage tool, EMMA. It is supposed to be very easy to use and set up.

Scaling with YouTube’s oracle algorithm for speeding up writes

Friday, May 4th, 2007

Paul Tuckfield from YouTube presented at the MySQL conference. He devised the oracle algorithm to speed up MySQL replication.

The short of it is pretty simple. Scan the replication log, then create and execute SQL queries that touch the to-be-altered data, so it is already in memory on the slave when the write is applied.
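The idea can be sketched in a few lines of Java. This is only an illustration, not Paul Tuckfield's actual implementation: the class name `ReplicationPrefetch`, the method `toPrefetchSelect`, and the regular expressions are all hypothetical, and they only handle simple single-table UPDATE and DELETE statements. A real tool would read upcoming statements from the slave's relay log and run the generated SELECTs against the slave before the replication thread reaches those writes.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: turn a pending write into a read that warms the cache.
final class ReplicationPrefetch {
    // Simple single-table forms only: UPDATE t SET ... WHERE cond
    private static final Pattern UPDATE = Pattern.compile(
        "^\\s*UPDATE\\s+(\\S+)\\s+SET\\s+.+?\\s+WHERE\\s+(.+?);?\\s*$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // DELETE FROM t WHERE cond
    private static final Pattern DELETE = Pattern.compile(
        "^\\s*DELETE\\s+FROM\\s+(\\S+)\\s+WHERE\\s+(.+?);?\\s*$",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns a SELECT with the same table and WHERE clause, or empty if unsupported. */
    static Optional<String> toPrefetchSelect(String statement) {
        for (Pattern p : new Pattern[] {UPDATE, DELETE}) {
            Matcher m = p.matcher(statement);
            if (m.matches()) {
                return Optional.of("SELECT * FROM " + m.group(1) + " WHERE " + m.group(2));
            }
        }
        return Optional.empty(); // e.g. INSERT: nothing existing to pull into memory
    }

    public static void main(String[] args) {
        String pending = "UPDATE users SET views = views + 1 WHERE id = 42;";
        // prints: SELECT * FROM users WHERE id = 42
        System.out.println(toPrefetchSelect(pending).orElse("(skipped)"));
    }
}
```

The point of the rewrite is that the SELECT is cheap to issue ahead of time, while the write that follows no longer has to wait on disk reads.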

MySQL One Large Table is better than many small tables

Friday, May 4th, 2007

With MySQL, one large table is almost always faster than many small tables. This is attributed to MySQL’s exclusive use of a nested-loop join algorithm. Check out Peter Zaitsev’s comment at the end of the page.

The nested-loop join algorithm loops through the rows of the outer table and, for each one, loops through the rows of the inner table to find matches. The more joins, the more nested loops. So one large table is almost always better.
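To make the cost concrete, here is a minimal in-memory sketch of a nested-loop join in Java. It is a naive, index-free version written for illustration, not MySQL's code; the class name `NestedLoopJoin` and the row layout (a `String[]` whose first element is the join key) are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a naive nested-loop join over two in-memory "tables".
final class NestedLoopJoin {
    /** Rows are {key, value}; the result rows are {key, outerValue, innerValue}. */
    static List<String[]> join(List<String[]> outer, List<String[]> inner) {
        List<String[]> result = new ArrayList<>();
        for (String[] o : outer) {        // one pass over the outer table
            for (String[] i : inner) {    // a full pass over the inner table per outer row
                if (o[0].equals(i[0])) {
                    result.add(new String[] {o[0], o[1], i[1]});
                }
            }
        }
        return result;
    }
}
```

Joining a third table means wrapping another loop around or inside these two, which is why every additional join adds a level of nesting. An index on the inner table replaces the inner scan with a lookup, but the nesting itself remains.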

Of course, a huge de-normalized table has some problems, namely data duplication. If you have a list of people with addresses, changing something simple like a zip code or a city name is a pain. You have to scan the whole table changing multiple rows. A db in 3rd normal form would enable a zip code or city update of just one row.

Top 5 Skills for Online Media Engineers

Monday, April 30th, 2007

I’ve been reviewing a lot of resumes for contract positions at several of the top ten online media companies. It’s interesting to see a common set of Java skills develop. So if you want to be a Java developer, here are the skills the market is presenting.

  • Struts MVC - the MVC everyone uses; in-house MVCs are dying out, replaced by open source versions
  • AJAX - everyone has this on their resume, but everyone means form validation without reloads (see Struts above)
  • XML - it’s the new way to write configuration files
  • JSP - this means understanding the servlet lifecycle and the ability to do grunt work by adapting the web pages for constant updates
  • CSS - everyone knows the basics of CSS, which is good; a big change from 2 years ago
  • JSTL & Tags - shows up on almost all resumes, and everyone has a different level of experience
  • ORM & DAO - anything goes; lots of in-house code which people are afraid to change
  • Spring - but only as an IoC container for Hibernate and the occasional bean configuration management

To me the interesting thing is the lack of variety. My favorite question so far has been:

Q: “Which XML parser did you use?”
A: “DOM & SAX”
Q: “Oh sorry let me rephrase, which implementation of an XML parser did you use?”
A: “DOM & SAX”

It’s always the same! I’m not sure exactly how people are doing XML parsing without knowing the implementation. I must be missing some super easy parsenow function in Java. A quick Google search shows that even the Sun tutorial lists the XML parsers.
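For anyone who wants a real answer to the interview question, the standard JAXP factories will tell you which implementation sits behind the DOM and SAX interfaces. The sketch below uses only `javax.xml.parsers` from the JDK; the class name `WhichXmlParser` is made up for this example, and the class names it prints depend on the JDK and on whatever parser jars are on the classpath.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper: report which parser implementation JAXP actually picked.
final class WhichXmlParser {
    /** Class name of the concrete DOM builder behind the JAXP factory. */
    static String domImplementation() {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("No DOM parser available", e);
        }
    }

    /** Class name of the concrete SAX parser behind the JAXP factory. */
    static String saxImplementation() {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            return parser.getClass().getName();
        } catch (Exception e) {
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("DOM: " + domImplementation());
        System.out.println("SAX: " + saxImplementation());
    }
}
```

DOM and SAX are APIs; whatever this prints is the implementation, and that is the name I was hoping to hear.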

What interests me the most is how shallow the tool box is. Take the MVC layer. I expected more variety like Wicket or Struts2. Spring is a good solution, but it doesn’t go far enough (especially in the view and model departments). Additional packages like Tiles and Sitemesh make a big difference.

Spring is nice, and people should use the configuration management and aspect-oriented features more.

No one seems to have any knowledge of JavaScript libraries, which is really a shame. Pick a simple one and learn it. It’s easy and lots of fun. I recommend script.aculo.us or jQuery. Exhibit is a standout faceted browser written in JavaScript.

Lastly would be search. Search is a key skill for an online media company. Pick a searching technology and learn it. I suggest Solr, but MySQL full text search is a simple alternative. Egothor is all Java and interesting, but I haven’t used it. A nice little project would be to create a simple collection and use script.aculo.us to make suggestions as you type.

With all the good stuff out there why doesn’t more of it show up on resumes?

Grails, Trails, Sails, Slingshot

Sunday, April 22nd, 2007

There is a sponsored listing for a Java BOF session comparing Grails, Rails, Sails, and Trails.

It’s all a bit much for me, but there is a real need for these frameworks. Lots of companies are using Java (although Charles Schwab is moving to Microsoft .NET). Lots of online media companies are using Java. If you are an online media company with a Java Architect, you might be in for a rough ride. Java Architects create a build-it-in-house meritocracy of Java code. That in turn creates a confusing jumble of choices. Much better to have a small, well-worn tool kit, with an emphasis on picking up new tools at the expense of older ones. Listed below are 4 Java frameworks; if your Java stack is in need of repair, take a look at each of the following.

Kettle, the original open source ETL

Thursday, April 19th, 2007

Pentaho acquired the Kettle ETL tool. I never used Kettle myself. I only saw it in action. It was pretty nice. Much better than something I would have developed myself. Kettle is all Java, which makes it easy to extend.

There are other open source ETL tools out there. Maybe there is a diamond in the rough among them, but there has not been much activity in the last year or two on any of the ones I checked out.

In addition to batch ETL activity, I would love to see an ETL tool that pumps data in small batches or as a constant stream of messages. That would be a useful feature for populating search collections, populating aggregates grouped on slowly changing dimensions, or keeping legacy and new systems in sync before a cutover.

10 reasons not to use Rails

Wednesday, April 18th, 2007

Well, I posted this before, but it disappeared. Here are 10 reasons you may not want to use Rails.

1) The learning curve is huge. Just the other day I wrote my own version of cycle. I also learned how to inject JavaScript from a controller.
2) SDLC is broken for teams. Yes, Rails is great for a team of one, but version control and deployment are monolithic. For example, if two developers independently work on two different controllers with two different views, there is no way to version and deploy these components separately.
3) Yet another deployment system. Rails has gems and plugins. The rest of the world uses rpms, debs, wars, and whatever Windows uses.
4) Ruby is green threaded. One server is slow, so you’ll need many, many servers to handle the load. Good thing virtualization is taking off.
5) ActiveRecord breaks down when you want to pivot rows into columns. This use case comes up with catalogs with many topics, each topic containing its own fields. Luckily there is RBatis.
6) The code base is unstable. The main code line and the gems are constantly evolving with little consideration for backwards compatibility. Combine this with poor deployment management
7) Multi-lingual support is broken. Multi-byte characters aren’t supported, so stick to one language!
8) Rails web containers are unstable. There are many choices for a web container: WEBrick, Mongrel, and lighttpd, to name a few. None of them work as well as Apache. The servers get wedged. Apache with FastCGI doesn’t seem to work, and those who start using it move on to something else.
9) The code is obtuse and you’ll need to read it. If you want to understand how to do things, you’ll need to read the code.
10) The conceptual documentation is missing. It’s a framework, yet the concepts go unanswered. For example, what exactly are RESTful resources for, and why isn’t there a client library?

Web App Architectures Part 2

Sunday, August 13th, 2006

Multi-service web platforms represent architectures which have several distinct functions. Sometimes these functions are tightly integrated together, sometimes they are loosely coupled services and sometimes they are both.

I really dislike the term N-Tier architecture. An N-Tier architecture is a tightly integrated set of distinct functions which are usually separated onto different tiers of machines. N-Tier implies more than 3 tiers. To me it hearkens back to the dark ages of build-it-all-yourself and over-engineering. I still get asked how much experience I have with N-Tier architectures, so I suppose someone cares about them. I always hope it’s a trick question, and my response is: not much.

Diagram of N-Tier
NTier Application Stack

Service Oriented Architectures (SOA) mix it up a little bit, by taking a defined business or technical area and creating a separate application space. Many SOAs can live in the same container, or a single SOA can live on its own tier of machines.

Diagram of SOA Web Stack
SOA Web App Stack

In a tightly integrated stack you might see the following

Data and Query Layer - go get data from multiple sources, like your stock portfolio information from the user database, current stock price from the stock database, and a real time query to the clearing-house mainframe to check the status of your trades.

Data Object Layer - The different queries are marshaled into objects; often the object is implicitly tied to its data cache.

Business Logic Layer - Special rules but distinct from the data logic in the data layer. In the worst case there may be several distinct layers of business logic.

Presentation Layer - special controls for surfacing different workflows and different look and feel.

In a loosely integrated stack, you are more likely to see the layers combined. So the Data Object Layer will not be separated from the Data Queries, and it may not even be separated from the Business Logic. With a loosely coupled system, lots of small vertical stacks are created, which isolates changes and promotes common APIs.

Without the need for tight integration, the need for a complex architecture disappears. The simplicity of a loosely coupled system is the result of pushing the dependencies onto the consumer of the service. Most consumers have simple needs, especially in the beginning. With a tightly integrated stack the dependencies are built into the system, and consumers are given a complex system in an attempt to encapsulate their needs and make the dependencies invisible.

Web App Architectures Part 1

Thursday, August 10th, 2006

I enjoy blogging, but I often talk about little tech and process tidbits that interest me. This week I’m in Seattle, and I’ve had the opportunity to talk to a lot of folks starting online businesses, building online businesses and running online businesses.

The interesting thing in Seattle is that it’s a Microsoft town, yet almost all of the business managers and founders of new online companies want tech people skilled in Open Source and Linux. This desire is driven by the feeling that open source is faster, more flexible, and more on the cutting edge. I agree with that, but I also see a lot of bloat, and too many tech choices to make. Just think of how many Java XML parsers there are (JiBX, PPP, JBO).

The problem is that lots of opinions about better technology are thrown out there, with no backdrop for comparison. Often technologies are discussed and evaluated for their unique or expressed purpose, but they need to be evaluated for how they fit into the overall stack of software.

At a 10,000 foot view, a basic web app has three parts:
Data - some structured information
Presentation - some web application to display the data
Manipulation - some apps to manipulate or personalize the data

Basic Web Architecture

This is a basic web app, which should cover more than 90% of all the sites out there. This breakdown isn’t going to describe everything, but let’s save something for parts 2 & 3.

The simplest representation of this is a flat HTML page.
Presentation - the HTML page and an Apache server
Manipulation - the tool would be a text editor like vi
Data - the file system.

A more complex system would have capabilities like user generated content, search, administration tools, metadata, co-branded pages, and ad delivery. I’ll admit not every app falls cleanly into my three overgeneralized buckets, but this is just a basic web architecture which should describe almost all the sites out there.

Presentation
——————-
Search
Co-brands
Feeds
Google Maps
Ad Delivery
Reading Blogs

Manipulation
——————–
Editorial Tools
User Facing Tools
Writing Blogs

Data
———
Databases
File systems
In Memory Caches

Things that don’t fit in well are integrated APIs, where reading and writing are done via the same interface. The same application may handle both, and separating the two functions would be silly.

Messaging platforms (e.g. JMS) don’t really fit in either.

Like I said before, this isn’t an attempt to describe all web architectures, just the most basic variety.

Money Saving Tips on Hardware

Wednesday, July 19th, 2006

Luckily most of you aren’t buying servers or trying to stretch your hardware budget. For those of you who are, here are some tips on saving money.

Actually there is only one tip: buy as little as possible from the original manufacturer. You see, an original manufacturer like IBM or HP has pricing power, and they will charge you an arm and a leg for their brand name products.

Here is how you apply this rule:

  • Buy the most minimal configuration from the manufacturer
  • Use a VAR for the purchase and buy additional items like memory and disk drives from the VAR
  • Have the VAR do the installation of the memory and disk drives

So for example, if you wanted to buy an HP-DL385, dual Opteron with 12GB of memory and three 72GB drives, don’t buy it all from HP! Get an HP-DL385 with 2GB of memory and one tiny drive. Then have the VAR purchase the additional memory and drives, install them, and ship the server to you.

Shop around and find a good VAR. They are competitive, and unlike the manufacturer they don’t have pricing power.

This won’t work for all your hardware needs. Take an HP-DL585 for example. It can take up to 64GB of memory, but only specially certified memory from HP may be used.

This is why you can get a 2-way HP-DL385 with 16GB for $7,000, but a 2-way HP-DL585 with 64GB will cost at least $68,000. Four times the memory, more than nine times the cost.
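The arithmetic is easy to check. The short Java sketch below uses only the two prices and memory sizes quoted above; the class and method names are made up for the example, and the figures treat the whole server price as if it were memory cost, which is a simplification.

```java
// Hypothetical helper for comparing the two quoted server configurations.
final class ServerPricing {
    /** Whole-server price divided by installed memory, in dollars per GB. */
    static double costPerGb(double priceDollars, double memoryGb) {
        return priceDollars / memoryGb;
    }

    /** How many times larger the first value is than the second. */
    static double ratio(double larger, double smaller) {
        return larger / smaller;
    }

    public static void main(String[] args) {
        double dl385Price = 7_000, dl385Gb = 16;   // figures quoted in the post
        double dl585Price = 68_000, dl585Gb = 64;  // figures quoted in the post

        System.out.printf("Memory ratio: %.1fx%n", ratio(dl585Gb, dl385Gb));
        System.out.printf("Price ratio:  %.1fx%n", ratio(dl585Price, dl385Price));
        System.out.printf("DL385: $%.2f per GB%n", costPerGb(dl385Price, dl385Gb));
        System.out.printf("DL585: $%.2f per GB%n", costPerGb(dl585Price, dl585Gb));
    }
}
```

That works out to $437.50 per GB on the DL385 against $1,062.50 per GB on the DL585, so each gigabyte costs roughly 2.4 times as much once you are locked into the manufacturer’s certified memory.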