Agility Gigs Podcast – Passion & Purpose

I was recently invited to take part in a podcast where we discuss passion and purpose at the organisational, team and individual level. It was great fun and I invite you to listen and share your thoughts in the comments.

Taking part were:


The Quality of the User Experience

In the Agile world we have all heard about how we build in quality up front with each feature developed, how we prioritise fewer features built to a higher standard of quality rather than building on shaky foundations, the Definition of Done and a whole host of other mantras by which we ensure we build the right software.

What I want to talk about here is what defines a good user experience and how the concept of quality feeds right into this.  By user, I am referring to any human interaction with a software product, although a lot of the principles apply to clients of APIs (see Developer Experience (DX, link to Ismail’s blog)).

First, let’s define what a good user experience is.  It is divided into two categories:

  • How easy it is for the user to accomplish their desired outcome
  • How well the application responds and informs the user of the result of their actions

The example we will use comes from stock and options trading, where a Market Maker builds an application that allows a Trader to trade futures, stocks and derivatives such as options.  This is a good example because not only do Traders need to be able to navigate the application easily to quickly enter orders, but they must also have up-to-the-second prices so they can make decisions based on fresh data regardless of the level of activity in the markets.  Indeed, delayed prices make the application unusable and the Trader will probably leave that Market Maker and go to another one.

The user and their desired outcome

I won’t spend much time on this aspect as it has been covered in the literature many times before.  Suffice it to say that it is primarily driven by good Human Computer Interaction principles applied to the UI, the way the information is laid out specific to the user’s locale and way of working, clarity of the information displayed and ease of knowing what to do next, to name but a few.  In our case study, the Market Maker builds and makes available an application that needs to make it easy for a Trader to model their trade, preview their order, submit the order, receive confirmation and monitor their positions as prices change.

How well the application responds

This aspect is at least as important to a user’s experience, if not more so.  What we are talking about here is how well the user can achieve their desired outcome while the system experiences a high level of stress, unexpected failures in sub-systems and so on.  As a fantastic example, think about the classic market open problem faced by all stock exchanges around the world.  For nearly all types of trades, particularly intraday trading and scalping, traders expect a sub-second response to their actions regardless of how many orders are going through their application and the exchanges at any one time.  It could be a relatively quiet period like a bank holiday when a trader is engaged in out-of-hours trading with very low volume at one extreme, all the way up to a catastrophic event that results in needing to liquidate positions immediately (think the 2008-2009 market crash).  Traders pay the Market Maker a commission on each trade, so the Market Maker makes money no matter what the outcome of the trade.  The more trades the Market Maker can get through, the more money they make.

Putting it all together

What we have discussed so far is nothing new.  The key aspect when developing software is having the entire development team and management understand what it takes to develop a feature.  Just because a feature is demonstrable via a UI does not mean it is done.  Far from it, this is just the tip of the iceberg.  The majority of the work remains in making sure that the feature is as usable with a million users (or whatever the SLA is) as it is with one, and that it functions as expected in the wake of unexpected failures – i.e. it is responsive and resilient – as well as covering other aspects such as security, availability etc.  To achieve this there are many software engineering principles and architectural techniques (a great one is the Reactive Manifesto), and it is the responsibility of every Product Development team to ensure that when Feature X is Done, it:

  • Allows the user to achieve a desired outcome in an intuitive manner
  • Remains responsive to the user even in the event of a failure in one or more subsystems
  • Clearly communicates to the user when an action cannot result in the desired outcome (a small sketch of this and the previous point follows the list)
  • Has had other recommended practices applied, e.g. implementing good security procedures to ensure that sensitive data is not compromised
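To make the second and third points concrete, here is a minimal, hypothetical sketch (class and method names are illustrative, and it assumes Java 9+ for CompletableFuture.orTimeout): a price lookup that answers within a deadline, falling back to a cached value and flagging it as stale so the UI can tell the Trader exactly what happened.

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.TimeUnit;

    public class QuoteService {

        static class PriceResult {
            final double price;
            final boolean stale;
            PriceResult(double price, boolean stale) { this.price = price; this.stale = stale; }
        }

        // Hypothetical downstream call that may hang or fail under load.
        private CompletableFuture<Double> fetchLivePrice(String symbol) {
            return CompletableFuture.supplyAsync(() -> 101.25); // placeholder value
        }

        // Last known good price, e.g. from a local cache; placeholder value here.
        private double lastKnownPrice(String symbol) {
            return 100.00;
        }

        // Remain responsive: answer within 200ms, falling back to the cached price
        // and marking it stale so the UI can clearly inform the Trader.
        public PriceResult priceFor(String symbol) {
            try {
                double live = fetchLivePrice(symbol)
                        .orTimeout(200, TimeUnit.MILLISECONDS)
                        .join();
                return new PriceResult(live, false);
            } catch (Exception timeoutOrFailure) {
                return new PriceResult(lastKnownPrice(symbol), true);
            }
        }
    }

The point is not the particular API, but that the failure path is a designed, tested part of the feature rather than an afterthought.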

Most of the work for a Product Development team is in the second category of how an application responds under different scenarios. Get this right and additional features will be able to leverage that investment, create a much better product for users and bring in more revenue even when there is a huge surge in activity.  When an application remains responsive and informative, users achieve what they want resulting in higher customer satisfaction, and the company makes more money.  A win-win!


In Search of a Reactive Framework (or: How we select new technologies)

Note: In this blog, the term “technology” is interchangeable with “framework” as the same principles apply.

I am currently working at Ticketmaster, helping with a large-scale Agile transformation and Platform migration.  About seven or eight months ago we started looking at new and improved ways of creating services instead of the tried and tested Spring framework based approaches common in the organisation. The Spring framework certainly has its merits and we have used it with much success; however, the service we were about to write, called Gateway, was to route millions of requests to other services further down in our architecture layers:

[Diagram: the Gateway service routing requests to downstream services]

We knew that the service had to run efficiently (thereby saving on hardware) and scale effectively. The event driven, reactive approach was one that we were looking to embrace. After much research we had some concerns. Not only was that space full of different frameworks at different levels of maturity, but we also had to consider our current skill set, which is predominantly Java with a small amount of Scala, Ruby, Clojure (which our data science guys use) and a handful of other languages we’ve picked up through company acquisitions. How could we adopt this new paradigm in the easiest possible way?

What this blog post will detail is the approach used to select the chosen framework. It’s a tried and tested approach I’ve used before and will continue to use and improve upon in the future.

How we did it

Before we describe the stages of what we did, suffice it to say that there is no point in doing a technology selection without a business context and an idea of the service(s) the technology will be used to build.

The technology selection was broken up into the following steps:

  • Identify principles – these are the rules that the technology must adhere to. They are properties that a technology can meet in different ways and contain a degree of flexibility.
  • Identify constraints – these are rules that cannot be broken or deviated from in any way. If any are broken, the technology is no longer a candidate.
  • Create a short list of 5 – 10 candidate technologies.
  • Determine high-level requirements and rank using MoSCoW prioritisation
    • Read the documentation, Google groups and other relevant articles and trend data to determine conformance to the requirements, including all options and workarounds if applicable.
    • If any of the Musts are broken by a candidate then it is dropped.
    • Create a short list of, ideally, at least three candidates (I’ve found three is generally a good balance between variety and the time available to go into depth, though you can adjust this depending on how critical the technology being chosen is to your stack).
  • Create a second set of more detailed requirements, rank using MoSCoW and weight each in terms of importance (see the scoring sketch after this list)
  • Determine the architecturally significant business stories and error scenarios that the service to be built from the technology needs to implement:
    • Write the end-to-end acceptance tests for these stories. The primary scenario with one or two error scenarios is sufficient.
    • Implement these end-to-end acceptance tests in all three technologies – this gives an idea of how well the technology fits the service’s paradigm(s) and how easy it is to work with in the develop/test cycle. Also post on the message boards or mailing lists to see how quickly a response arrives from the maintainers.
    • Update the second set of more detailed requirements with the results of this experience
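To make the weighted ranking concrete, here is a minimal, hypothetical sketch of the kind of scoring applied to that second, more detailed set of requirements (the requirement names, weights and scores below are illustrative only; any candidate that breaks a Must has already been dropped before this step):

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class CandidateScorer {

        // Requirement -> weight (importance). Illustrative values only.
        private static final Map<String, Integer> WEIGHTS = new LinkedHashMap<>();
        static {
            WEIGHTS.put("Non-blocking IO", 5);
            WEIGHTS.put("Works from plain Java", 4);
            WEIGHTS.put("Maturity of docs and community", 3);
            WEIGHTS.put("Ease of acceptance testing", 3);
        }

        // scores: requirement -> 0..5, taken from reading the docs and running the spike.
        static int totalScore(Map<String, Integer> scores) {
            return WEIGHTS.entrySet().stream()
                    .mapToInt(e -> e.getValue() * scores.getOrDefault(e.getKey(), 0))
                    .sum();
        }

        public static void main(String[] args) {
            Map<String, Integer> candidateA = new LinkedHashMap<>();
            candidateA.put("Non-blocking IO", 5);
            candidateA.put("Works from plain Java", 5);
            candidateA.put("Maturity of docs and community", 4);
            candidateA.put("Ease of acceptance testing", 3);
            System.out.println("Candidate A: " + totalScore(candidateA)); // 66
        }
    }

The numbers only support the conversation; the acceptance-test spike in the next step is what really exposes how a framework feels to work with day to day.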

Our results are pretty detailed so have been added into a separate PDF that you can download here. We have left them in raw format and hope that they will be a good reference for others.

Outcome and experiences to date

As you can see from the PDF of results, we chose Vertx. It won out not only because of its raw power, but because of its fantastic architecture, implementation, ease of use, Google Groups support and the fact that Red Hat employs a small team to develop and maintain it. Indeed, a few weeks after we selected it, it was announced that Red Hat had hired two more engineers to work on Vertx.

So overall we have been very happy with our selection of Vertx. We had version 2.1.5 running in production for several months and recently upgraded to Vertx 3. The maintainers’ swift responses on the Vertx Google Group definitely helped during our initial development phase and during the upgrade to version 3. Performance-wise, the framework is extremely fast and we know that any slowdown is most likely due to what we have implemented. Adoption has been a success. From a team of two developers, we scaled to four and now eight.  It is also starting to be used for other services.  Choosing a Java based framework has been a boon as the only additional complexity that needed to be learned by the developers joining the team was the event driven nature of Vertx (i.e. the framework itself). Had we chosen Scala/Play it would have been much harder. Indeed, with the success of Vertx, our decision to standardise on the JVM as a platform and our embracing of the reactive approach, we have a couple of services being built using Scala and at least one using Scala/Play. It would be great to hear of your experiences using reactive frameworks. Which ones did you choose? How easy were they to adopt? Please leave a comment, below.
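For readers who have not seen the event driven model in action, a minimal Vertx 3 verticle looks roughly like this (a toy HTTP endpoint for illustration, not our Gateway code):

    import io.vertx.core.AbstractVerticle;
    import io.vertx.core.Vertx;

    public class HelloVerticle extends AbstractVerticle {

        @Override
        public void start() {
            // Handlers run on an event loop; blocking work must be avoided
            // or handed off (e.g. via executeBlocking).
            vertx.createHttpServer()
                    .requestHandler(req -> req.response()
                            .putHeader("content-type", "text/plain")
                            .end("hello from the event loop"))
                    .listen(8080);
        }

        public static void main(String[] args) {
            Vertx.vertx().deployVerticle(new HelloVerticle());
        }
    }

Everything else (dependency management, build, debugging) is plain Java, which is exactly why the ramp-up for new team members was limited to the event driven style itself.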

Note: This is a slightly modified cross post of a blog that originally appeared here.


Leadership and inspiration momentum: The gap in Agile transformations

Over the last 16 years in the software industry, I have been involved in a number of Agile cultural transformations.  It was at lunch the other day during a conversation that I articulated an aspect of those transformations that is as true today as it was at the first transformation I was involved in.

During any cultural change, and here I am specifically talking about Agile cultural transformation although this may apply to other types of organisational change, there must be Leadership and Inspiration.  These go hand in hand and must come down from senior executives.  This is nothing new.  However, once the presentation has been given, once the direction has been set, how do we maintain momentum among the many teams that are involved in making the transformation a success?  The importance of this is directly proportional to the size of the organisation.  The more teams, the more people involved, the harder it is to maintain momentum.

So how exactly do we maintain momentum?  There are many options out there so let me present you with the two that I believe are most important.

Senior executive leadership, vision and drive

I want to mention this here for completeness.  There are many external sources that address this, so I will not do so here.  Let’s just note that this is the starting point for any successful transformation.  It is not a set-and-forget exercise, but rather a continued re-visit of the original strategy laid out, noting any modifications along the way, the impact on the business and, most of all, the impact on the people involved.

On the ground daily leadership

Here is where I think there is a gap in the industry’s thinking.  Twelve years ago, when I took my first technical lead position during an Agile transformation, it was quite clear that there was a gap in the market: the person who could take the Agile transformation directive and translate it into on-the-ground, practical application.  This is still true today. Organisational transformation literature calls these people Champions of Change.  In an Agile transformation, these Champions of Change need both the soft people skills and the hard technical skills required to translate the transformation vision from strategy to practice and to earn the respect of their peers.  These key people are Technical Agile Mentors (TAMs).  Here are some examples of where I see these TAMs having great leverage during the day to day execution of the transformation:

  • Help the teams and their team members understand the vision at a practical level.  Continually assist in clarifying for those that need it and demonstrate by example through execution style or technical implementation. Deliver the features as a member of the team.
  • Take the opportunity wherever possible to show how a team can be better and align with the business goals:
    • During planning meetings: Explain how that team’s strategic objectives fit into the overall picture.
    • During daily execution: Take the opportunity to push the teams further using, for example, more proficient tools, techniques and, most importantly, execution behaviours.
    • During retrospectives: An ideal time to review the team’s current ways of working, suggesting improvements that fit into the overall strategy.  Take time to explain how the modification of the style of execution, introduction of a framework/tool or technique can bring benefits to the team and into the overall business goals.  Take the opportunity here to continually adapt an introduced change to fit the team’s dynamic.
  • Actively take part in training and changing the behaviours of team members, both senior and junior.  Their success is the TAM’s success.  As the software industry progresses, taking an active interest in the capabilities and development of each software engineer benefits both the organisation (through better quality software, more rapid feature delivery etc.) and the engineer (execution capability, technical skills etc.).

Filling the gap

So it is clear that for an Agile transformation to have the best possible chance of success, we need strong leadership not only at the executive level but also at the team level.  This is the gap that is filled by the TAM.  The team level leadership that a TAM brings in the context of an Agile transformation requires a unique blend of people and technical skills.  Further, they must be current in both of those skillsets as well as being visionaries for their team and across teams.  Given the nature of engineers and the rapid advancements in technologies, these people are difficult to find.  But take the time to choose them wisely because their impact can be profound.


Unnecessary complexity


“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius — and a lot of courage — to move in the opposite direction.” – Einstein

Software development is hard.  That’s a fact.  Not only do we have a product that we must enhance, develop from scratch, or even maintain, but we have people’s interactions, opinions, emotions, the list goes on.  And then there is design, technology and execution.  All these factors, all these variables make software development the fantastic challenge that it is.  And I love it.  At the same time, I am on a mission to remove complexity introduced into a system when it need not be there.  Actually doing this, making the decisions about what to include and what to leave out, and doing so consistently on a day to day basis, is difficult.

As Product Owners and Engineers, we are faced with these decisions all the time as we work on a system.  Each decision, from more-complex-than-it-needs-to-be features to irreversible design decisions to that single line of code, has an impact on our productivity and agility.  Everybody talks about keeping it simple and taking a lean approach – this is nothing new.  How we can achieve this when making those decisions on the ground is the focus of this blog.

There are at least five major areas where we love to introduce complexity: feature development, execution, design, technology selection and code.

Here I want to go through a number of approaches that I have used that help prevent complexity being incurred because, let’s face it, the solution should only be as complex as the problem at hand.

Unnecessary feature complexity

The basic tenet of avoiding unnecessary feature complexity, especially when dealing with new product development, is to answer the question “What is the minimum we have to do to figure out if this is going to work?” Again, nothing new.  This is part of the Lean movement.  I’m sure you have your own, but allow me to offer the following approach to exploring a new product or service:

  • What is the minimum amount of work we need to do in order to have the initial conversation with the first client?
    • Rough idea on features available
    • Rough idea on tech, format, deployment and performance.  Discussion based, no documentation required.
  • What is the minimum amount of work we need to do to sell to our first client?
    • Minimum Marketable Product (MMP) determined along with feasibility
    • More information about data required to be supplied and what would be returned.
    • NOT the API contract
    • (Internal only) Rough idea of implementation time to version 0.1 for first client.
  • Once we have sold with delivery estimate to client:
    • Lock down API contract or feature set including success/failure scenarios – deliver early so client can start mocking out from their end and write the integration code
    • Integration approach detailed
    • Deployment architecture detailed
    • Performance characteristics and design to support detailed
    • Implement and deliver MMP
    • Roadmap for any further features created based on initial usage (iterative and incremental)

Pretty straightforward, huh?  Yet we all know when we have done too much design after a product or idea has been put on the backburner.  Ever thought “We really didn’t need to do that much work to make the decision to defer the product”?  Make the hard decisions now to save time later.

Of course, there are times when the above steps do not make sense, or they need to be modified.  For example, when doing a legacy migration. There are likely many clients that rely on features already present in the legacy system.  In these cases, we can take the opportunity to evolve the clients during the migration and do some spring cleaning of the legacy system, i.e. answer the questions “What features do we really need?” and “What features have proven to not be of value given what we know now about our business?”  Unfortunately, as is most often the case, there are “hidden” organic features that have no documentation that can trip us up.

Unnecessary execution complexity

There are many in this category, so let’s just focus on a few:

  • Break big problems down into smaller pieces:  Be exact with the acceptance criteria.  This one is self-explanatory.
  • Business Accepting a story only when it is deployed and verified in production: This is a technique that I’ve spoken about before. It encourages quality and promotes continuous deployment, a part of the Continuous Delivery strategy.
  • Keep to 1 week iterations if suitable: As long as there is a long term strategy, 1 week iterations give more opportunities to retrospect and force breaking down of stories into smaller component parts of business value while keeping the overall mission squarely in the centre of the picture.
  • Automate delivery:  Once we have small stories and iterative, incremental development, deploying manually, especially if we are practicing BA’ing a story once it is in production, takes a lot of effort. Make the investment in Continuous Delivery supported by automation, taking into account the aspects I have blogged about before.  Micro Services are also a viable option (note: Micro Services is a bad name, IMO, a service should just be small enough to fit in our headspace…but that is a topic for another blog).

Unnecessary design complexity

Keep it simple. We all know the acronym. Here are some ideas to help focus development efforts:

  • Limit the number of business strategies/initiatives an organisation takes on:  This could actually be a section all on its own.  I have frequently been in organisations where multiple business strategies are pursued concurrently.  There is nothing wrong with pursuing multiple business strategies, but when it exhausts the capacity of the organisation, we end up with a lot of half-done ideas that never gain the momentum they’re supposed to because not enough attention is paid to them (market forces notwithstanding).  This then trickles down into design.  Design is hard.  We must make sure we design within our capacity so we build something that will work, and work well enough to fulfil the most valuable strategic objectives that we carefully chose.
  • Keep designs incredibly straightforward wherever possible: Again, not a new concept, yet when on the ground how many times have you looked at a design and thought “that is waaaay too complex”?  More than once, maybe?  Not every edge case needs to be covered.  Work with the business to understand the acceptable levels of failure a system can have.  During design, when an edge case comes up, if it’s technical, weigh up the likelihood of occurrence vs the effort to fix.  Don’t just dive in and fix it because it’s there.  If it’s a business edge case, work with the business to see if it needs to be resolved or whether a failure is acceptable.  Again, make the hard decisions here.  Overall, taking a little longer to decide that a feature will not be implemented will save effort compared with a quick decision to do it and then having the entire team take on that complexity.

Unnecessary technology complexity

Keep the learning curve as low as possible so that engineers have less to learn technologically and can focus on the solutions implemented to create the product or service.  Whatever we can do to have them focus more on the problems being solved rather than the different libraries we used to, say, create and access files can only be a good thing:

  • Organisationally it is important to limit the technologies being used so we can build upon the knowledge acquired by our fellow engineers.  If we have one NoSQL database, do we really need another?  For a technology selection where NoSQL is deemed to be a good fit, it is important to come up with a reason why we cannot use a NoSQL technology that is already in production within our organisation, along with all the libraries we have built around it.  Of course, if that technology is not a fit, we then must be able to articulate the distinct advantages of the technology we want to introduce.  Polyglot solutions are a good thing, applied judiciously.  Again, effort vs value.  In this case, effort to de-risk, implement and build libraries/tools around the new technology vs value to the business.
  • Within a service it is important to limit the libraries used.  The number of libraries a single service can end up using can be huge.  That huge effective pom.xml if you use Maven, that long list of gems.  Limit the libraries used for a service so that engineers will not just reach for the library they know when facing a problem, adding to the burgeoning list of libraries required for the project.

Unnecessary code/implementation complexity

Have you ever looked at a codebase and seen multiple ways of achieving the same thing?  For example, file handling, filtering specific objects out of lists etc.?  Yeah, me too.  It’s important that we do common operations consistently within a codebase (and across services…harder but still possible).  Agree within the team how this should be done (use the Guava libraries? Apache Commons? etc.) and stick to it.  If you are an engineer and unsure, look at other parts of the codebase.  Handling common operations in a common way leaves less to learn and shortens the ramp-up period for engineers new to the codebase.
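As a small, hypothetical illustration, agreeing on one idiom for filtering a list (here, Java 8 streams) and using it everywhere beats a codebase that mixes streams, Guava’s Iterables.filter and hand-rolled loops for the same job:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;

    public class OrderFilters {

        static class Order {
            final String id;
            final boolean open;
            Order(String id, boolean open) { this.id = id; this.open = open; }
        }

        // The one agreed way to filter in this (hypothetical) codebase:
        // Java 8 streams, always returning a new list.
        static List<Order> openOrders(List<Order> orders) {
            return orders.stream()
                    .filter(order -> order.open)
                    .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            List<Order> orders = Arrays.asList(
                    new Order("A-1", true),
                    new Order("A-2", false));
            System.out.println(openOrders(orders).size()); // 1
        }
    }

Which idiom the team picks matters far less than the fact that there is exactly one of them.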

A note on back end infrastructure systems vs front end product development

Front end product development needs rapid delivery.  So we can’t always live in a world where everything is consistent.  In these scenarios, agree among the team members the libraries to use, approach etc. and forge ahead developing the new product.  In this case, technical debt will ensue, but this is good technical debt.  In fact, we could say it is “necessary” technical debt. Necessary because not adding a feature can be as key to being first to market as adding a feature.

If the product or feature is successful and moves from proof of concept to long-lived, address the technical debt that has been incurred.  This typically happens when a product gets traction and a company is coming out of the startup phase. A great blog that articulates technical debt is this one by Henrik Kniberg.

Summary

I hope the above guidance helps.  Each of the bullet points could be a topic in itself, and I wanted this blog to be a collection of areas to consider.  If I were to sum this up in one sentence, it would be

“Take just a little more time to make the hard decisions.”

I am always looking for ways to make the development of software and, by extension, products easier, so I’d be interested to know what strategies you use to limit complexity in a system. Feel free to leave a comment, below.

Aaaaand as I completed this, that lovely chap Dan North just released this presentation. 🙂


Integration of a high velocity streaming system with batch oriented, slower downstream systems

Background

Over the last two years, I have been involved in transforming a complex legacy processing system that gave rise to a unique solution that may be used in other contexts.   Before continuing, it is important to note that the pattern of processing data quickly and buffering for downstream consumption is nothing new. What is different is the way in which the problem was approached, the principles used and the technologies selected.

The data to be processed was supplied by up to 14,000 merchants by way of feed files. Each feed file contained one or more offers. The merchant would update one or more feed files and make them available for processing. At peak, a feed file could contain up to 3MM offers (though 12MM was seen on occasion). Further, there were approximately 160 – 180MM individual offers in the ecosystem at any one time, with up to 80MM offers being updated per day.

As offers were processed, the goals included enriching these offers to enable better search results and analytics and updating the search index with those offers so they would be available to the various web sites. The SLA for this end-to-end process was 1 hour. For smaller organisations that do not have the budget for an extremely large Hadoop MR cluster (or similar), another solution needs to be found.

To meet the desired SLA, a streaming system was conceived. This meant that each individual offer would be extracted from the feed files and sent down the pipeline for processing individually. Calculations showed that this ingestion part of the system would run at 30-35,000 offers/second. Many downstream systems, however, could not ingest individual offer updates at that speed, including, for example, Solr master index updates, various legacy systems and the Hadoop ecosystem. As a result, an architectural solution had to be conceived that allowed the fast ingestion and storage of this feed data, and the provision of that feed data in a manner that allowed downstream systems to dictate their own ingestion speed.
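As a rough back-of-envelope check on that figure (my own sanity check here, not the original capacity plan): 80MM updates spread evenly across a day is only about 925 offers/second, but a single 12MM-offer feed cleared within the 1-hour SLA already implies well over 3,000 offers/second for that feed alone, and with many merchants publishing large feeds concurrently around peak, a design target of 30-35,000 offers/second is what leaves the necessary headroom.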

During the design of the system, some principles came to the fore:

  • Processing of data had to conform to the following principles in order to ensure data integrity during normal operations, error states, re-processing and other abnormal behaviour (a small sketch of these properties follows this list):
    • Immutable – immutable data would be easier to scale.
    • Idempotent – replaying operations would not cause any unexpected side effects.
    • Associative – if different versions of an offer were processed, the final outcome would be the same regardless of the order in which they were processed.
    • Commutative – see Associative, above.
  • There would most likely be two repositories required: one to store the data after fast ingestion and processing (the system of record), and one to hold a subset of the data for downstream systems to ingest (the staging repository).  Note that this also clearly demarcated the responsibilities, and therefore workloads, of each repository.
  • The services that interacted would need to be easily scalable and to have the following properties:
    • Asynchronous I/O
    • Stateless
    • Metadata (e.g. state) moves with the data it applies to, in the same packet, to avoid querying multiple stores for results.
  • The system of record would have to perform at a minimum of 35k TPS, and the faster the system of record, the less complex the solution would be and the easier it would be to implement, as optimisations (e.g. caching) were less likely to be required.
  • Slow clients always had to have access to the latest data, while at the same time having the option to skip some updates in order to get the most recent version of a particular datum (in this case, an offer).
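A minimal, hypothetical sketch of what those first four properties buy you: if each offer update is an immutable value carrying a version, and “apply” simply keeps the highest version, then replays are harmless and the order of arrival does not matter.

    public final class OfferUpdateMerge {

        // Immutable value: fields never change after construction.
        static final class Offer {
            final String id;
            final long version;   // monotonically increasing per offer
            final String payload;
            Offer(String id, long version, String payload) {
                this.id = id; this.version = version; this.payload = payload;
            }
        }

        // Idempotent, associative and commutative merge: highest version wins, so
        // merge(a, a) == a and the grouping/order of merges does not change the result.
        static Offer merge(Offer current, Offer incoming) {
            return incoming.version > current.version ? incoming : current;
        }

        public static void main(String[] args) {
            Offer v1 = new Offer("offer-42", 1, "price=10");
            Offer v2 = new Offer("offer-42", 2, "price=9");
            // Replayed and out-of-order deliveries converge on the same result.
            System.out.println(merge(merge(v1, v2), v1).payload); // price=9
            System.out.println(merge(v2, merge(v1, v1)).payload); // price=9
        }
    }

The real pipeline was of course richer than a single max-by-version rule, but designing every operation to have these properties is what made error handling and re-processing safe.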

This blog entry aims to describe the architecture chosen at a high level while submitting the principles of the pattern that emerged.

How it works

[Diagram: staggered processing – how it works]

As depicted in the diagram above, data was pushed or pulled from internal (i.e. same organisation) or external (i.e. different organisation) systems from the untrusted domain into the pipeline system where it entered the trusted domain. Data sanity and validation occurred in Pipeline Service 1 prior to triggering streamed data processing. This continued with data being processed and enriched along the pipeline until it was stored in the System of Record and then passed to the Pipeline Data Publisher.

Downstream clients could be either new or legacy. These typically would not have the high-performing SLA requirements of the pipeline, either because it was not necessary or because of the nature of the system (e.g. batch-processed machine learning). As a result, a staging repository had to be put in place that allowed the downstream clients to consume the data at their own rates. Note that this staging repository contained only a subset of the data, for multiple reasons including efficiency and cost. Finally, if the staging repository were to become corrupted for any reason, a mechanism to repopulate it from the System of Record had to be put in place.

Example

[Diagram: an example implementation of the staggered processing pattern]

As can be seen from the above diagram, the pipeline system was made up of whatever set of services was necessary for the task at hand.

The System of Record needed to be able to handle a very high transactional load (35k TPS) at low latency to ensure that any queues in the system would be drained fast enough. After looking at other solutions, including Clustrix, MongoDB, Riak, dbshards and Oracle, we selected VoltDB to perform this role because:

  • It met our SLA with the ability to horizontally scale to meet increased demand;
  • Its high throughput negated additional complexities such as caching;
  • Its immediately consistent nature meant there was no need to deal with conflict resolution, programmatically or otherwise, which is a common feature of eventually consistent, NoSQL systems.

As a result, the Pipeline Data Persister chosen was a VoltDB client that connected to the VoltDB cluster to perform its duties. After successful persistence of the data, the persister collected all data required (joining across VoltDB tables where necessary) and sent a flat data structure to the Pipeline Data Publisher (PDP).

The Pipeline Data Publisher chosen was a Kafka Producer that sent the data to the Kafka store. The size of the disks allocated to the Kafka store had to be sufficient not only to store data given the rate at which it was published, but also for long enough that downstream clients could consume everything they required. Finally, the downstream clients were all Kafka clients. This allowed them to take advantage of the architecture and hard decisions that have been made with the Kafka system, the fundamental difference with traditional messaging systems being that the clients can dictate the rate at which they consume messages. Of note is Kafka Consumer 2. This consumer was also an HDFS client. In this way, data streamed into the pipeline system could be ingested at a much slower rate into HDFS where much longer running processes worked on that data (e.g. Mahout). Optionally, the results of the long running processes could be added alongside the pipeline-ingested data, forming a coherent view of the data in the ecosystem.
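To make the “consume at your own rate” point concrete, here is a minimal, hypothetical consumer loop using the Kafka Java client (the topic name, group id and settings are illustrative). Each downstream system runs something like this and simply polls as fast, or as slowly, as it can process:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class OfferConsumer {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "kafka-1:9092");
            props.put("group.id", "hdfs-loader");   // each downstream system uses its own group
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("offers"));   // hypothetical topic
                while (true) {
                    // The broker does not push; the client pulls when it is ready, so a slow
                    // batch-oriented consumer never back-pressures the ingestion pipeline.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<String, String> record : records) {
                        process(record.value());   // as slowly as the downstream system needs
                    }
                }
            }
        }

        private static void process(String offer) { /* e.g. write to HDFS for Mahout jobs */ }
    }

(The API shown is the current Java consumer, which postdates the system described here, but the pull-based contract that matters is the same.)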

Of course, the above is only an example. When implementing this pattern, select the technologies that are suitable for your problem at hand, taking into account organisational guidelines, in-house expertise and so on.


Agile Execution – presentation slides

In the spirit of sharing with the tech community, I have decided to make available some of the presentations I have given at local Java User Groups, companies etc.  The first here is on Agile Execution.  It sums up all the lessons learned over 16 years of software engineering where I have been involved in four major Agile transformations.  The slides depict the best way I know how to work and the areas to focus on.  The information therein is a collection of experiences and lessons from various sources including some fantastic software architects and practitioners, and some great reference books.  Please leave a comment if you found them useful.

Of course, we can all get better and I look forward to being able to update the slides with better ways of working in the near future.

[Slides: AgileExecution_v2.0]


Continuous Delivery – the missing piece of the puzzle

Over the last couple of years we have been reading a lot about Continuous Delivery (CD).  A fantastic concept that can really propel organisations forwards by enabling fast, incremental value add to products.  There are a number of different concepts here that I’d like to address as they all interlink, but I have yet to read an article that links them all together.  If there is one, please let me know in the comments section, below.

CD allows us to repeatedly and consistently execute the deployment and monitoring steps necessary to deliver new functionality to a system in production in an automated fashion.  A more thorough definition can be found on Wikipedia.  Generally speaking, CD is a concept and strategy that has had real world implications and benefits.  What is important to note is that, like all strategies, they can be executed well or badly.  So to give CD adoption the greatest chance of success, there are a number of prior important factors that must be in place.  Let’s call these the foundations, which are:

  • A comprehensive suite of automated acceptance and unit tests integrated with the build process for the service in question (try PiTest if you think you have good tests).
  • Services that do one thing, and one thing only.  The concept of Micro Service Architectures has been around for a while, with some people even saying that 200 lines of code is the most any service should contain.  In my opinion, the guiding principle is a service that has groups of logically related functions that don’t break the Single Responsibility Principle and is not too large to conceptualise.  A great description is from 21 mins 41 secs in this presentation: Micro Services.
  • Small stories and not Business Accepting those stories until they are in production makes rollback easier.  This is particularly important when rolling out a CD implementation for the first time.
  • Prepare the team for CD so that continually deploying to production is second nature.

It is this last point, I think, that is the most important and will make or break CD adoption.  This is what I refer to as the missing piece of the puzzle.

In most organisations, deployments only happen several times per week.  With the underpinning tools of CD, we are expecting to get to tens, if not hundreds, of deployments per day.  Moving an engineering culture, both at a team and organisational level, to the mindset required for CD adoption can be challenging.  So what is the best way to do it?  At Shopzilla, we started with CD adoption in mind for the future, so we took steps to prepare the Inventory Engineering team for this expectation (note, depending on your organisation, you may already have some or all of these implemented):

  • Removal of all restrictions for deployment to production so that the team that built the code deploys to production.
  • Stories are not BA’d until they are in production – this has the knock-on effect of ensuring that tests pass, are thorough and that the Engineer(s) who wrote the feature have a high degree of confidence that the feature will work in production: the team’s sense of ownership is strong.  Of course, all the other benefits of a good test suite come part and parcel.
  • Stories are kept small so that the life of a story is only a couple of days at most until it is deployed to production.  Of course, not all stories are like this: they range from low risk (feature additions, modifications within a single service) to high risk (e.g. a database schema change that may affect more than one service), so strategise accordingly.
  • The architecture supports CD (see this presentation by Ancestry.com as an example).
  • Engineers deploy several times per day manually, normally as soon as a story has passed Quality Assurance – automated or manual depending on whether the feature is low or high risk.
  • Product Owners get used to having software go live many times per day and work with the Engineers to break down features into deployable increments with feature toggles.
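On that last point, a feature toggle can be as simple as a flag read at runtime, so increments can go live dark and be switched on when the Product Owner is ready. A minimal, hypothetical sketch (in practice the flag would come from a configuration service rather than a system property):

    public class CheckoutController {

        // Read at runtime so the toggle can be flipped without a redeploy;
        // hard-coded to a system property here for illustration.
        private final boolean newPaymentFlowEnabled =
                Boolean.parseBoolean(System.getProperty("feature.newPaymentFlow", "false"));

        public String checkout(String basketId) {
            if (newPaymentFlowEnabled) {
                return newPaymentFlow(basketId);   // the dark-launched increment
            }
            return legacyPaymentFlow(basketId);    // current behaviour remains the default
        }

        private String newPaymentFlow(String basketId) { return "new:" + basketId; }
        private String legacyPaymentFlow(String basketId) { return "legacy:" + basketId; }
    }

Small increments behind a toggle are what make deploying many times per day low risk: the code ships continuously, while the behaviour changes only when you choose.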

Doing the above over a period of time changes the expectation of a team, from Product Owners to Engineers, so that deploying to production several times per day becomes the norm.  When this stage is reached, there is a fantastic opportunity to underpin the process with some quality tools that will provide automation.

At Shopzilla where I am currently, we have reached the stage where we have changed the Engineering mindset so that CD is seen as a valuable adoption pattern.  Another blog post will follow recounting our experiences of this as we implement over the next few months.

So to conclude: tools do not enable CD adoption; the mindset of the team does.  Work on this first and the rest will follow.


Interview in Baseline Mag about Shopzilla’s use of VoltDB

I was interviewed recently and the article just got published:

http://www.baselinemag.com/analytics-big-data/shopzilla-is-sold-on-big-data.html


The Pragmatic Manifesto

I really like the Agile Manifesto. Although old, it has some basic tenets that organisations still fail to follow. Interesting to see the Reactive Manifesto too.

I’d like to propose The Pragmatic Manifesto. Normally software development is about making the right decisions to, among other aspects, increase revenue, increase user satisfaction or maintain velocity (through sensible and timely code refactoring etc.). I’m sure I’ll add to this but let’s start with the following:

Iterative and incremental dark launch over expensive pre-emptive performance testing

Features that are immediately required over features that may be required

Just enough design up front over diving straight into implementation

Fact based product enhancement over theory
