I was recently invited to take part in a podcast where we discuss passion and purpose at the organisational, team and individual level. It was great fun and I invite you to listen and share your thoughts in the comments.
Taking part were:
In the Agile world we have all heard about how we build in quality up front with each feature developed, how we prioritise fewer features of greater quality rather than building on shaky foundations, the definition of done and a whole host of other mantras by which to ensure we build the right software.
What I want to talk about here is what defines a good user experience and how the concept of quality feeds right into this. By user, I am referring to any human interaction with a software product, although a lot of the principles apply to clients of APIs (see Developer Experience (DX, link to Ismail’s blog)).
First, let’s define what a good user experience is. It is divided into two categories:
The example we will use is that of stock and options trading, where a Market Maker builds an application that allows a Trader to trade futures, stocks and derivatives such as options. This is a good example because not only do Traders need to be able to navigate the application easily to quickly enter orders, but they must also have up-to-the-second prices so they can make decisions based on fresh data regardless of the level of activity in the markets. Indeed, delayed prices make the application unusable and the Trader will probably leave that Market Maker and go to another one.
The user and their desired outcome
I won’t spend much time on this aspect as it has been covered in literature elsewhere many times before. Suffice it to say that it is primarily driven by good Human Computer Interaction principles applied to the UI: the way the information is laid out specific to the user’s locale and way of working, clarity of information displayed and ease of knowing what to do next, to name but a few. In our case study, the Market Maker builds and makes available an application that needs to make it easy for a Trader to model their trade, preview their order, submit the order, receive confirmation and monitor their positions as prices change.
How well the application responds
This aspect is one that has as much, if not greater, importance to a user’s experience. What we are talking about here is how well the user can achieve their desired outcome while the system experiences a high level of stress, unexpected failures in sub-systems etc. As a fantastic example, think about the classic market open problem faced by all stock exchanges around the world. For nearly all types of trades, particularly intraday trading and scalping, traders expect a sub-second response to their actions regardless of how many orders are going through their application and the exchanges at any one time. At one extreme, it could be a relatively quiet period like a bank holiday when a trader is engaged in out-of-hours trading with very low volume; at the other, a catastrophic event that results in needing to liquidate positions immediately (think the 2008-2009 market crash). Traders pay a Market Maker commissions for each trade, so the Market Maker makes money no matter what the outcome of the trade. The more trades the Market Maker can get through, the more money they make.
Putting it all together
What we have discussed so far is nothing new. However, the key aspect when developing software is having the entire development team and management understand what it takes to develop a feature. Just because a feature is demonstrable via a UI does not mean it is done. Far from it: this is just the tip of the iceberg. The majority of the work remains to make sure that the feature remains as usable with a million users (or whatever the SLA is) as it does with one, and functions as expected in the wake of unexpected failures – i.e. is responsive and resilient – as well as other aspects such as security, availability etc. To achieve this, there are many software engineering principles and architectural techniques (a great one is the Reactive Manifesto) and it is the responsibility of every Product Development team to ensure that when Feature X is Done, it means it:
Most of the work for a Product Development team is in the second category of how an application responds under different scenarios. Get this right and additional features will be able to leverage that investment, create a much better product for users and bring in more revenue even when there is a huge surge in activity. When an application remains responsive and informative, users achieve what they want resulting in higher customer satisfaction, and the company makes more money. A win-win!
Note: In this blog, the term “technology” is interchangeable with “framework” as the same principles apply.
I am currently working at Ticketmaster helping in a large scale Agile transformation and Platform migration. About seven or eight months ago we started looking at new and improved ways of creating services instead of the tried and tested Spring framework based approaches common in the organisation. The Spring framework certainly has its merits and we have used it with much success. However, the service we were about to write, called Gateway, was to route millions of requests to other services further down in our architecture layers:
We knew that the service had to run efficiently (thereby saving on hardware) and scale effectively. The event driven, reactive approach was one that we were looking to embrace. After much research we had some concerns. Not only is that space full of different frameworks at different levels of maturity, but we also had to consider our current skill set, which is predominantly Java with a small amount of Scala, Ruby, Clojure (which our data science guys use) and a handful of other languages we’ve picked up through company acquisitions. How could we adopt this new paradigm in the easiest possible way?
What this blog post will detail is the approach used to select the chosen framework. It’s a tried and tested approach I’ve used before and will continue to use and improve upon in the future.
Before we describe the stages of what we did, suffice to say that there is no point in doing a technology selection without a business context and an idea of the service(s) that technology will be used to build.
The technology selection was broken up into the following steps:
Our results are pretty detailed so have been added into a separate PDF that you can download here. We have left this in raw format and hope that they will be a good reference for others.
As you can see from the PDF of results, we chose Vertx. It won out not only because of its raw power, but because of its fantastic architecture, implementation, ease of use, Google Groups support and the fact that Red Hat employs a small team to develop and maintain it. Indeed, a few weeks after we selected it, it was announced that Red Hat had hired two more engineers to work on Vertx.
So overall we have been very happy with our selection of Vertx. We had version 2.1.5 running in production for several months and recently upgraded to Vertx 3. The maintainers’ swift response on the Vertx Google Group definitely helped during our initial development phase and during the upgrade to version 3. Performance-wise, the framework is extremely fast and we know that any slowdown is most likely due to what we have implemented. Adoption has been a success. From a team of two developers, we scaled to four and now eight. It is also starting to be used for other services. Choosing a Java based framework has been a boon as the only additional complexity that needed to be learned by the developers joining the team was the event driven nature of Vertx (i.e. the framework itself). Had we chosen Scala/Play it would have been much harder. Indeed, with the success of Vertx, our decision to standardise on the JVM as a platform and our embracing of the reactive approach, we have a couple of services being built using Scala and at least one using Scala/Play. It would be great to hear of your experiences using reactive frameworks. Which ones did you choose? How easy were they to adopt? Please leave a comment below.
Note: This is a slightly modified cross post of a blog that originally appeared here.
Over the last 16 years in the software industry, I have been involved in a number of Agile cultural transformations. It was at lunch the other day during a conversation that I articulated an aspect of those transformations that is as true today as it was at the first transformation I was involved in.
During any cultural change, and here I am specifically talking about Agile cultural transformation although this may apply to other types of organisational change, there must be Leadership and Inspiration. These go hand in hand and must come down from senior executives. This is nothing new. However, once the presentation has been given, once the direction has been set, how do we maintain momentum among the many teams that are involved in making the transformation a success? The importance of this is directly proportional to the size of the organisation. The more teams, the more people involved, the harder it is to maintain momentum.
So how exactly do we maintain momentum? There are many options out there so let me present you with the two that I believe are most important.
I want to mention this here for completeness. There are many external sources that address this, so I will not here. Let’s just note that this is the starting point for any successful transformation. It is not a set and forget, rather a continued re-visit of the original strategy laid out, noting any modifications along the way, impact on the business and, most of all, impact on the people involved.
Here is where I think there is a gap in the industry’s thinking. Twelve years ago when I took my first technical lead position during an Agile transformation, it was quite clear that there was a gap in the market: the person who could take the Agile transformation directive and translate it into on-the-ground, practical application. This is still true today. Organisational transformation literature calls these people Champions of Change. In an Agile transformation, these Champions of Change need both the soft people skills and the hard technical skills required to translate the transformation vision from strategy to practice and to earn the respect of their peers. These key people are Technical Agile Mentors (TAMs). Here are some examples of where I see these TAMs having great leverage during the day to day execution of the transformation:
So it is clear that to give an Agile transformation the best possible chance of success, we need strong leadership not only at the executive level but also at the team level. This is the gap that is filled by the TAM. The team level leadership that a TAM brings to an Agile transformation requires a unique blend of people and technical skills. Further, they must be current in both of those skillsets as well as being visionaries for that team and across teams. Given the nature of engineers and the rapid advancements in technologies, these people are difficult to find. But take the time to choose them wisely because their impact can be profound.
“Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius — and a lot of courage — to move in the opposite direction.” – Einstein
Software development is hard. That’s a fact. Not only do we have a product that we must enhance, develop from scratch, or even maintain, but we have people’s interactions, opinions, emotions, the list goes on. And then there is design, technology and execution. All these factors, all these variables make software development the fantastic challenge that it is. And I love it. At the same time, I am on a mission to remove complexity introduced into a system when it need not be there. The actual doing of this, making the decisions about what to include, what to leave out and to do this consistently on a day to day basis is difficult.
As Product Owners and Engineers, we are faced with these decisions all the time as we work on a system. Each decision, from those more-complex-than-it-needs-to-be features to irreversible design decisions to that single line of code, all have an impact on our productivity and agility. Everybody talks about keeping it simple and taking a lean approach – this is nothing new. It is how we can achieve this when making those decisions on the ground that is the focus of this blog.
There are at least five major areas where we love to introduce complexity: feature development, execution, design, technology selection and code.
Here I want to go through a number of approaches that I have used that help prevent complexity being incurred because, let’s face it, the solution should only be as complex as the problem at hand.
Unnecessary feature complexity
The basic tenet of avoiding unnecessary feature complexity, especially when dealing with new product development, is to answer the question “What is the minimum we have to do to figure out if this is going to work?” Again, nothing new. This is part of the Lean movement. I’m sure you have your own, but allow me to offer the following approach to exploring a new product or service:
Pretty straightforward, huh? Yet we all know when we have done too much design after a product or idea has been put on the backburner. Ever thought “We really didn’t need to do that much work to make that decision to defer the product”? Make the hard decisions now to save time later.
Of course, there are times when the above steps do not make sense, or they need to be modified. For example, when doing a legacy migration. There are likely many clients that rely on features already present in the legacy system. In these cases, we can take the opportunity to evolve the clients during the migration and do some spring cleaning of the legacy system, i.e. answer the questions “What features do we really need?” and “What features have proven to not be of value given what we know now about our business?” Unfortunately, as is most often the case, there are “hidden” organic features that have no documentation that can trip us up.
Unnecessary execution complexity
There are many in this category, so let’s just focus on a few:
Unnecessary design complexity
Keep it simple. We all know the acronym. Here are some ideas to help focus development efforts:
Unnecessary technology complexity
Keep the learning curve as low as possible so that engineers have less to learn technologically and can focus on the solutions implemented to create the product or service. Whatever we can do to have them focus more on the problems being solved rather than the different libraries we used to, say, create and access files can only be a good thing:
Unnecessary code/implementation complexity
Have you ever looked at a codebase and seen multiple ways of achieving the same thing? For example, file handling, filtering specific objects out of lists etc.? Yeah, me too. It’s important that we do common operations consistently within a codebase (and across services…harder but still possible). Agree within the team how this should be done (use the Guava libraries? Apache Commons? etc.) and stick to it. If you are an engineer and unsure, look at other parts of the codebase. Common ways of handling these operations result in less to learn and a shorter ramp up period for engineers new to the codebase.
A note on back end infrastructure systems vs front end product development
Front end product development needs rapid delivery. So we can’t always live in a world where everything is consistent. In these scenarios, agree among the team members the libraries to use, approach etc. and forge ahead developing the new product. In this case, technical debt will ensue, but this is good technical debt. In fact, we could say it is “necessary” technical debt. Necessary because not adding a feature can be as key to being first to market as adding a feature.
If the product or feature is successful and is moved from proof to long lived, address the technical debt that has been incurred. This typically happens when a product gets traction and a company is coming out of the startup phase. A great blog that articulates technical debt is this one by Henrik Kniberg.
I hope the above guidance helps. Each of the bullet points could be a topic in itself, and I wanted this blog to be a collection of areas to consider. If I were to sum this up in one sentence, it would be:
“Take just a little more time to make the hard decisions.”
I am always looking for ways to make the development of software and, by extension, products easier, so I’d be interested to know what strategies you use to limit complexity in a system. Feel free to leave a comment, below.
Aaaaand as I completed this, that lovely chap Dan North just released this presentation. 🙂
Over the last two years, I have been involved in transforming a complex legacy processing system that gave rise to a unique solution that may be used in other contexts. Before continuing, it is important to note that the pattern of processing data quickly and buffering for downstream consumption is nothing new. What is different is the way in which the problem was approached, the principles used and the technologies selected.
The data to be processed was supplied by up to 14,000 merchants by way of feed files. Each feed file contained one or more offers. The merchant would update one or more feed files and make them available for processing. At peak, a feed file could contain up to 3MM offers (though 12MM was seen on occasion). Further, there were approximately 160 – 180MM individual offers in the ecosystem at any one time, with up to 80MM offers being updated per day.
As offers were processed, the goals included enriching these offers to enable better search results and analytics and updating the search index with those offers so they would be available to the various web sites. The SLA for this end-to-end process was 1 hour. For smaller organisations that do not have the budget for an extremely large Hadoop MR cluster (or similar), another solution needs to be found.
To meet the desired SLA, a streaming system was conceived. This meant that each individual offer would be extracted from feed files and be sent down the pipeline for processing individually. Calculations showed that this ingestion part of the system would run at 30-35,000 offers/second. Many downstream systems, however, could not run at that speed of update of individual offers including, for example, Solr master index updates, various legacy systems and the Hadoop ecosystem. As a result, an architectural solution had to be conceived that allowed the fast ingestion and storage of this feed data, and the provision of that feed data in a manner that allowed downstream systems to dictate their own ingestion speed.
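To get a feel for these numbers, here is a rough back-of-envelope sketch in Python. It uses only the figures quoted in this post. The assumption that a single feed file gets the full ingestion rate to itself is mine, purely for illustration, and the function names are hypothetical.

```python
# Back-of-envelope figures. All inputs come from the text above.
# Assumption (for illustration only): one feed file has the whole
# ingestion rate to itself while it is being processed.
INGEST_RATES = (30_000, 35_000)   # offers per second (quoted design range)
PEAK_FEED = 3_000_000             # offers in a peak feed file
LARGEST_FEED = 12_000_000         # offers in the largest feed file seen
DAILY_UPDATES = 80_000_000        # offers updated per day (upper bound)
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_ingest(offers: int, rate: int) -> float:
    """Seconds needed to push `offers` through ingestion at `rate` offers/second."""
    return offers / rate


def average_daily_rate(offers_per_day: int) -> float:
    """Average offers/second if the daily volume were spread evenly over 24 hours."""
    return offers_per_day / SECONDS_PER_DAY


for rate in INGEST_RATES:
    print(f"At {rate} offers/s: "
          f"peak feed {seconds_to_ingest(PEAK_FEED, rate):.0f} s, "
          f"largest feed {seconds_to_ingest(LARGEST_FEED, rate):.0f} s")

print(f"Evenly spread daily average: {average_daily_rate(DAILY_UPDATES):.0f} offers/s")
```

Under these assumptions even the largest feed file clears ingestion in minutes, which leaves most of the one hour SLA for enrichment and index updates. The gap between the evenly spread daily average and the quoted design rate shows how much headroom a streaming system needs for bursts.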
During the design of the system, some principles came to the fore:
- Immutable – immutable data would be easier to scale.
- Idempotent – replaying operations would not cause any unexpected side effects.
- Associative – if different versions of an offer were processed, the final outcome would be the same regardless of how those versions were grouped or batched along the way.
- Commutative – the final outcome would be the same regardless of the order in which those versions were processed.
- Asynch IO
- Metadata (e.g. state) moves with the data it applies to in the same packet to avoid querying multiple stores for results.
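As a minimal sketch of how the first few principles can fit together, the Python below merges offer updates by keeping the highest version seen. This is not the code we ran in production: the `Offer` fields, the `version` number and the function names are all hypothetical, and it assumes two updates with the same version carry the same content.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)  # frozen makes each Offer immutable
class Offer:
    offer_id: str
    version: int   # hypothetical, increases with every merchant update
    price: float


def merge(current, incoming):
    """Keep the newer version. Replaying an old or identical update changes nothing."""
    if current is None or incoming.version > current.version:
        return incoming
    return current


def apply_all(updates):
    """Fold a sequence of updates into the latest known state per offer."""
    state = {}
    for offer in updates:
        state[offer.offer_id] = merge(state.get(offer.offer_id), offer)
    return state


updates = [Offer("a", 1, 10.0), Offer("a", 3, 11.0), Offer("a", 2, 12.0)]

# Every processing order ends in the same state (order independence).
assert len({tuple(apply_all(p).items()) for p in permutations(updates)}) == 1

# Replaying the whole stream has no further effect (idempotence).
assert apply_all(updates + updates) == apply_all(updates)
```

Because `merge` behaves like a maximum over versions, the outcome does not depend on ordering, grouping or replays, which is what makes it safe to retry and to scale out workers.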
This blog entry aims to describe the architecture chosen at a high level while submitting the principles of the pattern that emerged.
As depicted in the diagram above, data was pushed or pulled from internal (i.e. same organisation) or external (i.e. different organisation) systems from the untrusted domain into the pipeline system where it entered the trusted domain. Data sanity and validation occurred in Pipeline Service 1 prior to triggering streamed data processing. This continued with data being processed and enriched along the pipeline until it was stored in the System of Record and then passed to the Pipeline Data Publisher.
Downstream clients could be either new or legacy. These typically would not have the high-performing SLA requirements of the pipeline, due to it either not being necessary or the nature of the system (e.g. batch processed machine learning). As a result, a staging repository had to be put in place that allowed the downstream clients to consume the data at their own rates. Note that this staging repository only contains a subset of data for multiple reasons including efficiency and cost. Finally, if the staging repository were to become corrupted for any reason, a mechanism to repopulate it from the System of Record had to be put in place.
The System of Record needed to be able to handle a very high transactional load (35k TPS) at low latency to ensure that any queues in the system would be drained fast enough. Having looked at other solutions, including Clustrix, MongoDB, Riak, dbshards and Oracle, VoltDB was selected to perform this role as:
As a result, the Pipeline Data Persister chosen was a VoltDB client that connected to the VoltDB cluster to perform its duties. After successful persistence of the data, the persister collected all data required (joining across VoltDB tables where necessary) and sent a flat data structure to the Pipeline Data Publisher (PDP).
The Pipeline Data Publisher chosen was a Kafka Producer that sent the data to the Kafka store. The size of the disks allocated to the Kafka store had to be sufficient not only to store data given the rate at which it was published, but also for long enough that downstream clients could consume everything they required. Finally, the downstream clients were all Kafka clients. This allowed them to take advantage of the architecture and hard decisions that have been made with the Kafka system, the fundamental difference with traditional messaging systems being that the clients can dictate the rate at which they consume messages. Of note is Kafka Consumer 2. This consumer was also an HDFS client. In this way, data streamed into the pipeline system could be ingested at a much slower rate into HDFS where much longer running processes worked on that data (e.g. Mahout). Optionally, the results of the long running processes could be added alongside the pipeline-ingested data, forming a coherent view of the data in the ecosystem.
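The idea of clients dictating their own consumption rate can be illustrated without Kafka at all. The sketch below is a toy, in-memory stand-in and not the Kafka API: `Log` and `Consumer` are hypothetical names, and there is no retention, partitioning or persistence. It only shows the pull model, where each consumer tracks its own offset into a shared append-only log.

```python
class Log:
    """Toy append-only log, standing in for a single topic partition."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return the offset it was written at."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records):
        """Return up to `max_records` records starting at `offset`."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Pulls from the log at its own pace and remembers its own position."""

    def __init__(self, log, batch_size):
        self._log = log
        self._batch_size = batch_size
        self.offset = 0

    def poll(self):
        batch = self._log.read(self.offset, self._batch_size)
        self.offset += len(batch)
        return batch


log = Log()
for offer_number in range(10):
    log.append(offer_number)

fast = Consumer(log, batch_size=5)   # e.g. a search index updater
slow = Consumer(log, batch_size=2)   # e.g. a batch oriented client

print(fast.poll())   # the fast consumer takes five records per poll
print(slow.poll())   # the slow consumer takes two and is never pushed to
```

The publisher never waits for the slow consumer, and the slow consumer is never overwhelmed. The price is that the log must keep records until the slowest client has read them, which is why the disk sizing mentioned above matters.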
Of course, the above is only an example. You should select those technologies that are suitable for your problem at hand when implementing this pattern, be it conformance to organizational guidelines, expertise etc.
In the spirit of sharing with the tech community, I have decided to make available some of the presentations I have given at local Java User Groups, companies etc. The first here is on Agile Execution. It sums up all the lessons learned over 16 years of software engineering where I have been involved in four major Agile transformations. The slides depict the best way I know how to work and the areas to focus on. The information therein is a collection of experiences and lessons from various sources including some fantastic software architects and practitioners, and some great reference books. Please leave a comment if you found them useful.
Of course, we can all get better and I look forward to being able to update the slides with better ways of working in the near future.
Over the last couple of years we have been reading a lot about Continuous Delivery (CD), a fantastic concept that can really propel organisations forwards by enabling fast, incremental value add to products. There are a number of different concepts here that I’d like to address as they all interlink, but I have yet to read an article that links them all together. If there is one, please let me know in the comments section below.
CD allows us to repeatedly and consistently execute the deployment and monitoring steps necessary to deliver new functionality to a system in production in an automated fashion. A more thorough definition can be found on Wikipedia. Generally speaking, CD is a concept and strategy that has had real world implications and benefits. What is important to note is that, like all strategies, it can be executed well or badly. So to give CD adoption the greatest chance of success, there are a number of important factors that must be in place beforehand. Let’s call these the foundations, which are:
It is this last point, I think, that is the most important and will make or break CD adoption. This is what I refer to as the missing piece of the puzzle.
In most organisations, deployments only happen several times per week. With the underpinning tools of CD, we are expecting to get to tens, if not hundreds, of deployments per day. Moving an engineering culture, both at a team and organisational level, to the mindset required for CD adoption can be challenging. So what is the best way to do it? At Shopzilla, we started with CD adoption in mind for the future, so we took steps to prepare the Inventory Engineering team for this expectation (note, depending on your organisation, you may already have some or all of these implemented):
Doing the above over a period of time changes the expectation of a team, from Product Owners to Engineers, so that deploying to production several times per day becomes the norm. When this stage is reached, there is a fantastic opportunity to underpin the process with some quality tools that will provide automation.
At Shopzilla where I am currently, we have reached the stage where we have changed the Engineering mindset so that CD is seen as a valuable adoption pattern. Another blog post will follow recounting our experiences of this as we implement over the next few months.
So to conclude, it is not tools that enable CD adoption but the mindset of the team. Work on this first and the rest will follow.
I was interviewed recently and the article just got published:
I’d like to propose The Pragmatic Manifesto. Normally software development is about making the right decisions to, among other aspects, increase revenue, increase user satisfaction or maintain velocity (through sensible and timely code refactoring etc.). I’m sure I’ll add to this but let’s start with the following:
Iterative and incremental dark launch over expensive pre-emptive performance testing
Features that are immediately required over features that may be required
Just enough design up front over diving straight into implementation
Fact based product enhancement over theory