Category: Programming

Serial vs Parallel task execution

This time let’s talk a bit about the difference between serial and parallel task execution.

The idea is simple: if we have two or more operations that depend on one another (e.g. the result of one is the input of another), then we need to run them serially, one after the other.

Total execution time will be the sum of the time taken by the individual steps. Plain and easy.

What if the operations don’t interact instead? Can each of them be executed on its own path, so we can collect the results later on? Of course! That is called parallel execution.

[Image: parallel car racing track]

It’s like those electric racing tracks: each car gets its own lane, the cars can’t hit or interfere with each other, and the race is over when every car has completed the circuit.

So how can we do that? Luckily for us, in .NET we can start a bunch of tasks and then use Task.WhenAll() or Task.WaitAll() to wait for all of them to complete.

Both methods do more or less the same thing. The main difference is that Task.WaitAll waits for all of the provided Task objects to complete execution, blocking the current thread until everything has completed.

Task.WhenAll, instead, returns a Task that can itself be awaited. The calling method will resume once all the tasks have completed, but you won’t have a thread hanging around waiting.
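To make the difference more concrete, here is a minimal sketch. SlowSquareAsync and the delays are made up for illustration and are not taken from the repository; only Task.WaitAll, Task.WhenAll and Task.Delay are real framework calls.

```csharp
using System;
using System.Threading.Tasks;

public static class WaitVersusWhen
{
    // Hypothetical stand-in for a slow, independent operation.
    public static async Task<int> SlowSquareAsync(int value, int delayMs)
    {
        await Task.Delay(delayMs);
        return value * value;
    }

    // Blocking: the calling thread is parked until both tasks have finished.
    public static int SumBlocking()
    {
        Task<int> first = SlowSquareAsync(2, 100);
        Task<int> second = SlowSquareAsync(3, 150);
        Task.WaitAll(first, second);
        return first.Result + second.Result;
    }

    // Non-blocking: WhenAll returns a Task<int[]> that we simply await.
    public static async Task<int> SumAwaitingAsync()
    {
        int[] results = await Task.WhenAll(
            SlowSquareAsync(2, 100),
            SlowSquareAsync(3, 150));
        return results[0] + results[1];
    }

    public static async Task Main()
    {
        Console.WriteLine(SumBlocking());            // 13
        Console.WriteLine(await SumAwaitingAsync()); // 13
    }
}
```

In both versions the two operations are already running once the tasks have been created; the only thing that changes is how the caller waits for them.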

So in the end, the total time of the parallel run will be more or less (give or take a few milliseconds) the same as that of the most expensive operation in the set.
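A rough way to see this is to time both approaches with a Stopwatch. The sketch below is not the code from the repository: FakeWorkAsync is a hypothetical operation that only waits, and the delays are arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class SerialVersusParallel
{
    // Hypothetical operation that just waits for the given time.
    private static Task FakeWorkAsync(int delayMs) => Task.Delay(delayMs);

    // Serial: the second operation starts only after the first has completed.
    public static async Task<long> RunSerialAsync(int firstMs, int secondMs)
    {
        var watch = Stopwatch.StartNew();
        await FakeWorkAsync(firstMs);
        await FakeWorkAsync(secondMs);
        return watch.ElapsedMilliseconds;
    }

    // Parallel: both operations are started, then awaited together.
    public static async Task<long> RunParallelAsync(int firstMs, int secondMs)
    {
        var watch = Stopwatch.StartNew();
        await Task.WhenAll(FakeWorkAsync(firstMs), FakeWorkAsync(secondMs));
        return watch.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        Console.WriteLine($"serial:   {await RunSerialAsync(300, 500)} ms");
        Console.WriteLine($"parallel: {await RunParallelAsync(300, 500)} ms");
    }
}
```

With these delays the serial run should take around 800 ms (the sum) and the parallel one around 500 ms (the slowest operation), although the exact numbers will vary from machine to machine.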

I’ve prepared a small repository on GitHub to demonstrate the concepts, feel free to take a look. It’s a very simple .NET Core console application showing how to execute two operations in serial and then in parallel.

Here’s a screenshot I got on my MacBook Pro:

Know your data structures – List vs Dictionary vs HashSet

Are there any cases when it doesn’t really matter how your data is structured, as long as you’re fulfilling the task at hand? Or is it always important to use the perfect data structure for the job? Let’s find out!

List, Dictionary and HashSet have quite different purposes and use cases. Specifically, Lists should be used when all you have to do is enumerate the items or access them randomly via index.

Lists are very similar to plain arrays. Essentially a List is backed by an array of items that grows once its current capacity is exceeded. It’s the standard and probably the most used collection. Items can be accessed randomly via the [] operator in constant time. Adding or removing at the end costs O(1) as well, except when the capacity is exceeded and the backing array has to be reallocated. Doing it at the beginning or in the middle requires all the following items to be shifted, which costs O(n).
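A small sketch of those operations, with made-up values, could look like this:

```csharp
using System;
using System.Collections.Generic;

public static class ListBasics
{
    public static List<int> Build()
    {
        var numbers = new List<int> { 10, 20, 30 };
        numbers.Add(40);       // append at the end: O(1) unless the capacity is exceeded
        numbers.Insert(0, 5);  // insert at the beginning: every item is shifted, O(n)
        numbers.RemoveAt(2);   // remove from the middle: the tail is shifted, O(n)
        return numbers;
    }

    public static void Main()
    {
        List<int> numbers = Build();
        Console.WriteLine(numbers[1]);                  // indexed access, O(1): prints 10
        Console.WriteLine(string.Join(", ", numbers));  // prints 5, 10, 30, 40
    }
}
```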

Dictionaries and HashSets, instead, are specialised collections intended for fast-lookup scenarios. They basically map each item to a key built using a hash function. That key can later be used to quickly retrieve the associated item.

They both share more or less the same asymptotic complexity for all the operations. The real difference is that with a Dictionary we can create key-value pairs (with the keys being unique), while with a HashSet we’re storing an unordered set of unique items.
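As an illustration of that difference, the following sketch uses a HashSet to keep the distinct words of a sequence and a Dictionary to count them. The method names and the sample words are invented for the example.

```csharp
using System;
using System.Collections.Generic;

public static class LookupCollections
{
    // HashSet: keeps each distinct word once, in no guaranteed order.
    public static HashSet<string> DistinctWords(IEnumerable<string> words)
        => new HashSet<string>(words);

    // Dictionary: unique keys (the words), each mapped to a value (its count).
    public static Dictionary<string, int> CountWords(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in words)
        {
            counts.TryGetValue(word, out int current); // current stays 0 if the key is missing
            counts[word] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var words = new[] { "red", "green", "red", "blue", "red" };
        Console.WriteLine(DistinctWords(words).Count);  // 3
        Console.WriteLine(CountWords(words)["red"]);    // 3
    }
}
```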

It’s also extremely important to note that when using HashSets, items have to properly implement GetHashCode() and Equals().

With Dictionaries, instead, the same is needed for the type used as the key.
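Here is a minimal sketch of what "properly implementing" the two methods can look like. Point is a hypothetical type, and the hash combination shown is just one common way of mixing two integers, not the only valid one.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical type: two points are equal when their coordinates match.
public sealed class Point : IEquatable<Point>
{
    public int X { get; }
    public int Y { get; }

    public Point(int x, int y) { X = x; Y = y; }

    public bool Equals(Point other) => other != null && X == other.X && Y == other.Y;

    public override bool Equals(object obj) => Equals(obj as Point);

    // Equal points must return the same hash code, otherwise lookups will miss them.
    public override int GetHashCode()
    {
        unchecked { return (X * 397) ^ Y; }
    }
}

public static class EqualityDemo
{
    public static void Main()
    {
        var set = new HashSet<Point> { new Point(1, 2) };
        Console.WriteLine(set.Contains(new Point(1, 2)));  // True

        var names = new Dictionary<Point, string> { [new Point(0, 0)] = "origin" };
        Console.WriteLine(names[new Point(0, 0)]);         // origin
    }
}
```

Without the two overrides, Point would fall back to reference equality and both lookups above would fail, because each new Point(...) is a different instance.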

I wrote a very small profiling application to check the lookup times of List, Dictionary and HashSet. It first generates an array of Guids and uses it as the source dataset while running the tests.

The code is written in C# using .NET Core 2.2 and was executed on a mid-2012 MacBook Pro. Here’s what I’ve got:

Collection creation

Lists definitely perform better here, likely because Dictionaries and HashSets have to pay the cost of computing the hash used as key for every item added.

Collection creation and lookup

Here things start to get interesting: the first case shows the performance of creation plus a single lookup, with more or less the same stats as simple creation. In the second case, instead, the lookup is performed 1000 times, leading to a clear win for Dictionary and HashSet. This is obviously due to the fact that a lookup on a List takes linear time, O(n), while it takes constant time, O(1), for the other two data structures.
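The repeated-lookup case can be approximated with a sketch like the following. It is not the actual profiling application: the Time helper, the dataset size and the number of lookups are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class RepeatedLookup
{
    // Measures how long it takes to call Contains `lookups` times.
    public static TimeSpan Time(ICollection<Guid> collection, Guid needle, int lookups)
    {
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < lookups; i++)
        {
            collection.Contains(needle);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        Guid[] source = Enumerable.Range(0, 100_000).Select(_ => Guid.NewGuid()).ToArray();
        Guid needle = source[source.Length - 1];  // last item: worst case for the List scan

        var list = new List<Guid>(source);
        var set = new HashSet<Guid>(source);

        Console.WriteLine($"List:    {Time(list, needle, 1000).TotalMilliseconds} ms");
        Console.WriteLine($"HashSet: {Time(set, needle, 1000).TotalMilliseconds} ms");
    }
}
```

A Stopwatch loop like this is only a rough measurement; for serious numbers a dedicated benchmarking tool would be a better choice.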

Lookup of a single item

In this case Dictionary and HashSet win in both executions, because the collections have already been populated and only the lookup is being measured.

Lookup in a Where()

For the last example the system is looping over an existing dataset and performing a lookup for the current item. As expected, Dictionary and HashSet definitely perform better than List.
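That scenario boils down to something like the sketch below, where CountMatches is a hypothetical helper. The query is identical for both collections; what changes is the cost of each Contains call inside the Where().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossLookup
{
    // For every candidate, check whether it exists in the `known` collection.
    // With a List each check is O(n); with a HashSet it is O(1).
    public static int CountMatches(IEnumerable<Guid> candidates, ICollection<Guid> known)
        => candidates.Where(id => known.Contains(id)).Count();

    public static void Main()
    {
        Guid[] known = Enumerable.Range(0, 5).Select(_ => Guid.NewGuid()).ToArray();

        // Three ids taken from the known ones, plus two brand new ones.
        List<Guid> candidates = known.Take(3).ToList();
        candidates.Add(Guid.NewGuid());
        candidates.Add(Guid.NewGuid());

        Console.WriteLine(CountMatches(candidates, new List<Guid>(known)));     // 3
        Console.WriteLine(CountMatches(candidates, new HashSet<Guid>(known)));  // 3
    }
}
```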

It’s easy to see that in almost all cases it makes no difference which data structure is used if the dataset is relatively small, say fewer than 10000 items. The only case where the choice really matters is when we need to cross two collections and search one of them for each item of the other.

Using Decorators to handle cross-cutting concerns — Part 2 : a practical example

In my previous article I talked a bit about how to use the Decorator pattern to implement cross-cutting concerns and reduce clutter in your codebase.

Today it’s going to be a bit more practical: we’ll be looking at a small demo I published on GitHub that makes use of Decorators as well as some other interesting things like .NET Attributes, CQRS and Dependency Injection.

I’m not going to deep dive into the details of CQRS as it would obviously take too much time and it’s outside the scope of this article. I’m using it here because query/command handlers usually expose just one method so there is no need to implement a big interface. Also, I like the pattern a lot 🙂

So let’s go straight to the code! The repository is available here: https://github.com/mizrael/cross-cutting-concern-attributes

It’s a very small .NET Core WebAPI application, nothing particularly fancy. No infrastructure of course, there’s no need for this article.

There’s just one API controller, exposing a single GET endpoint to retrieve a list of “values”. I might have called it “stuff” instead of “values”, it’s just an excuse to retrieve some data from the backend.

As you may have noticed, there’s no direct reference to the query handler in the API controller: I prefer to use MediatR to avoid injecting too many things in the constructor. It has become a habit, so I’m doing it even when there’s just one dependency.

For those who don’t know it, MediatR acts as a simple in-process message bus, allowing quick dispatch of commands, queries and events. So, basically, it’s a very handy tool when implementing CQRS.

The ValuesArchiveHandler class handles the actual execution of the query. Actually it’s not doing much, apart from returning a fixed list of strings.

What we’re actually interested in is that small attribute, [Instrumentation]. It is just a marker, the real grunt-work will be done elsewhere. I could have used an interface as well of course, but there are several reasons why I didn’t.

First of all, I prefer to avoid empty interfaces: an interface is a contract, and an interface without methods doesn’t define any contract.

Moreover, attributes can always be configured to not propagate to descendant types automatically, something you cannot do with interfaces.
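As a sketch of that second point: I’m assuming here that the marker is declared with Inherited = false, which may or may not match the declaration in the repository. BaseHandler and DerivedHandler are hypothetical types used only to show the effect.

```csharp
using System;

// Marker attribute: it carries no data, and it does not propagate to derived classes.
[AttributeUsage(AttributeTargets.Class, Inherited = false)]
public sealed class InstrumentationAttribute : Attribute { }

[Instrumentation]
public class BaseHandler { }

public class DerivedHandler : BaseHandler { }

public static class AttributeDemo
{
    // inherit: true asks to search the base types too, but attributes
    // declared with Inherited = false are skipped for derived classes.
    public static bool IsInstrumented(Type type)
        => type.IsDefined(typeof(InstrumentationAttribute), inherit: true);

    public static void Main()
    {
        Console.WriteLine(IsInstrumented(typeof(BaseHandler)));     // True
        Console.WriteLine(IsInstrumented(typeof(DerivedHandler)));  // False
    }
}
```

With a marker interface, DerivedHandler would implement the interface whether we wanted it or not.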

Now, take a look at the InstrumentationQueryHandlerDecorator class. It’s a query handler Decorator, so it gets an instance of a query handler injected in the constructor, and uses it in the Handle() method.

This decorator is not doing anything particularly fancy, it’s just using Stopwatch to track how much time the inner handler is taking to complete.

What we’re interested in is the constructor: there the system is checking if the inner instance has been marked with the [Instrumentation] attribute, flipping a boolean value based on the result. That bool will then be used in the Handle() method to turn the instrumentation on or off. That’s it!
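A simplified sketch of the idea follows. It is not the code from the repository: the real handlers go through MediatR and are asynchronous, while here IQueryHandler, the two sample handlers and the log callback are hypothetical, synchronous stand-ins.

```csharp
using System;
using System.Diagnostics;

[AttributeUsage(AttributeTargets.Class)]
public sealed class InstrumentationAttribute : Attribute { }

// Hypothetical handler contract, standing in for the MediatR one.
public interface IQueryHandler<TQuery, TResult>
{
    TResult Handle(TQuery query);
}

public sealed class InstrumentationDecorator<TQuery, TResult> : IQueryHandler<TQuery, TResult>
{
    private readonly IQueryHandler<TQuery, TResult> _inner;
    private readonly Action<string> _log;
    private readonly bool _enabled;

    public InstrumentationDecorator(IQueryHandler<TQuery, TResult> inner, Action<string> log)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _log = log ?? throw new ArgumentNullException(nameof(log));
        // Checked once: is the concrete inner handler marked with [Instrumentation]?
        _enabled = inner.GetType().IsDefined(typeof(InstrumentationAttribute), inherit: false);
    }

    public TResult Handle(TQuery query)
    {
        if (!_enabled) return _inner.Handle(query);

        var watch = Stopwatch.StartNew();
        try { return _inner.Handle(query); }
        finally { _log($"{_inner.GetType().Name} took {watch.ElapsedMilliseconds} ms"); }
    }
}

[Instrumentation]
public sealed class ValuesHandler : IQueryHandler<string, string[]>
{
    public string[] Handle(string query) => new[] { "value1", "value2" };
}

public sealed class PlainHandler : IQueryHandler<string, string[]>
{
    public string[] Handle(string query) => new[] { "value1", "value2" };
}

public static class DecoratorDemo
{
    public static void Main()
    {
        var timed = new InstrumentationDecorator<string, string[]>(new ValuesHandler(), Console.WriteLine);
        var silent = new InstrumentationDecorator<string, string[]>(new PlainHandler(), Console.WriteLine);
        timed.Handle("all");   // logs the elapsed time
        silent.Handle("all");  // logs nothing
    }
}
```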

I’m using StructureMap as my IoC container and I’m taking care of the handler registration here. In the same file I also decorate all the query handlers with the InstrumentationQueryHandlerDecorator.

Keep in mind that I could have added some smarts here and checked at registration time if a particular handler had been decorated with the [Instrumentation] attribute.

That would probably be a better solution as it would avoid runtime type checks, handling everything during the application bootstrap.
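One possible shape for that alternative, sketched without any IoC container: the attribute is checked when the object graph is composed, and only marked handlers get wrapped. HandlerComposer, TimingDecorator and the sample handlers are hypothetical names, and wiring this into StructureMap is left out on purpose.

```csharp
using System;
using System.Diagnostics;

[AttributeUsage(AttributeTargets.Class)]
public sealed class InstrumentationAttribute : Attribute { }

public interface IQueryHandler<TQuery, TResult>
{
    TResult Handle(TQuery query);
}

// This decorator always measures: the decision to use it is taken elsewhere.
public sealed class TimingDecorator<TQuery, TResult> : IQueryHandler<TQuery, TResult>
{
    private readonly IQueryHandler<TQuery, TResult> _inner;
    private readonly Action<string> _log;

    public TimingDecorator(IQueryHandler<TQuery, TResult> inner, Action<string> log)
    {
        _inner = inner;
        _log = log;
    }

    public TResult Handle(TQuery query)
    {
        var watch = Stopwatch.StartNew();
        try { return _inner.Handle(query); }
        finally { _log($"{_inner.GetType().Name} took {watch.ElapsedMilliseconds} ms"); }
    }
}

public static class HandlerComposer
{
    // Called once during bootstrap: wrap the handler only if it is marked.
    public static IQueryHandler<TQuery, TResult> Compose<TQuery, TResult>(
        IQueryHandler<TQuery, TResult> handler, Action<string> log)
    {
        bool marked = handler.GetType().IsDefined(typeof(InstrumentationAttribute), inherit: false);
        return marked ? new TimingDecorator<TQuery, TResult>(handler, log) : handler;
    }
}

[Instrumentation]
public sealed class SlowHandler : IQueryHandler<int, int>
{
    public int Handle(int query) => query * 2;
}

public sealed class FastHandler : IQueryHandler<int, int>
{
    public int Handle(int query) => query * 2;
}

public static class ComposerDemo
{
    public static void Main()
    {
        Console.WriteLine(HandlerComposer.Compose(new SlowHandler(), Console.WriteLine).GetType().Name);
        Console.WriteLine(HandlerComposer.Compose(new FastHandler(), Console.WriteLine).GetType().Name);
    }
}
```

Unmarked handlers are returned as they are, so they pay no extra cost at all when a query is executed.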

I’ll probably add this to the repository, I left it out to keep things simple 🙂

This article is also available on Medium as part of a series:

Using Decorators to handle cross-cutting concerns

I was actually planning to post this article here, but I was migrating to another server last week and it took a week for the domain to point to the new DNS. It turns out this gave me the chance to try Medium instead, so I published my first article there.

This time I’ll be writing about a very simple but powerful technique to reduce the boilerplate caused by cross-cutting concerns. In this post we’ll explore a simple way to encapsulate them in reusable components using the Decorator pattern.

Let’s first talk a bit about “cross-cutting concerns”. On Wikipedia we can find this definition:

Cross-cutting concerns are parts of a program that rely on or must affect many other parts of the system.

In a nutshell, they represent almost everything not completely tied to the domain of the application but that can affect in some way the behaviour of its components.

Examples can be:
– caching
– error handling
– logging
– instrumentation

Instrumentation for instance can lead to a lot of boilerplate code which eventually will create clutter and pollute your codebase. You’ll basically end up with a lot of code like this:
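The original snippet is not reproduced here, so what follows is only a hypothetical sketch of that kind of clutter. ReportService, its methods and the log callback are invented names; the point is how the same Stopwatch code gets copied around the one line that matters.

```csharp
using System;
using System.Diagnostics;

public class ReportService
{
    private readonly Action<string> _log;

    public ReportService(Action<string> log) { _log = log; }

    public int CountOrders()
    {
        var watch = Stopwatch.StartNew();  // instrumentation boilerplate
        try
        {
            return 42;                     // the only line that matters
        }
        finally
        {
            _log($"CountOrders took {watch.ElapsedMilliseconds} ms");
        }
    }

    public string LoadCustomerName(int id)
    {
        var watch = Stopwatch.StartNew();  // ...and the same boilerplate again
        try
        {
            return "customer-" + id;
        }
        finally
        {
            _log($"LoadCustomerName took {watch.ElapsedMilliseconds} ms");
        }
    }
}

public static class BoilerplateDemo
{
    public static void Main()
    {
        var service = new ReportService(Console.WriteLine);
        Console.WriteLine(service.CountOrders());
        Console.WriteLine(service.LoadCustomerName(7));
    }
}
```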

Of course, being IT professionals, you can quickly come up with a decent solution, find the common denominator, extract the functionality, refactor and so on.

So…how would you do it? One option would be to use the Decorator pattern! It’s a very common pattern and quite easy to understand:

Basically you have a Foo class, used somewhere in your code, that implements a well-known interface, and you need to wrap it in some cross-cutting concern. All you have to do is:

  1. create a new container class implementing the same interface
  2. inject the “real” instance
  3. write your new logic where you need
  4. call the method on the inner instance
  5. sit back and enjoy!
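The steps above can be sketched as follows. IFoo, its Bar method and LoggingFoo are placeholder names, and logging is used as the cross-cutting concern just to keep the example short.

```csharp
using System;

public interface IFoo
{
    string Bar(string input);
}

// The "real" implementation, unaware of any cross-cutting concern.
public sealed class Foo : IFoo
{
    public string Bar(string input) => input.ToUpperInvariant();
}

// 1. a new class implementing the same interface
public sealed class LoggingFoo : IFoo
{
    private readonly IFoo _inner;
    private readonly Action<string> _log;

    // 2. the "real" instance is injected
    public LoggingFoo(IFoo inner, Action<string> log)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _log = log ?? throw new ArgumentNullException(nameof(log));
    }

    public string Bar(string input)
    {
        _log($"Bar called with '{input}'");  // 3. the new logic
        return _inner.Bar(input);            // 4. the call on the inner instance
    }
}

public static class FooDemo
{
    public static void Main()
    {
        IFoo foo = new LoggingFoo(new Foo(), Console.WriteLine);
        Console.WriteLine(foo.Bar("hello"));
        // prints: Bar called with 'hello'
        // prints: HELLO
    }
}
```

The calling code only knows about IFoo, so the decorator can be added or removed without touching either Foo or its consumers.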

Very handy. Of course it can be quite awkward if your interface has a lot of methods, but in that case you might have to reconsider your architecture, as it is probably breaking the Single Responsibility Principle (SRP).

One option would be moving to CQS/CQRS. In the next post of the series we will see a practical example and discuss why those patterns can be an even more interesting option when combined with Decorators.

Stay tuned!

The importance of setting the boundaries (of your domain models)

First article of the year! I really wanted to start writing this a few weeks ago, but honestly I wasn’t inspired enough.
Now that I’ve spent a good portion of the Christmas break reading blogs, books and watching courses on Pluralsight, I still don’t feel inspired enough.

I guess it’s due to how I spent the other portion (eating, sleeping and playing with my kids, mostly), which left me basically without any energy at all.

But as they say, the first step is always the hardest, no?

Lately I’ve been putting some effort into improving my DDD techniques. For those of you who are still living under a rock, please consider getting a copy of the marvelous blue book by Eric Evans.

I still am in the middle of the learning process, even though I probably started this path years ago. Might be a sign of impostorism. Or it might be the fact that in this job, as in many other jobs, you never stop learning.

I’m not going to discuss DDD now, or what its benefits are. I shall leave this for another post.

This is going to be just a very quick introduction about boundaries and aggregates instead. What’s an aggregate? An example should make things easier.

We can consider an Order as our aggregate, with the Order Lines composing its internal state.

I said “internal” for a reason: the Order is the “entry-point”. We can’t have Order Lines without an Order that contains them and obviously an Order without Order Lines is pretty useless.

At the same time, you can’t access Order Lines from the outside world without going through the Order.
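In code, a minimal sketch of such an aggregate could look like the following. The property names, the validation rules and the internal constructor are my assumptions for the sake of the example, not a reference implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class OrderLine
{
    public string Product { get; }
    public int Quantity { get; }
    public decimal UnitPrice { get; }

    // Internal: code outside the domain assembly cannot create lines on its own.
    internal OrderLine(string product, int quantity, decimal unitPrice)
    {
        Product = product;
        Quantity = quantity;
        UnitPrice = unitPrice;
    }
}

// The aggregate root: every change to the lines goes through it.
public sealed class Order
{
    private readonly List<OrderLine> _lines = new List<OrderLine>();

    public Guid Id { get; } = Guid.NewGuid();

    // Read-only view: callers can inspect the lines but not add or remove them.
    public IReadOnlyCollection<OrderLine> Lines => _lines.AsReadOnly();

    public decimal Total => _lines.Sum(line => line.Quantity * line.UnitPrice);

    public void AddLine(string product, int quantity, decimal unitPrice)
    {
        if (string.IsNullOrWhiteSpace(product)) throw new ArgumentException("Product is required.", nameof(product));
        if (quantity <= 0) throw new ArgumentOutOfRangeException(nameof(quantity));
        if (unitPrice < 0) throw new ArgumentOutOfRangeException(nameof(unitPrice));

        _lines.Add(new OrderLine(product, quantity, unitPrice));
    }
}

public static class OrderDemo
{
    public static void Main()
    {
        var order = new Order();
        order.AddLine("book", 2, 10m);
        order.AddLine("pen", 1, 1.5m);
        Console.WriteLine(order.Lines.Count);  // 2
        Console.WriteLine(order.Total);        // 21.5
    }
}
```

Since the Order owns the list and validates every addition, its invariants live in one place.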

Clear? Definitely not rocket science.

Why “aggregate”? Well, because you’re combining things together and building up a structure that mimics your current domain. The Order and the Order Lines are entities and value objects that will eventually form the Aggregate.

I’ll talk more about the distinction between Entities and Value Objects in another post, for now just take that for granted (or look it up on Google!).

For now just think that eventually your system will be a composition of multiple Aggregates acting and interacting together, and the quality of their interaction will be a representation of how well you know your Domain. Communication is the key here!

At this point I would say that the trick is to find the right balance and to be able to identify the right boundaries for your Aggregates. Why? Simply because of divide-et-impera.

Our domain might be complex from the beginning, or become extremely complex over time. Everybody has seen this happening at some point. So take a deep breath, talk to your Domain Expert™ and identify the edges and the sets of your entities. That’s it. Compartmentalize.

There’s a lot more to write on this, and I definitely will. When? That’s definitely a good question! A lot is going on in my life these days and weeks and I have the sensation that 2019 is going to be a crucial year for everyone here.

Ciao!

© 2019 Davide Guida
