Photo Friday: Acadia National Park, Maine, USA

It’s photo Friday again.  This time I will share some very recent pictures I took last weekend.  Nadja and I decided to take a trip to Acadia National Park in Maine for some hiking.

The last time I was in Acadia was exactly four years ago, shortly before I moved to the US.  Back then, we went up there to convince ourselves that this move was a good idea, and we indeed followed through with it shortly after.  This time, we came back as regular travelers, to enjoy the great hikes and everything else Acadia offers.

It might not be the biggest or most impressive of the national parks.  But it allows our furry friend Marla on the trails, and it is a special place for us that we can reach with a nice 5-hour drive.

Here are the images I have selected for today:

Which is the Right Open Source Business Model for You?

This is the third and last part of a series.

Summary: When you should consider using an open source license

Here is the essence of the two previous posts:

  • Open source licenses support the creation of communities and accelerate innovation.  They do this by allowing people to change or even embed your intellectual property.
  • Your go-to-market strategy should match the decision for your license.  If communities and innovation are vital, open source is a good option.  If only fast market penetration is important, a freemium model might be better.  If none of this matters, traditional enterprise sales might be the better way to go.
  • Open-source-based strategies can work very well when you plan to bring a new business model into an established market.  Good examples of this are MySQL and Red Hat.  Here, a “good enough” product strategy with a lower TCO can win the race.
  • Open-source-based strategies also lend themselves to platform products in complex ecosystems.  A good example here is the Hadoop / Spark landscape.  A fast land grab is beneficial here as well.
  • Finally, open-source-based strategies also work better in very developer-centric fields (like Hadoop again, or Atlassian).  Open APIs, large communities, and marketplaces are winning strategies here.  But Atlassian (see below) is also a good example of how a hybrid model can work very well.

Open-Source-Based Business Models

Let me start this blog post by saying that I am a big believer in open-source models.  If the go-to-market strategy calls for it, that is.  Otherwise, there are likely better options for creating a business around your software.

The last question now is how to turn the decisions you made into a working business model. For this purpose, I won’t dive into the fine differences between “free software” and “open source.” But I would like to discuss the most important business models around open-source software. The overview below summarizes the most successful approaches:

Open Source (e.g. Red Hat)
  What is open? All of your software.
  What do you sell? Services only, like training, support, and guarantees.
  Pros: Supports the original ideas of open source in the strongest way.
  Cons: Very hard to create a scalable business; most companies fail.

Business Source (e.g. Jedox)
  What is open? Older software versions.
  What do you sell? The latest version of the software, support, and services.
  Pros: Ultimately, everyone gets access to everything.
  Cons: The majority of users does not benefit from the latest innovations; multiple versions need to be maintained.

Open Core (e.g. MySQL, Talend, Pentaho)
  What is open? The core of the software.
  What do you sell? Additional software features, support, and services.
  Pros: Clear feature-based differentiation; a good balance between open source concepts and commercialization.
  Cons: Some features will never be available to the general community.

A pure open source model has seldom been a successful commercial business model. Maybe the only successful example is Red Hat. That is not a surprise: it’s an operating system, and if it breaks down, everything running on the machine breaks with it. So selling support and guarantees can be enough of a value proposition in this case. But for business applications this is generally not a sustainable business model.  In general, the only thing you can really sell is services and guarantees – which also does not scale as well as selling software.

The other two models, open core and business source, try to find a balance between community benefits and commercial success for the vendor. Let’s keep in mind that most vendors have developers who need to feed their families, too.

The idea behind an open core model is that you are not giving away all features for free, but enough features to build a meaningful community. The paid version of your product comes with additional features which users value enough to pay for.

Business source is the least known of the three models. The idea here is that only paying customers get the latest version with all new features. Older versions, or all source code after a time delay of, let’s say, three years, fall under a standard open source license.

I have to admit that I liked the business source model for quite some time. But it has significant disadvantages: your community no longer gets access to the latest versions, and it can no longer contribute to the product. So you are losing one of the biggest advantages of open source licenses: faster innovation. The time delay is also problematic from a quality assurance perspective: your paying customers are now getting the version which is least tested.

Adjacent Business Models: Freemium & The Atlassian Model

Finally, I would like to discuss two adjacent models. They are not based on open-source licenses, but they share many of their characteristics: the freemium model and the Atlassian model. A freemium model offers a limited version of your product for free and charges for the full version. That sounds very much like the open core model discussed above. If building a developer community and a higher level of innovation are less important to you, a freemium model is often the better choice.

The Atlassian model was introduced by the very successful software company of the same name. It has two interesting twists: the first is that using the software with only a few users is VERY cheap (but not free!). But the pricing curve is rather steep if you add users beyond a certain threshold. This has similar dynamics to freemium models and open core. And it can work very well if the value of the software scales with the number of users. The second twist is to make your APIs very well documented and developer friendly. This allows developers to hook into the product. While this is not exactly an open-source approach, it can still create a lot of innovation and a massive developer community.

I hope this series of blog posts helps you figure out whether an open source license is the right thing for you or not.  And then, ultimately, how to match your go-to-market strategy and business model to it.

Photo Friday: Norway

It’s photo Friday.  And that means that it is time for another set of pictures I took in the past.  Today I will share some images from Norway.

I went to Norway twice.  The first time I was still in school and traveled across Norway for 6 weeks on my bicycle.  Yes, this is correct.  I rode 2,500 miles (about 4,000 km) on my bicycle through those mountains.  It is as crazy as it sounds.  But on the upside: this was the first big trip with Nadja, whom I married later.  If you survive a trip like that when you are 15 years old, there is nothing you cannot survive as a couple later in life.

We went to Norway again 6 years ago, which was about 20 years after our first trip.  This time we drove a car and visited some of the places we had been to before – and many more.  This country is so beautiful.  We will be back, probably in 20 years, to keep up the pattern.

Here are the images I have selected for today:

What Artificial Intelligence and Machine Learning can do – and what they can’t

I have written about Artificial Intelligence (AI) before.  Back then I focused on the technology side of it: what is part of an AI system and what isn’t.  But there is another question which might be even more important.  What are we DOING with AI?

Part of my job is to help investors with their due diligence.  I discuss companies with them in which they might want to invest. Here is a quick observation: by now, every company pitch is full of claims about how they are using AI to solve a given business problem.

Part of me loves this, since some of those companies are on to something and should get the chance.  But I also have a built-in “bullshit meter”.  So another part of me wants to cringe every time I listen to a founder making things up about how AI will help them.  I have listened to many founders who do not know a lot about AI, but who sense that they can get millions of dollars of funding just by adding those fluffy keywords to their pitch.  The bad news is that, sooner or later, it actually works.  So who am I to blame them?

I have seen situations where AI or at least machine learning (ML) has an incredible impact.  But I also have seen situations where this is not the case.  What was the difference?

In most of the cases where organizations fail with AI or ML, they used those techniques in the wrong context.  ML models are not very helpful if you have only one big decision to make.  Analytics can still help in such cases by giving you easier access to the data you need for this decision, or by presenting this data in a consumable fashion.  But at the end of the day, those single big decisions are often very strategic.  Building a machine learning model or an AI system to support such a decision is rarely worth the effort.  And often it does not yield better results than just making the decision on your own.

Here is where ML and AI can help: machine learning and artificial intelligence deliver the most value whenever you need to make lots of similar decisions quickly. Good examples are:

  • Defining the price of a product in markets with rapidly changing demands,
  • Making offers for cross-selling in an E-Commerce platform,
  • Approving a credit application or not,
  • Detecting customers with a high risk for churn,
  • Stopping fraudulent transactions,
  • …among others.

You can see that a human being with access to all relevant data could make any one of those decisions in a matter of seconds or minutes.  Only that they can’t, since they would need to make this type of decision millions of times, every day.  Think of sifting through a customer base of 50 million clients every day to identify those with a high churn risk.  Impossible for any human being.  But no problem at all for an ML model.
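To make the operationalization point concrete, here is a minimal sketch of such a batch scoring job in Python with scikit-learn.  The file names, column names, and the choice of model are purely illustrative assumptions, not taken from any real project; the point is only that one trained model makes the same decision for millions of customers.

```python
# Minimal sketch of operationalized scoring: one trained model applied to a
# large customer table in a batch job. File and column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Historical data with known churn outcomes (illustrative columns).
history = pd.read_csv("customer_history.csv")
features = ["tenure_months", "monthly_spend", "support_tickets"]
model = GradientBoostingClassifier().fit(history[features], history["churned"])

# Daily batch scoring: the same decision, made once per customer.
customers = pd.read_csv("customers_today.csv")
customers["churn_risk"] = model.predict_proba(customers[features])[:, 1]
high_risk = customers[customers["churn_risk"] > 0.8]  # hand these to the retention team
```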

So, the biggest value of artificial intelligence and machine learning is not to support us with those big strategic decisions.  Machine learning delivers most value when we operationalize models and automate millions of decisions.

The image below shows this spectrum of decisions and the time humans need to make them.  The blue boxes are situations where analytics can help, but does not provide its full value. The orange boxes are situations where AI and ML show their real value. And the interesting observation is: the more decisions you can automate, the higher this value will be (upper right end of the spectrum).

[Image: Automating decisions with ML and AI]

One of the shortest descriptions of this phenomenon comes from Andrew Ng, who is a well-known researcher in the field of AI.  Andrew described what AI can do as follows:

“If a typical person can do a mental task with less than one second of thought, we can probably automate it using AI either now or in the near future.”

I agree with him on this characterization. And I like that he puts the emphasis on automation and operationalization of those models – because this is where the biggest value is. The only thing I disagree with is the time unit he chose: it is already safe to go with a minute instead of a second.

Photo Friday: Sailing in Croatia

It’s photo Friday again.  On Fridays, I will often publish some of the images I have taken over the past years.

Today I would like to share some pictures from a sailing trip to Croatia in 2009.  We first visited Plitvice Lakes National Park, which is one of the most beautiful places in the world.  The waterfall picture below was taken there.  From there, we drove to Murter to pick up our boat.  And then we sailed for three weeks among the roughly 1,200 islands of Croatia.

It truly was a great trip.  Here are the images I have selected for today:

K-Nearest Neighbors – the Laziest Machine Learning Technique

Family: Supervised learning
Modeling types: Classification, Regression
Group: Lazy Learners / Instance-based Learners
Input data: Numerical, Categorical
Tags: Fast, Local, Global
[Video: 5 Minutes with Ingo – k-Nearest Neighbors]

One of my quirky videos from the “5 Minutes with Ingo” series. It explains the basic concepts of k-Nearest Neighbors in 5 minutes. And it has unicorns.

Concept

k-Nearest Neighbors is one of the simplest machine learning algorithms.  As with many others, human reasoning was the inspiration for this one as well.  Whenever something significant happens in your life, you memorize this experience.  You will later use it as a guideline for what you expect to happen next.

Consider that you see somebody drop a glass. While the glass is falling, you already predict that it will break when it hits the ground. But how can you do this? You have never seen THIS glass break before, right?

No, indeed not. But you have seen similar glasses, or similar items in general, dropping to the floor before.  And while the situation might not be exactly the same, you still know that a glass dropping from about 5 feet onto a concrete floor usually breaks.  This gives you a pretty high level of confidence to expect breakage whenever you see a glass fall from that height onto a hard floor.

But what about dropping a glass from a height of one foot onto a soft carpet?  Have you experienced glasses breaking in such situations as well?  No, you have not.  We can see that the height matters, and so does the hardness of the ground.

This way of reasoning is exactly what a k-Nearest Neighbors algorithm does as well.  Whenever a new situation occurs, it scans through all past experiences and looks up the k closest ones.  Those experiences (or: data points) are what we call the k nearest neighbors.

If you have a classification task, for example predicting whether the glass breaks or not, you take the majority vote of all k neighbors.  If k=5 and the glass broke in 3 or more of your most similar experiences, you go with the prediction “yes, it will break”.

Let’s now assume that you want to predict the number of pieces a glass will break into.  In this case we want to predict a number, which we call “regression”.  Now you take the average of your k neighbors’ numbers of glass pieces as the prediction or score.  If k=5 and the numbers of pieces are 1 (did not break), 4, 8, 2, and 10, you end up with a prediction of 5.
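To make the two cases concrete, here is a minimal Python sketch of the idea (this is only an illustration of the concept, not the RapidMiner processes linked at the end of this post).  The glass “experiences” are made up: each row holds the drop height in feet and the hardness of the floor on a 0–1 scale.

```python
# Minimal k-NN sketch: majority vote for classification, average for regression.
import numpy as np

def knn_predict(X_train, y_train, x_new, k=5, task="classification"):
    # Euclidean distance from the new point to every stored experience.
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    neighbor_idx = np.argsort(distances)[:k]      # indices of the k closest points
    neighbors = y_train[neighbor_idx]
    if task == "classification":
        values, counts = np.unique(neighbors, return_counts=True)
        return values[np.argmax(counts)]          # majority vote
    return neighbors.mean()                       # regression: average of the neighbors

# Made-up experiences: (drop height in feet, floor hardness 0-1).
X = np.array([[5.0, 1.0], [4.5, 0.9], [1.0, 0.1], [0.8, 0.2], [5.5, 0.8]])
broke = np.array([1, 1, 0, 0, 1])     # classification label: did the glass break?
pieces = np.array([8, 4, 1, 1, 10])   # regression label: number of pieces

print(knn_predict(X, broke, np.array([5.0, 0.9]), k=3))                      # -> 1 ("it breaks")
print(knn_predict(X, pieces, np.array([5.0, 0.9]), k=3, task="regression"))  # -> about 7.3
```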

[Image: k-NN concept]

We have blue and orange data points.  For a new data point (green), we can determine the most likely class by looking up the classes of the nearest neighbors.  Here, the decision would be “blue”, because that is the majority of the neighbors.

Why is this algorithm called “lazy”?  Because it does no training at all when you supply the training data.  All it does at training time is store the complete data set; it does not perform any calculations at this point.  Neither does it try to derive a more compact model from the data which it could use for scoring.  Therefore, we call this algorithm lazy.

Theory

We have seen that this algorithm is lazy: during training, all it does is store the data it gets.  All the computation happens during scoring, i.e. when we apply the model to unseen data points.  We then need to determine which k data points from our training set are closest to the data point we want a prediction for.

Let’s say that our data points look like the following:

$x_j = (x_{j,1}, \ldots, x_{j,m}, y_j), \quad j = 1, \ldots, n$

We have a table of n rows and m+1 columns where the first m columns are the attributes we use to predict the remaining label column (also known as “target”).  For now, let’s also assume that all attribute values x are numerical while the label values for y are categorical, i.e. we have a classification problem.

We can now define a distance function which calculates the distance between data points.  In particular, it should find the closest data points from our training data for any new point.  The Euclidean distance is often a good choice for such a distance function if the data is numerical.  If our new data point has attribute values s_1 to s_m, we can calculate the distance d(s, x_j) between the point s and any training point x_j by

$d(s, x_j) = \sqrt{\sum_{i=1}^{m} (s_i - x_{j,i})^2}$

The k data points with the smallest values for this distance become our k neighbors.  For a classification task, we now use the most frequent value y among our k neighbors.  For regression tasks, where y is numerical, we use the average of the y values of our k neighbors.

But what if our attributes are not numerical, or are a mix of numerical and categorical attributes?  Then you can use any other distance measure which can handle this type of data.  This article discusses some frequent choices.
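As one illustration of such a measure, here is a hedged sketch of a simple Gower-style mix: range-normalized absolute differences for numeric columns and a plain 0/1 mismatch for categorical columns.  This is only one of several reasonable choices, not necessarily the one recommended in the linked article.

```python
# A simple mixed-type distance: numeric columns contribute a range-normalized
# absolute difference, categorical columns contribute 0 (match) or 1 (mismatch).
def mixed_distance(a, b, numeric_idx, categorical_idx, ranges):
    d = 0.0
    for i in numeric_idx:
        d += abs(a[i] - b[i]) / ranges[i]      # scale each numeric column by its range
    for i in categorical_idx:
        d += 0.0 if a[i] == b[i] else 1.0      # categorical: match or mismatch
    return d / (len(numeric_idx) + len(categorical_idx))

# Example rows: (drop height in feet, floor type). The range of 6 feet is made up.
a = (5.0, "concrete")
b = (1.0, "carpet")
print(mixed_distance(a, b, numeric_idx=[0], categorical_idx=[1], ranges={0: 6.0}))
```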

By the way, k-Nearest Neighbors models with k=1 are the reason why calculating training errors is completely pointless.  Can you see why?

Practical Usage

k-Nearest Neighbors, or k-NN for short, should be a standard tool in your toolbox.  It is fast, easy to understand even for non-experts, and easy to tune to different kinds of predictive problems.  But there are some things to consider, which we will discuss in the following.

Data Preparation

We have seen that the key part of the algorithm is the definition of a distance measure.  A frequent choice is the Euclidean distance. This distance measure treats all data columns in the same way though. It subtracts the values for each dimension and then sums up the squares of those differences. And that means that columns with a wider data range have a larger influence on the distance than columns with a smaller data range.

So, you should normalize the data set so that all columns are roughly on the same scale. There are two common ways of normalization. First, you could bring all values of a column into a range between 0 and 1. Or you could change the values of each column so that the column has a mean of 0 and a standard deviation of 1 afterwards. We call this type of normalization z-transformation or standard score.

Tip: Whenever you know that the machine learning algorithm is making use of a distance measure, you should normalize the data. Another famous example would be k-Means clustering.
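If you are working in Python rather than RapidMiner, a minimal sketch of both normalization options could look like this; the small matrix is made up to show columns on very different scales.  In practice, fit the scaler on the training data only and reuse it when scoring new data.

```python
# Two common normalizations before a distance-based model such as k-NN:
# min-max scaling to [0, 1] and the z-transformation (mean 0, standard deviation 1).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[5.0, 200.0],
              [1.0, 180.0],
              [3.0, 400.0]])   # columns on very different scales

X_minmax = MinMaxScaler().fit_transform(X)     # every column now lies in [0, 1]
X_zscore = StandardScaler().fit_transform(X)   # every column: mean 0, standard deviation 1

print(X_minmax)
print(X_zscore)
```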

Parameters to Tune

The most important parameter you need to tune is k, the number of neighbors used to make the class decision.  The minimum value is 1, in which case you only look at the single closest neighbor for each prediction.  In theory, you could use a value for k as large as your total training set.  This would make no sense though, since in that case you would always predict the majority class of the complete training set.

Here is a good way to interpret the meaning behind k: small values lead to “local” models, which can be non-linear, with decision boundaries between the classes that wiggle a lot. As k grows, the wiggling decreases until you almost end up with a linear decision boundary.

[Image: Impact of k on the decision boundary]

On the left we see a data set in two dimensions.  In general, the top right belongs to the red class and the bottom left to the blue class, but there are also some local groups inside both areas.  Small values of k lead to more wiggly decision boundaries.  For larger values, the decision boundary becomes smoother, almost linear in this case.

Good values for k depend on the data you have and on whether the problem is non-linear or not.  You should try a couple of values between 1 and about 10% of the size of the training data set.  Then you will see if there is a promising area worth further optimization of k.
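If you want to automate that search, here is a hedged scikit-learn sketch that tries a range of k values with cross-validation.  The data set and the value range are illustrative only; with 150 rows, 10% corresponds to roughly k=15.

```python
# Sketch: try several values of k with cross-validation and pick the best one.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
k_values = list(range(1, 16))                  # roughly 1 ... 10% of 150 rows

pipeline = make_pipeline(StandardScaler(), KNeighborsClassifier())
search = GridSearchCV(
    pipeline,
    param_grid={"kneighborsclassifier__n_neighbors": k_values},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```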

The second parameter you might want to consider is the type of distance function you are using.  For numerical values, Euclidean distance is a good choice.  You might also want to try Manhattan distance, which is sometimes used as well.  For text analytics, cosine distance can be another alternative worth trying.
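In scikit-learn, for example, the distance function is simply a parameter of the model.  A small sketch (the metric names are scikit-learn’s, not RapidMiner’s):

```python
# Swapping the distance function via the `metric` parameter.
from sklearn.neighbors import KNeighborsClassifier

knn_euclidean = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn_manhattan = KNeighborsClassifier(n_neighbors=5, metric="manhattan")
# Cosine distance is handled by the brute-force neighbor search.
knn_cosine = KNeighborsClassifier(n_neighbors=5, metric="cosine", algorithm="brute")
```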

Memory Usage & Runtimes

Please note that all this algorithm does is store the complete training data.  So the memory requirement grows linearly with the number of data points you provide for training.  Smarter implementations of this algorithm might choose to store the data in a more compact fashion.  But in a worst-case scenario you still end up with a lot of memory usage.

For training, the runtime is as good as it gets: the algorithm does no calculations at all besides storing the data, which is fast.

The runtime for scoring, though, can be large, which is unusual in the world of machine learning.  All calculations happen during model application.  Hence, the scoring runtime scales linearly with the number of data columns m and the number of training points n.  So, if you need to score fast and the number of training data points is large, k-Nearest Neighbors is not a good choice.

RapidMiner Processes

You can download RapidMiner here.  Then you can download the processes below to build this machine learning model yourself in RapidMiner.

Please download the Zip-file and extract its content.  The result will be an .rmp file which can be loaded into RapidMiner via “File” -> “Import Process…”.