Energy supply companies are becoming aware of the need to be customer-focused in order to build better relationships, strengthen loyalty and increase sales. Distribution network operators are also realising the need to become more customer-centric in order to be pre-emptive in network design rather than reactive. Understanding household-level demand and how it is changing (especially with increased uptake of low carbon technology) opens the channels for improved customer service.
Potential smart storage solutions (batteries), which offer a promising way to mitigate peak demand and so produce a more efficient grid, ideally require forecasts down at the household level. However, individual household energy usage is highly erratic, with sharp peaks occurring over short timescales and at different times of day on different days of the week. Traditional forecasting methods, typically applied at high voltage over thousands or millions of households and commercial premises where the profiles are much smoother, provide a very smooth forecast. Such a forecast is neither sufficient nor realistic for an individual household and is of limited use to energy storage solutions.
A simple averaging method will smooth out peaks that occur at slightly differing times, resulting in a poor forecast, but using just the previous week (LW) will result in a highly uncertain prediction. We have developed our own forecasting methodology (AA) that allows us to average over several days while still maintaining peaks, even if they occur at different times of day. This provides a realistic forecast that is also more accurate and less uncertain.
The chart below shows actual electricity usage over the course of a day (Actual). Also plotted are a flat forecast (which achieves a higher accuracy under standard measures than an average forecast), a forecast that simply takes last week's actual usage (LW), and the forecast based on our adjusted averaging method (AA), which is able to mimic the peaks while still maintaining accuracy.
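The peak-smoothing effect described above can be illustrated with a minimal sketch. It covers only the three baseline forecasts (last week, simple average, flat); the AA method itself is not specified in this text and is not reproduced here. The readings, slot count and function names are hypothetical.

```python
from statistics import mean

# Hypothetical readings (kWh) for the same 8 time slots on three previous
# comparable days. Each day has one sharp peak, at a slightly different slot.
history = [
    [0.2, 0.2, 2.0, 0.2, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 2.0, 0.2, 0.2, 0.2, 0.2],
    [0.2, 0.2, 0.2, 0.2, 2.0, 0.2, 0.2, 0.2],
]

def last_week_forecast(days):
    """Reuse the most recent comparable day unchanged (the LW idea)."""
    return list(days[-1])

def average_forecast(days):
    """Slot-by-slot mean; peaks that fall in different slots get smeared."""
    return [mean(slot) for slot in zip(*days)]

def flat_forecast(days):
    """A single constant level: the mean of every reading in the history."""
    level = mean(x for day in days for x in day)
    return [level] * len(days[0])

lw = last_week_forecast(history)
avg = average_forecast(history)
flat = flat_forecast(history)

# Compare the tallest value each forecast predicts.
print("LW peak:     ", max(lw))
print("Average peak:", max(avg))
print("Flat level:  ", max(flat))
```

In this toy data the averaged forecast predicts three low bumps instead of one tall peak, while LW keeps the peak height but depends entirely on a single day.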
•In the academic/mathematical literature on community detection, a community within a social network is taken to be a group of users with two characteristics:
-The community is densely connected internally, i.e. people within the same community talk to each other a lot.
-There are relatively few links crossing from the community to the outside world, i.e. people talk to fellow members of their community more often than they talk to non-members.
•Community detection is the task of finding communities within a dataset.
•We can use community detection to uncover structure in a large digital messaging dataset
-this gave a way to navigate the large messaging dataset and split it up into meaningful, manageable pieces.
•We combined four community detection algorithms:
-the Louvain method, a weighted variant of the Louvain method, the “k-clique communities” method, and second neighbourhoods
•We used these algorithms to generate 98,078 candidate communities
-this gave us a wide range of candidates to choose from – all reasonably tight-knit but not just the mathematically best ones.
•For each candidate we evaluated a range of statistics
-e.g. size (number of users), average sentiment level, topics of conversation, frequency of members’ participation, contribution of recently registered users, conductance, and a weighted version of conductance.
•Using that broader range of statistics, of which conductance was just one, we chose 18 communities of interest.
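Conductance, one of the statistics listed above, captures both characteristics of a community in a single number. Under the usual definition it is the number of links leaving the group divided by the smaller of the total degree inside and outside the group, so lower values mean a more self-contained community. The sketch below is an unweighted, standard-library version; the toy graph and function name are hypothetical, and the weighted variant mentioned in the list is not shown.

```python
def conductance(edges, community):
    """Cut size divided by the smaller of the two volumes (sums of degrees).

    edges: iterable of undirected (u, v) pairs.
    community: collection of nodes forming the candidate community.
    """
    community = set(community)
    cut = 0       # edges with exactly one endpoint inside the community
    vol_in = 0    # total degree of nodes inside the community
    vol_out = 0   # total degree of nodes outside the community
    for u, v in edges:
        u_in, v_in = u in community, v in community
        if u_in != v_in:
            cut += 1
        # Each edge adds one unit of degree to each of its endpoints.
        vol_in += u_in + v_in
        vol_out += (not u_in) + (not v_in)
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0

# Hypothetical toy graph: two triangles joined by a single link (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]

tight = conductance(edges, {"a", "b", "c"})  # a whole triangle
loose = conductance(edges, {"a", "b"})       # splits a triangle apart
print(tight, loose)
```

The whole triangle scores far lower than the pair that cuts through it, which is the behaviour a candidate-ranking step would rely on.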
The image below shows an example of our community detection in action. A standard technique would have placed the two angry-faced people in the top-right with the community in the top-right. Instead, our algorithm puts them in the same community as the other angry-faced individuals.
Understanding Potential Terrorist Networks
In this older case study, we present a then-recent method for analysing a large, evolving network and demonstrate how a similar analysis might be useful in the fight against terrorism and organised crime. We show that we can calculate dynamic, evolving communicability measures for a social network of some 800,000 people containing 4.5 million communications. We can infer the most powerful people in the network through measures of their potential influence over the whole network. We can also spot, early on, emergent influencers and people gradually rising in power.
The key difference with our approach is that we take account of causality due to the flow of time. A standard static network analysis that simply looks at who has contacted whom (i.e., who is friends with whom) misses out on a wealth of information captured in the timings of the communications. The image below is a classic simple example. The left box represents the communications on Day 1 and the right box those on Day 2. A traditional analysis would say that C is a key player since he/she sends and receives many messages. However, C could not have got a message through to A. On the other hand, if A says something interesting to B or E on Day 1, they could pass that message on to C and D on Day 2. Thus A has the potential to have sent a message to the whole network. In our measure A has the highest broadcast score.
An influencer can have a very high broadcast score (and hence high potential influence) without actually sending out very many messages. What is key is how people respond to your messages – do they pass them on or just ignore them?
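The role of time ordering in the two-day example can be sketched with a simple time-respecting reachability count. This is a deliberate simplification and not the broadcast score itself, whose formula is not given in this text. It counts only who could possibly receive a message from each person when days are processed in order, assuming a message travels at most one hop per day. The edge lists are a hypothetical reconstruction consistent with the description, not a transcription of the image.

```python
def broadcast_reach(snapshots, nodes):
    """Map each node to the set of others its message could reach over time.

    snapshots: list of per-day lists of directed (sender, receiver) pairs,
               in chronological order.
    Assumption: a message moves at most one hop within a single day.
    """
    reach = {}
    for source in nodes:
        informed = {source}
        for day in snapshots:
            # Only people informed before this day can forward the message.
            newly = {dst for src, dst in day if src in informed}
            informed |= newly
        reach[source] = informed - {source}
    return reach

# Hypothetical communications matching the narrative above.
day1 = [("A", "B"), ("A", "E")]
day2 = [("B", "C"), ("E", "D"), ("C", "B"), ("C", "D"), ("C", "E"), ("D", "C")]
nodes = ["A", "B", "C", "D", "E"]

reach = broadcast_reach([day1, day2], nodes)
for person in nodes:
    print(person, sorted(reach[person]))

# Static view for contrast: total messages sent or received per person.
activity = {n: sum(n in edge for edge in day1 + day2) for n in nodes}
print(activity)
```

In this toy data C is the most active person in the static count, yet only A can reach everyone once the order of the days is respected.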
Forecasting launch sales of a new product
Disaggregated retail sales data has been underexploited in two significant and investable fields:
Inference: using the data to infer why customers choose one product over another on some occasions and not on others, so as to optimise category and ranging decisions – such insight might be shared by the retailer with CPG suppliers (winning “share of mind”), or used to position their own-label products better;
Forecasting strategic changes, new product development, and range change scenarios, which are often one-time decisions for which no immediate parallel experience exists in the past; such scenario forecasts must model behaviour and extrapolate the consistent and inconsistent behavioural traits of the customers rather than just their actual past purchasing.
Here we present an analysis of a test case, investigating such an application to the launch of Coke Zero. By modelling the purchasing reasons within a category we are able to make forecasts for the launch sales of a new product under given marketing assumptions. The chart on the left shows the actual sales and the forecast sales for various products, including Coke Zero, using this agent-based modelling approach. Forecasting the launch sales of a new product is notoriously difficult, so even being in the right ballpark is a good result. To be as close as our model (an error of just 18% for Coke Zero sales over the first 4 weeks) is a major achievement. The chart on the right provides an automated breakdown of the purchasing reasons of the customers based on the model.
Customer segmentation provides many commercial advantages. Segmentation allows you to inform and market to people in a tailored and targeted manner with the right message, right tone, at the best time and through the best medium of communication to provide increased engagement, loyalty and value. You can also make more personalised offerings, for example, tailoring the web site to the customer segment. You are able to provide more insight into company achievement by breaking performance down by segment and thus identifying where the market and your company are headed. You can also use segmentation for market testing and sampling. A representative sample of customers needs to be representative across all customer segments, or alternatively, appropriately scaled where the sample is biased.
There are many ways to segment a customer base: geographic, demographic, lifestyle, RFV, etc., but it is only when companies invest in a truly behavioural segmentation that they reap the real benefits. This may seem obvious – it is how the customers actually interact with the company and its products/services that the company wants to measure and influence. However, it is only in the digital era that the sort of interaction data required (purchase history, product/service use, website views and searches, social network activity, etc.) is routinely available to allow companies to become truly customer-centric. Unfortunately, many companies continue to operate as if they are still in a pre-digital, non-customer-centric culture.
Not only are there many types of customer segmentation, but there are also several segmentation techniques to choose from: for example, K-means, CHAID and density-based clustering (e.g., DBSCAN). However, in most situations our method of choice for customer segmentation is the statistically-based finite mixture model. This method has many advantages: (i) it is objectively data-driven rather than arbitrarily making a cut at, say, 3 years of tenure or £50 of spend; (ii) it combines multiple attributes together and can account for correlations between them (as opposed to CHAID, which treats each attribute in turn); (iii) unlike K-means and DBSCAN, there is no need to define a priori an overall distance measure or weighting for each attribute, because the algorithm discovers the optimal weighting of how important each attribute is, and what’s more it discovers optimal weightings for each segment rather than just one overall weighting; (iv) categorical variables, i.e., attributes that cannot be ordered, such as the type of mobile device used, fit naturally into the framework; and (v) the probabilistic approach provides an estimate of how strongly each customer is associated with each segment. Thus, for example, if you wanted to run a marketing campaign to 50,000 customers that’s targeted towards Segment X, which only has 35,000 customers, then you can identify the next 15,000 customers most closely associated with Segment X. Alternatively, you could market to the 10,000 customers who are really most core to Segment X out of the 35,000 customers therein.
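Advantage (v) can be sketched as follows: once a mixture model has been fitted, each customer has a membership probability for every segment, and a campaign list is simply the top of the ranking for the target segment. The example assumes, purely for illustration, a two-component Gaussian mixture over a single attribute with already-fitted parameters; the component values, customer data and function names are hypothetical, and a real model would combine many attributes, including categorical ones.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def memberships(x, components):
    """Posterior probability of each segment for one customer value x."""
    weighted = {name: w * gaussian_pdf(x, m, s)
                for name, (w, m, s) in components.items()}
    total = sum(weighted.values())
    return {name: p / total for name, p in weighted.items()}

def top_n_for_segment(customers, components, segment, n):
    """Customer ids ranked by how strongly they are associated with segment."""
    scored = [(memberships(x, components)[segment], cid)
              for cid, x in customers.items()]
    scored.sort(reverse=True)
    return [cid for _, cid in scored[:n]]

# Hypothetical fitted mixture: segment -> (weight, mean, standard deviation).
components = {"X": (0.4, 20.0, 5.0), "Y": (0.6, 50.0, 10.0)}
# Hypothetical customers and their value on the single attribute.
customers = {"c1": 18.0, "c2": 27.0, "c3": 33.0, "c4": 45.0, "c5": 60.0}

# Hard assignment: customers whose most likely segment is X.
core = [cid for cid, x in customers.items()
        if memberships(x, components)["X"] > 0.5]
# Campaign of size 3: the core members plus the next most closely associated.
campaign = top_n_for_segment(customers, components, "X", 3)
print(core, campaign)
```

The same ranking, truncated earlier, gives the "most core" subset of a segment for a smaller campaign.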
Customer Segmentation Continued...
We recently ran a customer segmentation project for a cashback website. Again, we chose the finite mixture model approach for the reasons outlined above. For obvious commercial reasons, we cannot show the segmentation we actually produced, but the two plots below use the actual customer data and show the segmentation we would have got had we used just four attributes of the data. The attributes used for each customer are (i) the number of transactions by that customer (not shown) and (ii) the fraction of those transactions falling into each of the three groups of merchants that we formed. The plots are a projection from those four attribute dimensions onto the plane represented by the fraction of transactions in each of two merchant groups – hence the triangular shape, since the fractions across all three merchant groups must sum to 1. In each plot, each point represents a customer and is coloured by the segment to which that customer belongs. Note that some of the segmentation takes place in the other two dimensions not drawn here, which produces the apparent slight overlap between segments (if it were possible to show those two dimensions, this overlap would disappear).
What is also striking is that the k-means method has no local definition of distance and thus all segments take approximately equal physical size in attribute space. The finite mixture model, however, is able to produce a smaller physical segment for the blue segment given the high density of users in that part of attribute space. It is also able to suppress the relative importance of the other two attribute dimensions not shown specifically for the red and green segments, giving these segments better definition in the attribute dimensions that are most important for them.
Even in this simple model with just 4 dimensions and 4 segments, over 40% of the customers would end up in a different segment under the standard k-means method compared with the more sophisticated finite mixture model. With higher dimensions and more segments the problem gets even worse, and even more so if the input data has not been filtered and transformed optimally. This means that with a weaker segmentation, over 40% of customers (and potentially many more) would likely be receiving the wrong marketing message, or at least one that isn’t optimised for them. The effect can be to make the customer more disengaged rather than more engaged.
This is allowing the cashback website (i) to produce more effective customer relationship management (CRM) so campaigns are much more relevant, (ii) to drive better customer insight and understanding, and (iii) to help provide personalised marketing and web page personalisation.
By using a better segmentation, you create much greater customer engagement, loyalty and value through more targeted marketing, personalised offerings, more relevant reporting, more accurate testing and better predictions/forecasts/understanding of customers.