The role of the data scientist

Recorded live at the Global Smart Energy Summit 2018 in Dubai.

Welcome to the latest DNV GL Talks Energy podcast series, which was recorded live at the Global Smart Energy Summit (GSES) 2018 in Dubai. Each week notable industry thought leaders join us to discuss the hot topics from GSES, and provide their insights into the main drivers behind the global energy transition.

Artificial intelligence, machine learning and an increase in data are transforming the energy sector. In this latest episode of our podcast series, DNV GL talks to Norv Clontz, Director, Data Science Innovation at Duke Energy, about the current role of the data scientist and how we can expect it to evolve in the future.

Large investments in the smart grid are giving rise to an increase in intelligent devices across the distribution network. We talk to Norv about the resulting growth in available data, and the benefits that this has for the consumer. Norv explains the impact that increased data and data analytics are having on utilities, and shares with us the challenges that this is causing. Finally, Norv gives his views on how he sees the role of the data scientist changing in the future, and what these developments might mean for the energy market in the Middle East region.





NARRATOR Welcome to the DNV GL Talks Energy podcast series. Electrification, rise of renewables and new technologies supported by more data and IT systems are transforming the power system. Join us each week as we discuss these changes with guests from around the industry.  

MATHIAS STECK    Welcome to a new episode of DNV GL Talks Energy here from the Global Smart Energy Summit in Dubai. My guest this morning is Norv Clontz, Director Data Science Innovation of Duke Energy, North Carolina, USA. Good morning, Norv.

NORV CLONTZ    Good morning.

MATHIAS STECK    Norv, we want to talk about the importance of data science and the role of data scientists in business, going forward with the increasing importance of data. But before we do that, it would be great if you could explain to the audience what Duke Energy is doing and also, of course, introduce yourself.

NORV CLONTZ    Well, I’m Norv Clontz, and I lead the data science innovation team at Duke Energy. And we work within a larger analytics community. My team works on the advanced analytics use cases, things like machine learning, image processing and video processing for object recognition, natural language processing, and other artificial intelligence–related fields.

Duke Energy is one of the largest public utilities in the US. It’s got 7.5 million electric customers and 1.5 million natural gas customers, 50 GW of generation capacity, and 30,000 employees. Parts of it have been in business for over 150 years. It has electric operations in six states. So those are some quick facts about Duke Energy.

MATHIAS STECK    Actually, I found a really interesting quote from you, which I’ll read for the benefit of the audience. You said, between the power plant and the meter box are cataracts of data that the utility of the future must harness and navigate. So, how does a data scientist achieve this, and what do you do about this at Duke Energy?

NORV CLONTZ    We’ve had a lot of good successes with predictive analytics on the standard data sets for usage or for transmission, distribution, so on.

But in the recent past in the US, since 2009, there’s been a really large investment in what’s called the smart grid. And so, that means a lot of electronic sensors and intelligent devices put out across the transmission and distribution network. For example, the meters on the houses used to deliver 1 meter read per month, which means 12 per year. Now we’re getting 35,000 per year. And so, with that type of data, we can do a lot more really useful things for the utility and for the customer. We can have an individual load profile, for example.

We can detect whenever there are certain spikes in usage and maybe detect when an appliance is going bad, for energy efficiency marketing. And then we can also look at the demographics of the premise itself and say, well, if you have a brick, single-family home that was built in the 1980s and is 2,000–3,000 ft², the normal home would have an electric bill of about $100. But yours is $200. So, perhaps your HVAC is failing. So, with that level of visibility into the actual interval usage data, we can do a lot more with quantitative modelling.

And so, one of the challenges we’ve had is to do with the volume of that data. When we had the previous data sets of 12 meter reads per year, we could do that on a laptop without a whole lot of difficulty. But to handle 35,000 meter reads per year, and then you look at a million customers over the course of a year, that’s 35 billion. So, we had to invest in some big data platforms and learn some new analytical tools that operate on those big data platforms, and also learn new techniques for how to deal with that type of data.
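The volume figures quoted above can be checked with a little arithmetic. The following is a minimal sketch only: the 15-minute read interval is an assumption (the interview gives just the rounded figure of 35,000 reads per year), and the function and constant names are illustrative.

```python
# Rough check of the meter-read volumes quoted in the interview.
# ASSUMPTION: smart meters report every 15 minutes; the interview only
# gives the rounded figure of 35,000 reads per customer per year.

MONTHLY_READS_PER_YEAR = 12        # one read per month (from the interview)
QUOTED_READS_PER_YEAR = 35_000     # rounded figure from the interview
CUSTOMERS = 1_000_000              # "a million customers" (from the interview)


def reads_per_year(interval_minutes: int) -> int:
    """Number of meter reads in a 365-day year at a fixed interval."""
    reads_per_day = (24 * 60) // interval_minutes
    return reads_per_day * 365


def total_reads(reads_per_customer: int, customers: int) -> int:
    """Total reads per year across all customers."""
    return reads_per_customer * customers


fifteen_minute_reads = reads_per_year(15)                      # 96 reads/day
yearly_total = total_reads(QUOTED_READS_PER_YEAR, CUSTOMERS)
growth_factor = QUOTED_READS_PER_YEAR / MONTHLY_READS_PER_YEAR

print(f"15-minute interval reads per year: {fifteen_minute_reads:,}")
print(f"Reads per year for {CUSTOMERS:,} customers: {yearly_total:,}")
print(f"Growth over monthly reads: about {growth_factor:,.0f}x")
```

Under that assumption a meter produces 35,040 reads a year, which is consistent with the rounded 35,000, and a million customers give the 35 billion reads mentioned.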

MATHIAS STECK    So, this is an interesting development obviously, and I would like to talk a bit about the human factor in all this. There are different perspectives to this. We hear a lot about big data. You mentioned artificial intelligence. We hear a lot about machine learning, learning about the behaviour of users. What happens to the whole human touch in this? Maybe the instinct of people? And you can actually look at both sides, from the service provider but also from the end user.

NORV CLONTZ    Yes, it’s a good question. But it’s true that those big sets of data, they don’t analyze themselves. We have to have… At this point, we can’t just ask for the great computer in the sky or something like on Star Trek. Computer. Earl Grey. Hot. We have to actually have skilled analytics practitioners to do the work, to do the modelling for it, and that does include an element of judgement and intuition.

There may not be an aha moment when they put a few things together and get a correlation. But they might say, well, that looks interesting. Maybe I should continue to investigate that. And it’s like Shannon’s theory of information: the surprise is the information. And so, when you find that type of thing, you have an intuition or get an insight that you didn’t expect. And that’s when you apply the quantitative rigour and start doing the p-values and the t-tests and the R²s and the RMSEs and so on.

But yes, data science is a science, and it’s also an art. Because the human analyst who’s looking at the data has to make some decisions amongst all the variables and to identify the occurrences of covariance and eliminate those. And that’s a judgement call which is based on experience as well as intuition.
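The remark that "the surprise is the information" corresponds to the standard definition of self-information in information theory: an event with probability p carries -log2(p) bits, so rarer events are more informative. A minimal sketch, with an illustrative function name:

```python
import math


def surprisal_bits(probability: float) -> float:
    """Self-information of an event in bits: -log2(p).

    The less likely the event, the more bits of information it carries.
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(probability)


# A coin-flip outcome (p = 0.5) carries 1 bit; a 1-in-1024 event carries 10.
common_event = surprisal_bits(0.5)
rare_event = surprisal_bits(1 / 1024)

print(f"p = 0.5    -> {common_event:.1f} bits")
print(f"p = 1/1024 -> {rare_event:.1f} bits")
```

In that sense an expected correlation tells the analyst little, while an unexpected one is where the information is.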

MATHIAS STECK    So, it’s interesting also in the sense that I was thinking about what researchers in the past did, and they might’ve received a Nobel Prize for a very interesting finding after years of doing this. How do you think that will change, and how important will data scientists and data analysts be in that field of research? I think what I’m trying to say is, is it more about finding out about the what, or is it also about finding out why things happen?

NORV CLONTZ    Yes, they’re both important questions, and that’s one of the interesting things that we’ve come up against as we’ve been developing artificial neural networks for the different types of phenomena we’re trying to identify. You can develop really accurate models with artificial neural networks, but you can’t explain why they’re doing it. So, if you’re making a decision based on the output of a neural network, you have to be very careful in case you ever have to explain or defend that.

So, it’s one of the odd things about it, because you can’t really understand why, compared to a decision tree or even a random forest. With those types of models, you can see quite clearly why it came to the decision that it did. But with artificial neural networks, especially the deeper you go, it’s just practically… It could be just overfitting. Yes, it’s really interesting.

MATHIAS STECK    You mentioned earlier that parts of the organization of Duke Energy are about 150 years old. Now if you look at this outlook, or what we have today already, but then also how that is going into the future… That must’ve changed Duke Energy also in a very interesting way. So, what impact does all this data and the data analytics have on the industry? For example, for utilities?

NORV CLONTZ    Well, I think it’s really just getting started. We’ve been members of the Utility Analytics Institute, which is focused specifically on analytics applied within the utility space, not just electric but also multiple other utilities. And we’ve gotten a lot of value out of that. And one of the things we realized was there’s an awful lot of low-hanging fruit. There are a lot of opportunities to apply analytics and data science and predictive models to a lot of the data sets that we’ve got.

Even the data we’ve had for a long time hasn’t really been leveraged for a lot of the benefits that it could’ve been. And so, that’s why Duke Energy has made the investment to establish an actual data science job family and career path and to hire quite a few. We’ve actually got more than three dozen on staff now, and we’ve gotten a lot of benefits out of it. By using classification models and predictive models, with machine learning and other approaches, we’ve been able to identify cases of electricity theft and natural gas theft.

By looking at the change in the usage patterns, we can actually identify when people are actually stealing electricity. And the industry tells us that probably at least 1% of the electricity that’s generated and delivered is actually not paid for because it’s stolen. And so, for us, for Duke Energy, that’s more than a $200 million bogey.

So, we’re trying to get some of that back, and we’ve been very successful at it. And then similarly we have a lot of… Our operations are on the East Coast, North Carolina and South Carolina and Florida. In the last few years, they have been hit by several hurricanes and tropical storms. And some of the members of my team developed a machine learning approach, using all the historical hurricane data that we had and a bunch of other data about the weather and the operation centre and the distribution lines and so on, so forth. And the outage experience in the previous hurricanes. 

And they developed a model to predict the effect of a likely hurricane on the customers and on the operation centres. And it turned out to be so much more accurate than the application that was in place, which had been bought from a third-party vendor, that the third-party vendor’s application was retired. And so, now we’re using the one that some of my teammates developed. And they’re using that on a go-forward basis and adding a lot of enhancements to it.

MATHIAS STECK    That’s a good story.

NORV CLONTZ    Yes, it’s already saved, well, millions and millions of dollars. I don’t know the exact figure, but it’s been a huge cost-savings success story.

MATHIAS STECK    So, based on what you just described, the power of data for decision-making but also to save a lot of money. If you look at the industry in general, maybe not only at Duke Energy, what would you foresee? How does the role of the data scientist, the importance, in a corporation, evolve in the future?

NORV CLONTZ    That’s a good question, Mathias. I think that the role of the data scientist will be larger and also will be lower. So, it’s going to be larger as the, especially as the dollar benefits from it become so much more obvious, both on the cost-savings side and the top-line-revenue side. And we’re also seeing a lot of opportunities in improving safety and customer satisfaction and other operational efficiencies. So, I think, because of that…

That’s the reason why we’ve had such strong growth in the number of data scientists at Duke Energy. We had the first one hired in May 2015, and now we have 36 at the last count. But I also think it will be lower because right now they’re highly skilled specialists. And so, it takes someone who can do coding in R or model development in Spark on a big data platform to get some really productive results out of it. But I think in the future, and the not-too-far future, there’ll be IDEs and even vocal commands with natural language processing.

So that, the folks who might now be doing a role with a job description of business analyst, they’ll be able to be the citizen data scientists. And I think it’ll be a lot more accessible to get those data science–based insights out of the data that we have with some of the new tools that are likely to come down the pike.

MATHIAS STECK    So, unfortunately, slowly we’re coming to an end of this episode already, but I’ve two questions, which are a bit related. I would really like to know what you think are the hot developments for the energy market in general at the moment on the data science front. Maybe start with that first, and later we can maybe talk a bit about this region and how it applies there. So, what would you think? What is really the hot topics now?

NORV CLONTZ    I think there’s still a lot of momentum picking up with applying data science inside the utility space. We talk to a lot of other utilities, and they’re trying to do some of the same things that we’re doing with some of the same topics. There’s a lot of opportunity to do asset management, failure prediction, and condition-based maintenance. There’s tons of money that can be saved so that you don’t have some catastrophic failure of equipment when you’re only doing a calendar-based inspection and maintenance routine.

If you can actually have a smart device on it, with a lot of the smart grid devices we put out there, we can look at the usage rates and the temperature and vibrations and so on, and potentially identify the failures before they happen. So, actually, applying data science across all the different opportunity areas inside the utility is big. It’s a really big opportunity. And then separately, technology. A lot of the utilities since 2009 have put out the smart grid devices. And they’ll stream data back in real time sometimes.

If you look at power quality meters, some of the advanced ones have 256 samples per cycle, and 60 cycles per second. So, in a minute, you’ve got a million samples potentially. And so, what that means is the old-fashioned way of having a relational database management system or a data warehouse, those don’t work with that kind of data. So, we’re looking at things like… Of course we put in a big data platform based on Hadoop, and we’re having to go up the learning curve with that.

But even that can’t handle that type of data. Even if we could load that type of data on to it, just the backhaul telecoms costs would be prohibitive. So, we’re starting to investigate doing the analytics out at the edge. And we’re going to have to get better and better at that because more and more devices have IoT (Internet of Things) type sensors and monitors on them.
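The sampling figures for power quality meters, and the appeal of doing analytics at the edge, can be illustrated with a short sketch. It is hedged throughout: the per-cycle RMS summary is just one possible edge reduction and is not described in the interview, and the 170 V peak test waveform and all names are assumptions for illustration.

```python
import math

SAMPLES_PER_CYCLE = 256   # from the interview
CYCLES_PER_SECOND = 60    # from the interview (60 Hz grid)


def samples_per_minute(samples_per_cycle: int = SAMPLES_PER_CYCLE,
                       cycles_per_second: int = CYCLES_PER_SECOND) -> int:
    """Raw samples produced by one measurement channel in one minute."""
    return samples_per_cycle * cycles_per_second * 60


def cycle_rms(samples: list[float]) -> float:
    """Root-mean-square of one cycle of samples.

    ASSUMPTION: a per-cycle RMS is one possible edge-side summary; it
    replaces 256 raw values with a single number before backhaul.
    """
    return math.sqrt(sum(s * s for s in samples) / len(samples))


# ASSUMPTION: an ideal sine wave with a 170 V peak (about 120 V RMS),
# used only as illustrative test data.
PEAK_VOLTS = 170.0
one_cycle = [
    PEAK_VOLTS * math.sin(2 * math.pi * k / SAMPLES_PER_CYCLE)
    for k in range(SAMPLES_PER_CYCLE)
]

per_minute = samples_per_minute()
rms_value = cycle_rms(one_cycle)

print(f"Raw samples per minute per channel: {per_minute:,}")
print(f"Per-cycle RMS of the test waveform: {rms_value:.1f} V")
print(f"Values sent per minute if only RMS is kept: {CYCLES_PER_SECOND * 60:,}")
```

One channel produces 921,600 raw samples a minute, close to the million quoted. Sending a single summary value per cycle instead would cut that to 3,600 values a minute, which is the kind of reduction that makes edge analytics attractive when backhaul is expensive.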

MATHIAS STECK    So, coming back to what I said earlier, we are here at the Global Smart Energy Summit in Dubai, and what we discussed is, of course, very interesting. But looking now specifically at this region: number one, how do you think this will develop here, and what do you anticipate learning from this conference?

NORV CLONTZ    Yes, I haven’t been around the conference yet, so I can’t say. But I’m really impressed by how large it is, and I’m looking to see what some of the analytics and data science use cases are that have been applied out here, and whether they have been or not. I’d be interested in finding that out. And then separately I think the opportunities for implementing renewables infrastructure here, compared to the way we try to do it in the US, would be very interesting to find out about as well.

MATHIAS STECK    Thank you very much, Norv, for this interesting conversation and your insights into data science. And thank you very much to the listeners. That was Norv Clontz, Director Data Science Innovation, here at the Global Smart Energy Summit in Dubai.

NARRATOR  Thank you for listening to this DNV GL Talks Energy podcast. To hear more podcasts in the series, please visit