How do you develop a machine learning algorithm to find the construction of new crude oil storage tanks around the world?
In today’s blog, we dig into this question with someone who knows the answer: Michael Allen, data scientist at Ursa Space. Michael’s past work experience in remote sensing (NASA) and his education (PhD from the University of California, Santa Barbara) served him well on this technically demanding project.
Something else he leaned on was Ursa’s extensive satellite imagery archive and expertise in processing synthetic aperture radar (SAR). These resources were necessary to train the model he developed during the exercise, which serves two purposes.
Global oil storage is a closely watched economic indicator, which is why Ursa monitors over 20,000 tanks weekly and packages the measurements into an off-the-shelf dataset. Maintaining a comprehensive dataset requires capacity updates because new tanks are being constructed, particularly in China, where transparency can be lacking.
That’s why satellite imagery is used. More to the point, scanning imagery for signs of new tank construction around the world cannot be done manually. It must be automated.
Also, the same algorithms can easily be tweaked to apply to different use cases. The applications are wide-ranging and go beyond oil storage tanks.
What’s the starting point for a process like this?
Michael Allen: Over the years, we’ve built up a substantial database of satellite imagery containing oil tanks all over the world. You can train a neural network on these stacks of imagery, and teach it to recognize features that make a tank a tank. The same imagery also contains a lot of other things, besides tanks, which creates some difficulties, but there’s an upside because the algorithm can keep improving. It learns to correctly identify tanks even when there are other objects present that are roughly similar.
As you start to train the algorithm to find tanks, you’re creating even more data that you can use, so the process becomes circular, allowing the algorithm to get better and better as you continue to train it. It’s a positive feedback loop.
You’re referring to Ursa’s oil storage measurements. So basically the satellite imagery collected can be repurposed?
MA: The key point here is that we’re leveraging satellite data that we already have. We’ve already marked individual tanks in thousands of locations across the globe. We use this data to train a neural network to recognize them. You don’t have to find every pixel that contains the tank. The model doesn’t have to be fine-grained; it just needs to find features that are likely tanks, and calculate the probability that this is correct.
Next up is a quality control process, in which the images with a high probability are passed on to a technician, who either confirms that they contain tanks or flags them as ‘false positives.’
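The hand-off from model to technician described above can be sketched roughly as follows. This is a minimal illustration, not Ursa's actual pipeline: the `Detection` class, the `select_for_review` and `apply_review` helpers, and the 0.8 cutoff are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    image_id: str
    tank_probability: float  # model score between 0 and 1


# Hypothetical cutoff; a real value would be tuned against reviewed data.
REVIEW_THRESHOLD = 0.8


def select_for_review(detections, threshold=REVIEW_THRESHOLD):
    """Keep detections at or above the threshold, highest score first."""
    likely = [d for d in detections if d.tank_probability >= threshold]
    return sorted(likely, key=lambda d: d.tank_probability, reverse=True)


def apply_review(queue, confirmed_ids):
    """Split the review queue into confirmed tanks and false positives,
    based on the image IDs the technician confirmed."""
    confirmed = [d for d in queue if d.image_id in confirmed_ids]
    false_positives = [d for d in queue if d.image_id not in confirmed_ids]
    return confirmed, false_positives


detections = [
    Detection("scene_a", 0.95),
    Detection("scene_b", 0.40),
    Detection("scene_c", 0.85),
]
queue = select_for_review(detections)
confirmed, false_positives = apply_review(queue, confirmed_ids={"scene_a"})
```

Both output lists are useful: confirmed tanks and flagged false positives are each new labeled examples, which is one way the training data could keep growing as described earlier in the interview.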
Automation is key because it allows us to do things like scan the globe for new oil tanks, but an important part of any machine learning process is a visual inspection at the end.
Okay, so you’ve identified what you believe to be a tank. It might be new to us, but how do you know whether it’s actually newly constructed?
MA: With a single SAR image, you can identify oil storage tanks, but you won’t know if these are new or not. We cannot say when they were constructed. To figure that out, we need a reference image, collected far enough in the past to be used as a baseline. Subsequent images of the same spot are collected every couple of weeks and compared to the reference image using the algorithm. In doing so, we basically construct a timeline detailing when changes related to tank construction began to occur.
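One simple way to turn those repeated comparisons into a timeline is sketched below. It assumes each later image has already been reduced to a single "change score" relative to the reference image; that score, the 0.5 threshold, and the `first_sustained_change` helper are hypothetical stand-ins, not the method Ursa uses. Requiring several consecutive detections is one possible guard against one-off activity such as a passing ship or vehicle.

```python
from datetime import date


def first_sustained_change(scores, threshold=0.5, min_consecutive=2):
    """Return the acquisition date at which change first persists.

    scores: iterable of (acquisition_date, change_score) pairs, where
    change_score measures difference from the reference image.
    Returns the date that starts the first run of at least
    min_consecutive scores at or above threshold, or None.
    """
    run_start = None
    run_length = 0
    for acquired, score in sorted(scores):
        if score >= threshold:
            if run_length == 0:
                run_start = acquired  # a new run of change begins here
            run_length += 1
            if run_length >= min_consecutive:
                return run_start
        else:
            # Change did not persist, so reset the run.
            run_start = None
            run_length = 0
    return None


# Illustrative values only: one transient spike, then sustained change.
timeline = [
    (date(2020, 1, 1), 0.1),
    (date(2020, 1, 15), 0.7),  # transient: not repeated in the next image
    (date(2020, 1, 29), 0.2),
    (date(2020, 2, 12), 0.6),
    (date(2020, 2, 26), 0.8),
]
construction_start = first_sustained_change(timeline)
```

In this made-up series the isolated spike on January 15 is ignored, and the estimated start of construction is the first date of the sustained run.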
What are some of the challenges you encountered?
MA: Oil tanks are constructed in busy areas with a lot of activity going on. There’s vehicular traffic, and often other construction that could involve pipelines, refining equipment, worker housing. You also may have materials transferred to and away from the site. And then storage tanks are often built near coastal ports, so you may have ship traffic coming in and out. Much of this can appear roughly similar to oil tanks, so the algorithm must be able to tell them apart.
Is there a good example that comes to mind?
MA: One of the best test cases that we found was Dalian, China, about five years ago. A new refinery was built, which meant lots of construction in the area. There were pipelines, refining elements, housing, and it’s along the water, so there were all sorts of changes happening. From a construction perspective, it was very busy, plus there were already storage tanks in the region. We were testing our ability to identify new construction and separate out new oil tanks from existing ones.
Thanks, Michael. We look forward to hearing more about this and other exciting projects you’re working on.