Much of the conversation around AI in today’s headlines revolves around the evolution of the technology itself, i.e., making it smarter, faster, more human. But the future success of AI is predicated on more than building a better algorithm. AI evolution can’t be a series of siloed efforts to improve the technology; we have to examine the other factors that shape how it functions and what role they play in a more holistic solution. For instance, we must decide whether AI can coexist with our current systems, or whether we need to rebuild those systems to let AI flourish. To make that call, we need to consider three questions:
Will we have the right data? Will we have the necessary infrastructure in place? Will we have the requisite expertise?
1) Will we have the right data?
We can’t talk about AI without talking about data – it is absolutely essential to AI’s success. But it’s also a complicated science that will undoubtedly encounter obstacles as AI continues to evolve.
Structured vs unstructured
It’s estimated that by the year 2020, the Internet will have accrued close to 47 zettabytes of data. Wondering what the heck a zettabyte is? A single zettabyte contains one sextillion bytes, or one billion terabytes; it would take one billion one-terabyte hard drives to store a single zettabyte. Because the unit is so large, it’s only used to measure huge aggregate amounts of data. Makes sense now, right? Suffice it to say that is a ton of data.

The problem is that roughly 90% of that data is unstructured, meaning it isn’t organized, tagged, or labeled, which renders it useless for machine learning until it’s been properly “cleaned.” Frequent data issues include duplication, inconsistency, omissions, quality problems, and format incompatibilities, to name a few. Consequently, data scientists report spending about 80% of their time finding and organizing data rather than building new algorithms and mining for patterns. As we accrue more and more data, it’s imperative we find a more efficient process so these scientists can focus on the data itself and the information it has to offer. One solution is to employ data catalogs that identify and label information as it comes in, rather than “fixing” it retroactively. Another is to invest in new solutions that analyze unstructured data better, faster, and cheaper.
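To make the cleanup work concrete, here’s a minimal sketch, in Python with pandas, of the kind of triage that eats up so much of a data scientist’s time. The column names and cleaning rules are purely illustrative, and the mixed-format date parsing assumes pandas 2.0 or later:

```python
import pandas as pd

# A hypothetical raw feed exhibiting the issues named above:
# duplication, omissions, inconsistency, and format incompatibilities.
raw = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, None],
    "signup_date": ["2018-01-05", "2018-01-05", "01/07/2018", "2018-02-11", "2018-03-02"],
    "plan": ["Pro", "Pro", "basic", "BASIC", "Pro"],
})

clean = (
    raw
    .drop_duplicates()               # duplication
    .dropna(subset=["customer_id"])  # omissions
    .assign(
        # format incompatibilities (format="mixed" requires pandas >= 2.0)
        signup_date=lambda df: pd.to_datetime(df["signup_date"], format="mixed"),
        plan=lambda df: df["plan"].str.lower(),  # inconsistency
    )
)
print(clean)
```

Even this toy example hints at why cleanup dominates the workload: every rule here encodes a judgment call about what the data should look like.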
Training vs operational
Another consideration is whether certain data will remain useful over time. Not all data stays valuable for AI, especially after it’s already been fed to the machine. The data used as input to teach an algorithm is often referred to as training data: the information that makes it intelligent. Keeping it intelligent is another story. Once training data has served its purpose, that historical data has little further value. For the machine to continue to learn and improve, it needs operational data: new, ongoing data generated by daily business operations. As the founder of the Creative Destruction Lab, Ajay Agrawal, puts it, “if you can find ways to generate a new, ongoing data stream that delivers a performance advantage in terms of your AI’s predictive power, that will give you sustainable leverage when AI arrives.”
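The training-versus-operational distinction maps neatly onto incremental learning APIs. Here’s a minimal sketch using scikit-learn’s SGDClassifier, with synthetic stand-in data: one batch of historical training data makes the model intelligent, and a stream of daily operational batches keeps it that way:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

def make_batch(n=200):
    # Synthetic stand-in for real business data: two features, binary label.
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    return X, y

# Training data: the historical batch that makes the model intelligent.
X_train, y_train = make_batch(1000)
model = SGDClassifier(random_state=0)
model.partial_fit(X_train, y_train, classes=[0, 1])

# Operational data: new batches arrive daily, and the model keeps learning.
for day in range(1, 8):
    X_new, y_new = make_batch()
    model.partial_fit(X_new, y_new)
    print(f"day {day}: accuracy on today's data = {model.score(X_new, y_new):.2f}")
```

The historical batch is consumed once; the ongoing stream is what sustains the model’s predictive power, which is exactly Agrawal’s point about sustainable leverage.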
Small data vs big data
“Big data” is the phrase we hear most often in conjunction with AI and machine learning (ML). After all, the bigger the data, the better, right? In ML, this makes intuitive sense: at its core, an algorithm is searching for correlations, and a larger pool of data means more “proof” of a correlation and a more accurate model. If you’re trying to teach an algorithm to recognize a certain image, for example, you typically have to “train” it with millions of images. So the answer is more data. Right? Wrong. Big data is high maintenance: it’s costly and time-consuming to collect, organize, maintain, and secure. And because it’s time- and context-dependent, staying up to date requires constant, ongoing accumulation. In essence, big data begets bigger data, and so on, which isn’t sustainable for the future. That’s why some startups have already begun building algorithms that can learn more from less data.
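One common technique behind the learn-more-from-less approach is data augmentation: synthesizing label-preserving variants of a small dataset so the algorithm sees more “proof” without any new collection. Here’s a minimal sketch with NumPy, using hypothetical 32x32 grayscale images:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image):
    """Yield label-preserving variants of one training image."""
    yield image
    yield np.fliplr(image)   # mirror left-to-right
    yield np.rot90(image)    # rotate 90 degrees
    yield np.clip(image + rng.normal(0, 0.05, image.shape), 0, 1)  # sensor noise

# A "small data" set: 10 hypothetical 32x32 grayscale images.
small_set = rng.random((10, 32, 32))

augmented = np.array([variant for img in small_set for variant in augment(img)])
print(augmented.shape)  # (40, 32, 32): 4x the examples from the same raw data
```

Few-shot and transfer learning push the same idea further, but the principle is identical: extract more signal per collected example.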
Diverse data
Because AI can only learn from the data it’s fed, it shouldn’t come as a surprise that the technology has experienced some serious bias problems. After all, when our ever-present human bias is baked right into the AI’s “food,” the algorithm can’t help but reproduce it. Researchers have hypothesized and tested a litany of possible fixes, including “algorithm audits” carried out by experts, academics, and relevant advocacy groups. But to really get to the root of these issues, I think we need to get serious about diversifying the workforce tasked with collecting and mining this data, as well as those building and refining the algorithms. If we want to ensure that future AI isn’t simply reproducing the past, we need to make data diversity a priority now.
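To give a flavor of what an “algorithm audit” can look like in practice, here’s a minimal sketch that checks one simple fairness signal: the model’s approval rate per demographic group. The data and column names are hypothetical, and real audits examine far more criteria:

```python
import pandas as pd

# Hypothetical audit log: one row per model decision, tagged with the
# applicant's demographic group (columns are illustrative only).
decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved": [1,   1,   0,   0,   0,   1,   0,   1],
})

# Per-group approval rates: a large gap flags the model for closer review.
rates = decisions.groupby("group")["approved"].mean()
print(rates)
print("disparity ratio:", round(rates.min() / rates.max(), 2))
```

A check like this only surfaces a symptom; tracing it back to biased training data, and to the homogeneous teams that collected it, is the harder diversification work the paragraph above argues for.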
2) Will we have the necessary infrastructure in place?
No matter how far we’re able to advance AI technology, it will never reach its full power and potential if we don’t have the underlying systems and infrastructure to support it. And I’m not just talking about updating software (although that will play a large part). This infrastructure also refers to the physical world–especially as more and more advancements are made in “Smart” technology for the home, the automobile industry, and the larger Internet of Things (IoT). It also includes any potential regulatory framework that may need to be established.
The digital world
Let’s start with the software. Lenny Pruss, partner at Amplify Partners, goes into epic detail on the systems evolution needed for the next generation of tech in his VentureBeat article on “Infrastructure 3.0.” In summary, the legacy stack just isn’t going to cut it. AI and ML applications are radically different from the software of days past and demand a fundamentally different approach to programming and deployment. As a result, we’ll need “new abstractions, interfaces, systems, and tooling to make developing and deploying intelligent applications easy for developers.” That’s not to say we have to halt progress on AI algorithms and applications, but their evolution needs to happen in parallel with these radical, foundational changes. The tech is only as good as the system it’s operating on.
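As a point of contrast, here’s roughly what today’s tooling looks like for putting a model behind an interface: a bare-bones Flask endpoint with a hard-coded stand-in for a trained model. It’s a toy, not Pruss’s proposal, but it illustrates the layer of abstractions and tooling that “Infrastructure 3.0” argues must be rebuilt around intelligent applications:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Stand-in for a trained model; a real service would load learned
    # weights and handle versioning, monitoring, and retraining.
    return {"label": int(sum(features) > 0)}

@app.route("/predict", methods=["POST"])
def serve():
    features = request.get_json()["features"]
    return jsonify(predict(features))

if __name__ == "__main__":
    app.run(port=8000)
```

Everything the comment hand-waves away (model loading, versioning, monitoring, retraining) is precisely where the new abstractions are needed.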
The physical world
In early 2018, a self-driving Uber car struck and killed a pedestrian crossing the street in Arizona, the first reported fatality at the “hands” of an autonomous vehicle. While the human safety driver behind the wheel was reportedly distracted and not properly monitoring the system, the major takeaway from the tragedy was that self-driving cars “just aren’t ready for the road yet.” But maybe the roads aren’t ready for self-driving cars. Why spend all this money improving the technology behind autonomous vehicles if we’re just going to put them on the same dangerous roads we encounter on our own commutes? Several conceptual designs for so-called “smart streets” have been completed, but there doesn’t appear to be any concrete road work in the pipeline. Driver and pedestrian safety has always been a priority shared by automobile and road design; it shouldn’t be any different in a driverless future.

We’re seeing similar discussions around the evolution of “smart homes.” We started with an explosion of devices like Nest and Google Home that connect and control existing home appliances and systems such as HVAC and lighting, but problems arose with the tech’s connectivity, compatibility, and quality. There’s surely ample room to improve those devices, but it makes sense that product engineers have now started building smart technology directly into new appliance designs. This approach may have its drawbacks, but technology always works better when it’s set up to succeed.
Democratization of AI
If you’re familiar with the issue of AI democratization, you’ll know it isn’t a simple topic to tackle, nor is it a silver bullet for AI’s problems. But I’d be remiss if I didn’t touch on it briefly as a future consideration. By now you’ve surely read at least one headline or article warning of the potential dangers of AI; maybe a few of your peers have expressed reservations. There’s a lot of fear and uncertainty around the future of AI in today’s society, and understandably so. Most of what the general public knows about artificial intelligence comes from the “bots-gone-wild” sci-fi narrative that pervades popular culture. And even when we look to high-profile industry leaders for reassurance, we’re occasionally met with alarmist views like those of Tesla CEO Elon Musk, who famously warned that AI has the potential to “doom mankind.” But while Musk may have initially engaged in a bit of unhealthy fear-mongering, his more recent views and efforts preach caution around a very non-fictional threat: the abuse of AI’s power. In an interview with Y Combinator’s Sam Altman, Musk said, “It’s not that [AI] will develop a will of its own right off the bat, the concern is that someone may use it in a way that’s bad.” Musk, along with other industry leaders, advocates making AI technologies widely available to businesses and individuals at an affordable cost, so control doesn’t belong to the 1%. Although progress on this front has been somewhat uninspiring to date, I do think it’s an important piece of the future AI framework.
3) Will we have the requisite expertise?
According to research from Element AI, fewer than 100,000 specialists in the entire world today have the competence and expertise to solve serious AI challenges, and only about 22,000 PhD-educated researchers are actively working in the field. This poses a severe threat to the future of AI technology.
Teaching expertise
Big tech companies have started pulling experts from professorial positions at colleges and universities around the world, diminishing the number of people who can teach the technology to future generations. Make no mistake: this is a problem. While the basic concepts of neural networks and deep learning may require only high-school-level math, real expertise demands far more significant skills, akin to those of physicists and astronomers. Fortunately, companies like Google and Facebook have started offering AI engineering classes and online courses, while smaller companies have resorted to hiring outside the field. If we want human expertise to keep pace with artificial intelligence, we had better start investing in our own education.
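To back up the claim about the entry point, here’s a single artificial neuron learning the logical AND function, using nothing beyond multiplication, addition, and slopes. This is a sketch of the concept only; the physicist-grade math shows up when you scale this to millions of parameters and need to reason about why it works:

```python
import numpy as np

# Inputs and desired outputs for logical AND.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)  # weights
b = 0.0          # bias
lr = 0.5         # learning rate

for _ in range(5000):
    pred = 1 / (1 + np.exp(-(X @ w + b)))  # squash the weighted sum to 0..1
    error = pred - y
    w -= lr * X.T @ error / len(y)         # nudge each weight downhill
    b -= lr * error.mean()

print(np.round(1 / (1 + np.exp(-(X @ w + b)))))  # approximately [0, 0, 0, 1]
```

The gap between following this sketch and inventing the next architecture is exactly the teaching gap the paragraph describes.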
Sharing expertise
We’ve already discussed the advantages of democratizing AI technology, but in light of the above reality, I think it’s equally important that we break down existing silos of AI expertise while we still can. Silicon Valley is an excellent example of how beneficial this practice can be. It’s not unusual for researchers and developers to bounce around the industry’s leading companies, downloading new skills and knowledge at each stop and uploading them into their next role. As a result, both companies and employees get smarter, avoiding a problematic monopoly of AI expertise (and power).

It’s clear changes are in order. As an AI-powered world becomes a future reality rather than a hypothetical possibility, the conversation has to shift from adoption to accommodation. The question now is not “Is AI ready for our world?” but “Is our world ready for AI?” And when we consider these inevitable, and in some cases substantial, gaps in data, infrastructure, and expertise, the answer is unequivocally no. We need to start preparing now if we want to avoid the pitfalls of a technology we can’t fully support, or worse, control.