Advances in AI technology continue to empower users to create, and benefit from, increasingly sophisticated content. Generative AI is leading this shift, opening up new perspectives and possibilities and bringing a new level of creativity into daily life that is more accessible than ever. The user experience is changing as AI-generated music, artwork, and text documents, along with AI-assisted programming and code generation, take center stage.
Current trends in generative AI development revolve primarily around natural language processing (NLP) models such as ChatGPT and text-to-image generation models such as Midjourney and DALL-E. With these tools, what once seemed unattainable is now within reach: striking visuals can be generated from textual descriptions, and realistic conversations can be simulated through simple chatbot interfaces. The range of possibilities is broad, letting people tap new sources of creativity through collaboration between humans and machines. Generative AI is still in its early stages, and much of its potential remains to be explored.
Generative AI is built on the Transformer
Generative AI is rooted in the Transformer, a groundbreaking model architecture introduced for natural language processing in 2017. It paved the way for generative AI by surpassing earlier approaches such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) in accuracy and quality, and in 2020 it extended its reach to the vision and voice domains.
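To give a rough sense of what sets the Transformer apart from CNNs and RNNs, the sketch below illustrates scaled dot-product self-attention, the operation at the core of the architecture. It is a simplified illustration only and is not MediaTek or NeuroPilot code: the function names and toy inputs are hypothetical, and the learned query, key, and value projections used in real models are omitted for brevity.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Assumption: real Transformers first apply learned projection matrices
    to produce queries, keys, and values; here they are passed in directly.
    """
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of all value vectors.
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
    return outputs


# Toy example: three 2-dimensional "token" vectors attend to one another.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens, tokens, tokens)
```

Every output vector draws on all input positions at once, which is why the same mechanism applies to text tokens, image patches, or audio frames.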
Since 2021, MediaTek’s AI Processing Unit (APU) has undergone many optimizations for handling Transformer models, enabling smartphone brands to introduce vision and voice applications. Working with leading mobile companies, MediaTek is enhancing the smartphone user experience through AI apps that use Transformer models.
MediaTek APU and NeuroPilot: Ready for Transformer
MediaTek’s AI platform, NeuroPilot, provides a comprehensive solution for deploying Transformer-based AI applications. It handles the intricate computational flow of these models while using MediaTek’s APU design to reduce DRAM bandwidth, improving system-on-chip (SoC) performance and power efficiency. NeuroPilot comprises a suite of integrated tools that streamline the development and deployment of AI models and enable end-to-end execution of Transformer models on the APU. With NeuroPilot, developers have the resources they need to build Transformer-based applications quickly.
MediaTek NeuroPilot empowers manufacturers to leverage the proven capabilities of Vision Transformer (ViT) and Voice Transformer through the APU.
Real-world implementation and advantages
These advances are implemented in the vivo X90 Pro, a new smartphone powered by MediaTek’s Dimensity 9200, the latest flagship 5G smartphone chip, which uses the new MediaTek APU 690. The device sets a new standard for mobile photography and voice recognition by drawing on Vision and Voice Transformer technology.
Using ViT technology, the vivo X90 Pro delivers highly precise object segmentation and supports precise adjustments and corrections at the object level, which greatly improves low-light photography. ViT also excels at accurately isolating people from their backgrounds for portrait capture, preserving fine details such as hair, and then applying real-time background filters to create special effects. This sets the X90 Pro apart from competitors in video capture and live streaming.
The Dimensity 9200 platform includes Transformer-based Voice AI, which enables on-device automatic speech recognition. This greatly improves response speed and safeguards user privacy by eliminating the need for cloud-based processing. It is the first instance of Transformer Voice AI models optimized for mobile APUs, resulting in a 30% reduction in power consumption and a 50% improvement in performance compared to the previous-generation CPU solution.
Emman has been writing technical and feature articles since 2010. He became an instructor at Asia Pacific College in 2008 and later worked for almost three years as a Business Analyst and Technical Writer at Integrated Open Source Solutions.