There are always small gaps in everyday life: a child playing alone with an electronic device has no early-education partner to talk and interact with; an adult finishing overtime late at night finally exhales, but has no one to share the moment with; and anyone wanting to add a personal touch to their desk finds that most desktop ornaments are static, unable to move or "speak".
This is why we set out to develop the AI desktop pet: not to make another smart toy, but to create a "warm desktop companion" for users of all ages. But what technology does it take to deliver that sense of companionship?
This column takes a comprehensive look at the technology behind Shifeng's AI toys. This installment focuses on the upcoming AI desktop pet. We invited Luo Zhiling, CTO of technical partner oxtron-AI, and Yu Gangpeng, Senior Development Engineer in Shifeng Culture's Product Development Department, to reveal the technical details of the Shifeng AI desktop pet and what sets it apart.
(Image: AI Desktop Pet)
I. Core Product Information
• Product Name: AI Desktop Pet (Intelligent Interactive Type)
• Core Positioning: An intelligent desktop device that combines voice interaction with linked motion and expressions, offering emotional companionship, playful interaction, and personalized customization to meet users' needs for entertainment, companionship, and self-expression.
II. Introduction to Technical Partner: oxtron-AI
As a leading Chinese provider of AI hardware and software solutions for toys, oxtron-AI is building an "AI + Toys" ecosystem. It is committed to bringing AI capabilities to toy manufacturers through cost-effective hardware and software solutions, making toys more engaging and interactive and enabling personalized companionship and progressive early education.
III. In-depth Interview on Technology and Products
(I) Technology Architecture Chapter: Interview with Luo Zhiling, CTO of oxtron-AI
Q1: Shifeng Culture's AI desktop pet takes "voice + body" multimodal interaction as its core selling point. How does its underlying technical architecture support the product's differentiated competitiveness?
The "listens, speaks, moves, and understands" multimodal experience of Shifeng Culture's AI desktop pet is built on a full-stack architecture of three layers: hardware perception, intelligent interaction, and cloud-edge collaboration. Together they establish competitive barriers along three dimensions, perception accuracy, interaction intelligence, and response efficiency, laying the technical foundation for the product's market breakthrough.
◆ In the hardware perception layer, a multimodal sensing matrix defines the desktop pet's character: a microphone array improves voice robustness against interference, solving voice capture in noisy environments; a combination of motion sensors provides both posture sensing and fall protection, widening the range of safe usage scenarios; and a touch-sensing surface lets users interact by touch instead of voice commands, making everyday interaction more convenient.
◆ In the intelligent interaction layer, the core is a jointly developed "long- and short-term memory agent engine" that expands the interaction model. On one hand, it retains dialogue history and can recall past interactions (for example, a user's favorite animals or story themes) to deliver personalized, context-aware responses that strengthen the emotional bond with the user. On the other, it automatically plans multimodal feedback from the dialogue content (for example, when a user expresses happiness, it simultaneously shows a smiling on-screen expression and plays upbeat sound effects), deepening the sense of immersion.
◆ In the cloud-edge collaboration layer, a "hybrid model engine + local-first response" architecture is used: AI capabilities such as automatic speech recognition (ASR) and text-to-speech (TTS) are served by a mix of lightweight and flagship models, so simple questions get fast answers while complex questions get in-depth ones, balancing user experience against cost. Local wake-word detection and audio processing guarantee second-level wake-up from a low-power state without network dependence, further improving real-time responsiveness.
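As a rough illustration of the "local-first + hybrid model" idea (this is a sketch, not oxtron-AI's actual implementation; the model names, thresholds, and complexity heuristic are all assumptions), a request router might keep short, simple utterances on the on-device lightweight model and escalate the rest to a cloud flagship model:

```python
# Sketch of a "local-first" hybrid model router (illustrative only).
# The thresholds, engine names, and complexity heuristic below are
# assumptions, not details of the shipped product.

def estimate_complexity(utterance: str) -> float:
    """Crude proxy for query complexity: longer, question-heavy
    utterances are assumed to need the larger cloud model."""
    score = len(utterance.split()) / 20.0
    if any(w in utterance.lower() for w in ("why", "explain", "compare")):
        score += 0.5
    return score

def route(utterance: str) -> str:
    """Return which engine should answer this utterance."""
    if estimate_complexity(utterance) < 0.5:
        return "local-lightweight-model"   # fast, works offline
    return "cloud-flagship-model"          # slower, deeper answers

print(route("hello"))  # short greeting stays local
print(route("why is the sky blue, can you explain it to me please"))
```

In a real device the routing signal would come from the ASR/NLU stage rather than word counts, but the cost/latency trade-off it encodes is the same one described above.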
Q2: Which technologies are applied to AI desktop pets for the first time?
This AI desktop pet project upgrades the product from "functional toy" to "emotional companion" through four innovative technologies, each applied in this category for the first time. The breakthroughs concentrate on three dimensions: interaction realism, autonomous decision-making, and memory personalization, all of which sharpen the product's differentiation.
1. The industry's first fine-grained linkage of microexpressions and body movements. Unlike the rigid motions of traditional toys (a single nod or head shake), we achieve smooth body control with a compact servo array and add micro-movements such as simulated blinking, which greatly increases the product's "sense of life" and strengthens users' emotional attachment.
2. The first autonomous interaction decision system based on multi-sensor fusion. Going beyond the single infrared obstacle avoidance of traditional toys, it fuses gravity sensing, touch recognition, and infrared/ToF data into a multi-dimensional perception model that can judge the user's intent (for example, distinguishing "playful shaking" from "accidental falling") and respond appropriately to each scenario. Layered on top, sound-source localization lets the pet automatically turn its head toward the user's voice, making interaction feel more natural and fitting scenarios from family life to early education.
3. The first agent with hybrid long- and short-term memory in an AI desktop pet. It fuses long-term user preferences (such as favorite colors and story types) with short-term interaction content (such as recalling an experience mentioned yesterday), upgrading the product from passive response to proactive, context-aware personalized companionship, which markedly increases user stickiness and forms the product's core differentiating barrier.
4. The first NFC activation system. Attaching different themed accessories (such as a "backpack" or "hat") adds different attribute values to the AI desktop pet, extending the product's playability across its life cycle.
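The sensor-fusion decision in point 2 can be pictured with a minimal rule-based sketch (the thresholds and logic here are illustrative assumptions, not the product's actual perception model): free fall shows near-zero acceleration magnitude, while playful shaking shows high-variance readings, usually while the user is touching the pet.

```python
# Illustrative multi-sensor intent rule (assumed thresholds, not the
# real model). accel_samples holds recent acceleration magnitudes in g
# (1.0 = resting on the desk).

def classify_motion(accel_samples: list[float], touched: bool) -> str:
    mean = sum(accel_samples) / len(accel_samples)
    variance = sum((a - mean) ** 2 for a in accel_samples) / len(accel_samples)
    if mean < 0.3:                  # sustained near-weightlessness => falling
        return "accidental_fall"
    if variance > 0.5 and touched:  # energetic, hands-on movement => play
        return "playful_shaking"
    return "idle"

print(classify_motion([0.1, 0.05, 0.08, 0.12], touched=False))  # accidental_fall
print(classify_motion([0.2, 2.5, 0.3, 2.8], touched=True))      # playful_shaking
```

A production system would fuse more channels (ToF distance, touch position, sound direction) and likely use a learned classifier, but the principle of cross-checking several sensors before deciding is the same.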
(II) Product Value Chapter: Interview with Yu Gangpeng, Senior Development Engineer of Shifeng Culture's Product Development Department
Q1: Compared with ordinary desktop pets or real pets, why develop an AI desktop pet that can interact and move intelligently?
Traditional electronic desktop pets are usually limited to simple on-screen animations with no physical interaction; real pets offer warm companionship but demand feeding, cleaning, and other care, costing both constant attention and real money every month. Facing this gap between the two, we built an AI desktop pet that unites a "digital soul" with a "physical body": it interacts vividly through voice, motion, and expressions, yet never burdens the user with feeding. It is a lighter, more sustainable form of emotional companionship.
Q2: There are many emotional companionship products in the market. What additional functions does the AI desktop pet have compared with other products in daily use scenarios? What outstanding use value does it have?
What sets Shifeng Culture's AI desktop pet apart is that it is not merely a device that answers commands but a desktop partner with a "sense of life". Its core advantages show in three signature functions:
1. "Easter egg"-style hidden interactions: Beyond active voice interaction, the pet hides surprises for users to discover on their own; finding a "pet easter egg" triggers a random playful animation. When it detects the user has been focused on work for a long stretch without moving, it yawns and stretches as a reminder to stand up and move around. In idle moments it may slip into a "self-amusement" mode, such as chasing its own shadow (simulated by chassis movement). These unprompted moments soften the mechanical feel of an electronic product and make the pet more fun.
2. NFC magnetic accessory system: Users can snap on decorations such as bows and little stars, and each accessory changes the experience. A "bow" may make the pet more affectionate and soften its voice; a "bachelor's cap" may make it keener to ask questions and share science facts. More importantly, accessories subtly shift the pet's "personality parameters": the bow gradually raises its "intimacy", unlocking closer interactions, while the star raises its "intelligence", improving its question answering.
3. Immersive, lifelike interaction: When the pet expresses an emotion, it is never a lone voice clip or expression but a synergy of voice, visual animation, and physical motion. When "angry", a flickering flame animation appears on the screen, its tone turns rapid and cross, and a micro vibration motor makes its body tremble as if shaking with anger; when "happy", its eyes curve into crescents, its body sways gently, and it hums pleasantly. This multi-sensory coordination makes the interaction feel real and believable, so users empathize more readily and build deeper emotional trust.
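The voice/animation/motion synergy above can be pictured as a simple lookup from an emotion label to a bundle of synchronized output channels. This is a hedged sketch; the channel names and values are assumptions for illustration, not the shipped firmware's data:

```python
# Illustrative emotion-to-output mapping (assumed values, not firmware).
# Each emotion fans out to three channels at once: screen, voice, body.

EMOTION_OUTPUTS = {
    "angry": {
        "screen": "flame_animation",
        "voice": {"tone": "rapid", "clip": "grumble"},
        "body": {"motor": "vibrate", "intensity": 0.8},
    },
    "happy": {
        "screen": "crescent_eyes",
        "voice": {"tone": "light", "clip": "hum"},
        "body": {"motor": "sway", "intensity": 0.3},
    },
}

def express(emotion: str) -> dict:
    """Return the synchronized output bundle for an emotion,
    falling back to a neutral idle state for unknown labels."""
    return EMOTION_OUTPUTS.get(emotion, {
        "screen": "idle_blink",
        "voice": {"tone": "neutral", "clip": None},
        "body": {"motor": "still", "intensity": 0.0},
    })

print(express("angry")["screen"])  # flame_animation
```

Driving all channels from one lookup is what keeps the screen, voice, and motors in sync, which is exactly the "synergy" the paragraph above describes.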
Q3: From the user's perspective, what (novel) emotional value can the desktop pet provide? Can this "emotional interactive desktop pet" become the mainstream form of companion AI products in the future?
We observe users' demand for AI products shifting from "function satisfaction" to "emotional resonance". The AI desktop pet offers two novel kinds of emotional value: real-time emotional feedback and pressure-free companionship. When the user is happy, the pet "jumps for joy"; when the user is down, it comforts with cute expressions, a tangible response warmer than a phone notification. And unlike interpersonal communication, which requires attending to the other person's feelings, the user can start or end an interaction at any moment, creating a "zero-burden sense of intimacy".
As for the future, we believe emotionally interactive desktop pets will become an important branch of companion AI products. First, the supply chains for multimodal sensing, lightweight quiet motors, and affective computing are maturing, leaving room for production costs to fall so that more users can afford them; second, from children to adults and from desks to bedside tables, their small footprint lets them slip into more scenes of daily life, greatly raising how often an AI desktop pet actually gets used.