GB/Z 177.9-2026 Intelligence grading of artificial intelligence terminal - Part 9: Earphone
This is a draft translation for reference among interested stakeholders. The finalized translation (passing through draft translation, self-check, revision and verification) will be delivered upon being ordered.
ICS
CCS
National Standard of the People's Republic of China
GB/Z 177.9-2026
Intelligence grading of artificial intelligence terminal - Part 9: Earphone
人工智能终端智能化分级 第9部分:耳机
Issue date: 2026-03-31 Implementation date: 2026-10-01
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China
the Standardization Administration of the People's Republic of China
Contents
Foreword
Introduction
1 Scope
2 Normative References
3 Terms and Definitions
4 Abbreviations
5 Key Capabilities
5.1 Overview
5.2 L1 Response Level
5.3 L2 Tool Level
5.4 L3 Assistance Level
6 Level Determination
Annex A (Normative) Test Methods
A.1 Test Environment
A.2 L1 Response Level
A.3 L2 Tool Level
A.4 L3 Assistance Level
Annex B (Informative) Typical Application Scenarios
Bibliography
Artificial intelligence terminal intelligence classification — Part 9: Headphones
1 Scope
This document specifies the classification levels and level determination of key intelligence capabilities for headphones, and provides test methods.
This document is intended to guide the intelligence classification of headphones, including common forms such as over-ear, in-ear, semi-in-ear and open-ear headphones. It also provides a reference for the design, development, application, selection and testing of artificial intelligence headphones.
2 Normative References
The following documents are essential for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition (including any amendments) applies.
GB/T 45288.2-2025 Artificial intelligence — Large models — Part 2: Evaluation metrics and methods
GB/Z 177.1-2026 Artificial intelligence terminal intelligence classification — Part 1: Reference framework
GB/Z 177.2-2026 Artificial intelligence terminal intelligence classification — Part 2: General requirements
3 Terms and Definitions
For the purposes of this document, the terms and definitions given in GB/Z 177.1-2026, GB/Z 177.2-2026 and the following apply.
3.1 sound pickup
The process of collecting sound signals through a microphone.
3.2 speech recognition
The process of converting human voice signals into text or commands.
[Source: GB/T 21023-2007, 3.1]
3.3 active noise cancellation
A technology that reduces ambient noise interference by analysing noise characteristics in real time, generating an antiphase sound wave and suppressing noise components.
3.4 environmental noise cancellation
A technology that improves speech transmission clarity in call scenarios by separating speech from ambient noise and by identifying and suppressing non-target noise components.
NOTE: Environmental noise cancellation is also referred to as call noise cancellation.
3.5 wake-up word
A word or phrase used by a user to wake up a device and initiate speech interaction.
3.6 host device
A device that can establish a connection with headphones, provide an audio signal source and control command interaction for the headphones, initiate connection requests, configure headphone parameters, and control related functions.
4 Abbreviations
The following abbreviation applies to this document in addition to those defined in GB/Z 177.1 and GB/Z 177.2.
MOS: Mean Opinion Score
5 Key Capabilities
5.1 Overview
The capability elements in this document are based on GB/Z 177.1 and GB/Z 177.2. In accordance with the product characteristics of headphones, learning capability is not included, and only end-side capabilities are required. The intelligence of headphones is graded into three levels: L1, L2 and L3.
5.2 L1 Response Level
5.2.1 Perception
5.2.1.1 User information perception
User information perception capabilities shall meet the following requirements:
a) Voice information: the ability to perceive user voice input information, with the following specific requirements:
Speech recognition accuracy in a quiet environment shall not be less than 90 %;
Speech recognition accuracy in a noisy environment shall not be less than 80 %.
b) Touch information: the ability to perceive user touch input information, such as tapping and sliding. The accuracy of touch operations shall not be less than 90 %.
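For illustration only, the L1 thresholds above could be checked as in the following Python sketch. It assumes that accuracy is computed as the fraction of correct trials; the normative test methods are those of Annex A. The names `L1_THRESHOLDS`, `accuracy` and `meets_l1_user_perception` are hypothetical and are not defined by this document.

```python
# Illustrative sketch only; not part of the normative text.
# Assumption: accuracy = correct trials / total trials.
# Annex A defines the actual test method.

# Minimum accuracies from 5.2.1.1, expressed as fractions.
L1_THRESHOLDS = {
    "speech_quiet": 0.90,  # speech recognition, quiet environment
    "speech_noisy": 0.80,  # speech recognition, noisy environment
    "touch": 0.90,         # touch operations
}


def accuracy(correct: int, total: int) -> float:
    """Return the fraction of correct trials (hypothetical helper)."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def meets_l1_user_perception(measured: dict[str, float]) -> bool:
    """True if every measured accuracy is not less than its threshold."""
    return all(
        measured[name] >= minimum for name, minimum in L1_THRESHOLDS.items()
    )
```

A result exactly equal to a threshold passes, since each requirement is worded as "shall not be less than".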
5.2.1.2 Device information perception
Device information perception capabilities shall meet the following requirements:
a) Software and hardware status: the ability to perceive its own software and hardware status, such as battery level, charging status, connection status and system version.
b) Task status: the ability to perceive the currently executing task and related parameters, such as music playback and phone calls.
5.2.1.3 Environmental information perception
Network information: the ability to perceive environmental information via the Internet, such as weather and geographical location.
5.2.2 Cognition
5.2.2.1 Understanding
Single simple instruction: the ability to understand a single simple voice instruction from the user, with a response time not exceeding 1.5 s.
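As a hedged illustration, the 1.5 s limit could be checked as follows. The sketch assumes that response time is measured from the end of the user's instruction to the start of the device's response; this measurement convention is an assumption, and the normative method is given in Annex A. The names `MAX_RESPONSE_TIME_S` and `response_time_ok` are hypothetical.

```python
# Illustrative sketch only; not part of the normative text.
# Assumption: response time runs from the end of the spoken instruction
# to the start of the device's response, both in seconds.

MAX_RESPONSE_TIME_S = 1.5  # limit from 5.2.2.1


def response_time_ok(instruction_end_s: float, response_start_s: float) -> bool:
    """True if the response started within the allowed time."""
    elapsed = response_start_s - instruction_end_s
    if elapsed < 0:
        raise ValueError("response cannot start before the instruction ends")
    return elapsed <= MAX_RESPONSE_TIME_S
```

A response time of exactly 1.5 s passes, since the requirement is "not exceeding 1.5 s".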
5.2.2.2 Reasoning
No requirements.
5.2.2.3 Planning
No requirements.
5.2.3 Execution
5.2.3.1 Tool invocation
Single-step tool invocation: the ability to invoke basic tool functions based on basic protocols, such as adjusting volume, answering, hanging up, playing, pausing and opening an App.
5.2.3.2 Content generation
No requirements.
5.2.3.3 Interconnection and collaboration
No requirements.
5.2.3.4 Expression output
Clear voice output: the ability to output clear voice without extraneous sounds, impact sounds or abnormal sounds that affect normal use.
5.2.4 Memory
5.2.4.1 Short-term memory
No requirements.
5.2.4.2 Long-term memory
No requirements.
5.3 L2 Tool Level
5.3.1 Perception
5.3.1.1 User information perception
User information perception capabilities shall meet the following requirements:
a) Voice information: the ability to perform voice wake-up and to perceive user voice input information, with the following specific requirements:
Voice wake-up accuracy in a quiet environment shall not be less than 95 %;
Voice wake-up accuracy in a noisy environment shall not be less than 90 %;
Speech recognition accuracy in a quiet environment shall not be less than 90 %;
Speech recognition accuracy in a noisy environment shall not be less than 80 %.
b) Touch information: the ability to perceive user touch input information, such as tapping and sliding. The accuracy of touch operations shall not be less than 90 %.
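For illustration only, the L2 thresholds above, which add voice wake-up accuracy to the L1 items, could be checked as in the following Python sketch. It reports which items fall short and treats an unmeasured item as a failure. The dictionary keys and the name `failed_items` are hypothetical; the normative test methods are those of Annex A.

```python
# Illustrative sketch only; not part of the normative text.
# Assumption: each measured value is an accuracy expressed as a fraction.

# Minimum accuracies from 5.3.1.1.
L2_THRESHOLDS = {
    "wakeup_quiet": 0.95,  # voice wake-up, quiet environment
    "wakeup_noisy": 0.90,  # voice wake-up, noisy environment
    "speech_quiet": 0.90,  # speech recognition, quiet environment
    "speech_noisy": 0.80,  # speech recognition, noisy environment
    "touch": 0.90,         # touch operations
}


def failed_items(measured: dict[str, float]) -> list[str]:
    """Return the sorted names of items below their threshold.

    An item with no measurement is treated as failing.
    """
    return sorted(
        name
        for name, minimum in L2_THRESHOLDS.items()
        if measured.get(name, 0.0) < minimum
    )
```

An empty list means that all five items of 5.3.1.1 are met under the stated assumptions.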