02-HarmonyOS5-SpeechRecognizer-Case
zhousg · 6/11/2025, 9:06:42 AM
Case Description
This case implements real-time speech-to-text on top of the basic AI speech service (Core Speech Kit): it captures audio from the microphone and converts it to text as the user speaks.
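Before the case can record anything, the microphone permission also has to be declared statically, because requestPermissionsFromUser can only grant permissions listed in the module's module.json5. A minimal declaration might look like the following (the reason string resource and the EntryAbility name are placeholders for whatever the project actually uses):

"requestPermissions": [
  {
    "name": "ohos.permission.MICROPHONE",
    "reason": "$string:microphone_reason",
    "usedScene": {
      "abilities": ["EntryAbility"],
      "when": "inuse"
    }
  }
]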
import { speechRecognizer } from '@kit.CoreSpeechKit'
import { abilityAccessCtrl } from '@kit.AbilityKit'
import { promptAction } from '@kit.ArkUI'
@Entry
@ComponentV2
struct SpeechRecognizer {
@Local isRecording: boolean = false // whether a recognition session is in progress
@Local text: string = '' // recognized text shown on screen
hasPermissions: boolean = false // result of the microphone permission request
asrEngine?: speechRecognizer.SpeechRecognitionEngine // the ASR engine instance
aboutToAppear(): void {
// Request microphone permissions
this.requestPermissions()
}
async requestPermissions() {
const atManager = abilityAccessCtrl.createAtManager();
const res = await atManager.requestPermissionsFromUser(getContext(), ['ohos.permission.MICROPHONE'])
this.hasPermissions =
res.authResults.every(grantStatus => grantStatus === abilityAccessCtrl.GrantStatus.PERMISSION_GRANTED)
}
// Start microphone recognition
async startRecord() {
if (canIUse('SystemCapability.AI.SpeechRecognizer')) {
if (!this.hasPermissions) {
return promptAction.showToast({ message: 'Microphone not authorized' })
}
if (this.isRecording) {
return promptAction.showToast({ message: 'Recording...' })
}
this.isRecording = true
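// Create the speech recognition engine for Chinese (zh-CN)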
this.asrEngine = await speechRecognizer.createEngine({
language: 'zh-CN',
online: 1
})
const _this = this
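// Register the listener; onResult delivers the recognized text as it becomes available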
this.asrEngine.setListener({
onStart(sessionId: string, eventMessage: string) {
},
onEvent(sessionId: string, eventCode: number, eventMessage: string) {
},
onResult(sessionId: string, result: speechRecognizer.SpeechRecognitionResult) {
_this.text = result.result // latest recognized text from the engine
if (result.isLast) {
// isLast marks the final result of this session
_this.isRecording = false
}
},
onComplete(sessionId: string, eventMessage: string) {
},
onError(sessionId: string, errorCode: number, errorMessage: string) {
}
})
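// PCM audio format used for recognition: 16 kHz sample rate, mono, 16-bit samples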
const audioParam: speechRecognizer.AudioInfo = {
audioType: 'pcm',
sampleRate: 16000,
soundChannel: 1,
sampleBit: 16
}
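// Extra session parameters: recognition mode, voice activity detection (VAD) begin/end thresholds in ms, and the maximum audio duration in ms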
const extraParam: Record<string, Object> = {
"recognitionMode": 0,
"vadBegin": 2000,
"vadEnd": 3000,
"maxAudioDuration": 20000
}
const recognizerParams: speechRecognizer.StartParams = {
sessionId: '10000',
audioInfo: audioParam,
extraParams: extraParam
}
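// Start the session; recognition results are pushed to onResult until isLast is true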
this.asrEngine.startListening(recognizerParams)
}
}
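// Stop recognition: end the session and release the engine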
async closeRecord() {
if (canIUse('SystemCapability.AI.SpeechRecognizer')) {
this.asrEngine?.finish('10000')
this.asrEngine?.cancel('10000')
this.asrEngine?.shutdown()
}
}
build() {
Column() {
Row() {
Text(this.text)
.width('100%')
.lineHeight(32)
}
.alignItems(VerticalAlign.Top)
.width('100%')
.layoutWeight(1)
Button(this.isRecording ? 'Speak Now' : 'Press and Hold to Speak')
.width('100%')
.gesture(LongPressGesture()
.onAction(() => {
this.startRecord()
})
.onActionEnd(() => {
this.closeRecord()
})
.onActionCancel(() => {
this.closeRecord()
}))
}
.padding(15)
.height('100%')
.width('100%')
}
}
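One thing the case leaves out is cleanup when the page itself is destroyed while the engine is still alive. A minimal sketch of how that could be handled (not part of the original case; it assumes the default component lifecycle and would sit inside the SpeechRecognizer struct next to aboutToAppear):

aboutToDisappear(): void {
// Release the engine if the component is destroyed mid-session
if (canIUse('SystemCapability.AI.SpeechRecognizer')) {
this.asrEngine?.shutdown()
}
}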