Peer-Reviewed

Embedded System for Speech Recognition and Image Processing

Received: 16 December 2014     Accepted: 23 December 2014     Published: 6 February 2015
Abstract

In recent years, voice-terminal and image-retrieval products have become increasingly intelligent, but mature commercial products remain rare. This paper presents an embedded design for an intelligent voice terminal based on pattern recognition. The design uses a Samsung S3C2410 ARM target board, a Philips UDA1341TS audio codec, and embedded Linux as the software platform; speech recognition is implemented through small-vocabulary voice training. To improve recognition performance, image retrieval serves as an auxiliary tool that helps the speech recognition module create, or more accurately locate, a personal voice-training library. Experimental results show that, with the aid of image recognition, speech recognition performance improves by 10% on average.
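The abstract does not spell out the matching procedure, but the keywords list the DTW algorithm, so small-vocabulary recognition against a personal voice-training library might look roughly like the sketch below. It is illustrative only: the names `dtw_distance`, `recognize`, `libraries`, and `speaker_id` are hypothetical, the one-dimensional frames stand in for real acoustic feature vectors, and the speaker identity is simply assumed to come from the image retrieval step.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(utterance, library):
    """Return the word whose closest training template has the smallest DTW distance."""
    return min(
        library,
        key=lambda word: min(dtw_distance(utterance, t) for t in library[word]),
    )


# Hypothetical personal libraries keyed by speaker; toy 1-D frames, not real features.
libraries = {
    "alice": {
        "open": [[(0.0,), (1.0,), (2.0,), (1.0,)]],
        "close": [[(2.0,), (1.0,), (0.0,), (0.0,)]],
    },
}

speaker_id = "alice"  # assumption: supplied by the image retrieval step
utterance = [(0.0,), (0.0,), (1.0,), (2.0,), (1.0,)]  # slower rendition of "open"
print(recognize(utterance, libraries[speaker_id]))
```

Restricting the search to one speaker's templates is what the image retrieval step would buy in this sketch: fewer candidates to compare, and templates recorded by the same voice.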

Published in Journal of Electrical and Electronic Engineering (Volume 2, Issue 6)
DOI 10.11648/j.jeee.20140206.12
Page(s) 89-93
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2015. Published by Science Publishing Group

Keywords

Speech Recognition, Embedded Development, Image Retrieval, DTW Algorithm, ARM Development

Cite This Article
  • APA Style

    Zhengxi Wei, Jinming Liang. (2015). Embedded System for Speech Recognition and Image Processing. Journal of Electrical and Electronic Engineering, 2(6), 89-93. https://doi.org/10.11648/j.jeee.20140206.12


    ACS Style

    Zhengxi Wei; Jinming Liang. Embedded System for Speech Recognition and Image Processing. J. Electr. Electron. Eng. 2015, 2(6), 89-93. doi: 10.11648/j.jeee.20140206.12


    AMA Style

    Zhengxi Wei, Jinming Liang. Embedded System for Speech Recognition and Image Processing. J Electr Electron Eng. 2015;2(6):89-93. doi: 10.11648/j.jeee.20140206.12


  • @article{10.11648/j.jeee.20140206.12,
      author = {Zhengxi Wei and Jinming Liang},
      title = {Embedded System for Speech Recognition and Image Processing},
      journal = {Journal of Electrical and Electronic Engineering},
      volume = {2},
      number = {6},
      pages = {89-93},
      doi = {10.11648/j.jeee.20140206.12},
      url = {https://doi.org/10.11648/j.jeee.20140206.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.jeee.20140206.12},
     year = {2015}
    }
    


  • TY  - JOUR
    T1  - Embedded System for Speech Recognition and Image Processing
    AU  - Zhengxi Wei
    AU  - Jinming Liang
    Y1  - 2015/02/06
    PY  - 2015
    N1  - https://doi.org/10.11648/j.jeee.20140206.12
    DO  - 10.11648/j.jeee.20140206.12
    T2  - Journal of Electrical and Electronic Engineering
    JF  - Journal of Electrical and Electronic Engineering
    JO  - Journal of Electrical and Electronic Engineering
    SP  - 89
    EP  - 93
    PB  - Science Publishing Group
    SN  - 2329-1605
    UR  - https://doi.org/10.11648/j.jeee.20140206.12
    VL  - 2
    IS  - 6
    ER  - 


Author Information
  • Zhengxi Wei, School of Computer Science, Sichuan University of Science & Engineering, Zigong, Sichuan 643000, PR China

  • Jinming Liang, School of Computer Science, Sichuan University of Science & Engineering, Zigong, Sichuan 643000, PR China
