Harish Mallidi

Machine Learning Scientist

Alexa Science

Amazon, Seattle

email: last_name@jhu.edu

www: http://mallidi.github.io

Interests

Automatic speech recognition, Speech signal processing, Language identification, Speaker recognition, Machine learning.

Education

June 2004 - August 2010, Bachelor of Technology + Master of Science (Electronics and Communication Engineering), International Institute of Information Technology (IIIT), Hyderabad (https://www.iiit.ac.in/), India

September 2011 - November 2016, Doctor of Philosophy, The Center for Language and Speech Processing (CLSP), Johns Hopkins University (JHU), USA, supervised by Prof. Hynek Hermansky
- Thesis Proposal - Performance Monitor Techniques for Noise Robust Speech Recognition, December 2014.
- Thesis - A Practical and Efficient Multistream Framework for Noise Robust Speech Recognition, February 2018.

Experience

NIST Language Recognition Evaluation 2015, Nov 2015

Participated as part of the Brno University of Technology team in the NIST LRE15 evaluation.

Applied DNNs to classify languages directly, rather than in their conventional role as bottleneck feature extractors.

IARPA ASpIRE Challenge 2015, Feb. '15

Participated as part of the BBN team. (press release)

Noise-robust ASR using FDLP-based modulation features.

Fred Jelinek Memorial Workshop, Prague, July '14 - Aug. '14

Part of the team focusing on Unsupervised Confidence Estimation of Neural Networks

Application: Noise Robust Speech Recognition

Contribution to KALDI speech recognition toolkit, Dec. '13 - present

Involved in the implementation of 2D Convolutional Neural Networks (CNNs) in KALDI.

Research Intern at BBN Technologies, May '13 - Aug. '13

Involved in the implementation of Neural Network based features in BBN's Byblos speech recognition toolkit.

The features have been successfully used in 2014 DARPA RATS and IARPA BABEL evaluations.

Publications

In Journals

AM Castro Martinez, Sri Harish Mallidi, and B. T. Meyer, ``On the relevance of auditory-based Gabor features for deep learning in robust speech recognition'', Computer Speech & Language, Sept. 2017 (pdf)

S. Ganapathy, Sri Harish Mallidi, and H. Hermansky, ``Robust Feature Extraction Using Modulation Filtering of Autoregressive Models'', IEEE Transactions on Audio, Speech and Language Proc. June 2014 (pdf)

S. Garimella, Sri Harish Mallidi, and H. Hermansky, ``Regularized Auto-Associative Neural Networks for Speaker Verification'', IEEE Signal Processing Letters. Dec. 2012 (pdf)

In Conferences

Sri Harish Mallidi, R. Maas, K. Goehner, A. Rastrow, S. Matsoukas, B. Hoffmeister ``Device-directed utterance detection'', Interspeech 2018. (pdf) (arXiv)

Torres-Carrasquillo et al. ``The MIT-LL, JHU and LRDE NIST 2016 speaker recognition evaluation system'', Interspeech 2017. (pdf)

B. Meyer, Sri Harish Mallidi, H. Kayser, H. Hermansky ``Predicting error rates for unknown data in automatic speech recognition'', ICASSP 2017. (pdf)

B. T. Meyer, Sri Harish Mallidi, H. Kayser, H. Hermansky ``Performance monitoring for automatic speech recognition in noisy multi-channel environments'', IEEE SLT 2016. (pdf)

T. Ogawa, Sri Harish Mallidi, E. Dupoux, J. Cohen, N. Feldman, H. Hermansky ``A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation'', ICPR 2016. (pdf)

Sri Harish Mallidi, H Hermansky ``A framework for Practical Multistream ASR'', Interspeech 2016. (pdf)

Ruizhi Li, Sri Harish Mallidi, Lukas Burget, Oldrich Plchot, Najim Dehak ``Exploiting Hidden-Layer Responses of Deep Neural Networks for Language Recognition'', Interspeech 2016. (pdf)

Oldrich Plchot et al. ``BAT System Description for NIST LRE 2015'', accepted in ISCA Speaker Odyssey, 2016.

Sri Harish Mallidi, H Hermansky ``Novel Neural Network Based Fusion for Multistream ASR'', ICASSP 2016. (pdf) (poster)

Sri Harish Mallidi, T Ogawa, H Hermansky ``Uncertainty Estimation of DNN classifiers'', IEEE ASRU 2015. (pdf) (poster)

Sri Harish Mallidi, T Ogawa, K Vesely, P S Nidadavolu, H Hermansky ``Autoencoder based multi-stream combination for noise robust speech recognition'', Interspeech 2015. (pdf)

Hynek Hermansky et al. ``Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek Memorial Workshop'', ICASSP 2015. (pdf)

Tim Ng et al., ``Progress in the BBN Keyword Search System for the DARPA RATS Program'', Interspeech 2014. (pdf)

P Matejka, L Zhang, T Ng, Sri Harish Mallidi, et al. ``Neural Network Bottleneck Features for Language Identification'', ISCA Speaker Odyssey 2014. (pdf)

Sri Harish Mallidi, Sriram Ganapathy and Hynek Hermansky, ``Robust Speaker Recognition Using Spectro-Temporal Autoregressive Models'', Interspeech 2013. (pdf)

Jeff Ma, Bing Zhang, Spyros Matsoukas, Sri Harish Mallidi, Feipeng Li, Hynek Hermansky, ``Improvements in Language Identification on the RATS Noisy Speech Corpus'', Interspeech 2013. (pdf)

Pascal Clark, Sri Harish Mallidi, Aren Jansen, and Hynek Hermansky, ``Frequency Offset Correction in Speech without Detecting Pitch'', ICASSP 2013. (pdf)

Oldrich Plchot et al., ``Developing a Speaker Identification System for the DARPA RATS Project'', ICASSP 2013. (pdf)

Samuel Thomas, Sri Harish Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab Shamma, Tim Ng, Bing Zhang, Long Nguyen and Spyros Matsoukas, ``Acoustic and Data-driven Features for Robust Speech Activity Detection'', Interspeech 2012. (pdf)

Feipeng Li, Sri Harish Mallidi and H. Hermansky, ``Phone recognition in critical bands using sub-band temporal modulations'', Interspeech, 2012. (pdf) (poster)

Samuel Thomas, Sri Harish Mallidi, Sriram Ganapathy and Hynek Hermansky, ``Adaptation Transforms of Auto-Associative Neural Networks as Features for Speaker Verification'', Odyssey Speaker and Language Recognition Workshop, Singapore, June 2012 (pdf)

Daniel Garcia-Romero, Xinhui Zhou et al., ``The UMD-JHU 2011 Speaker Recognition System'', ICASSP, Kyoto, Japan, March 2012 (pdf)

Sri Harish Mallidi, Sriram Ganapathy and Hynek Hermansky, ``Modulation spectrum analysis for recognition of reverberant speech'', Interspeech, 2011. (pdf)

Sri Harish Mallidi, Kishore S. Prahallad, Suryakanth V Gangashetty and B. Yegnanarayana, ``Significance of Pitch Synchronous Analysis for Speaker Recognition using AANN Models'', Interspeech, 2010. (pdf)

Anand Joseph M., Sri Harish Mallidi and B. Yegnanarayana, ``Speaker Dependent Mapping of Source and System features for Enhancement of Throat Microphone Speech'', Interspeech, 2010. (pdf)

Sudheer Kovela, Sri Harish Mallidi, Sri Rama Murty K and B. Yegnanarayana, ``Analysis of laugh signals for detecting in continuous speech'', Interspeech, 2009. (pdf)

Teaching

Teaching Assistant for Processing of Audio and Visual Signals, Fall-2012, 2013, Spring-2016

Teaching Assistant for Speech and Auditory Processing by Humans and Machines, Spring-2013, 2013

Copyright Notice - This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.