Works

We introduce a variety of technology-related achievements and results.

[Venue] International Conference on Nitride Semiconductors (ICNS)
[Paper Title] Watt-class 462 nm-Blue and 530 nm-Green Laser Diodes

In this research, watt-class green and blue laser diodes, which are fabricated on free-standing semipolar GaN and conventional c-plane GaN substrates, respectively, are developed. Although several research groups have recently developed green laser diodes on semipolar GaN substrates, which have weaker piezoelectric fields and higher indium homogeneity in InGaN active regions compared to c-plane GaN, watt-level output power has yet to be achieved. By utilizing this semipolar plane, the first watt-class green lasers at 530 nm are successfully fabricated and achieve maximum output powers in excess of 2 W, which to the best of our knowledge is the highest value reported for any GaN-based green laser diode. A wall-plug efficiency of 17.5% is realized at a current of 1.2 A under continuous-wave operation, which corresponds to an optical output of approximately 1 W and is the highest value reported to date. In addition, high-power and high-efficiency blue laser diodes at 465 nm are successfully fabricated on conventional c-plane GaN substrates. The output power and wall-plug efficiency are 5.2 W and 37.0%, respectively, at a current of 3.0 A under continuous-wave operation. These laser diodes are promising light sources meeting the ITU-R Recommendation BT.2020 for future laser display applications.

Technology Category: Display & Visual
Authors: M. Murayama,
Y. Nakayama,
K. Yamazaki,
Y. Hoshina,
H. Watanabe(Sony Corporation)

[Venue] The Society for Information Display (SID)
[Paper Title] New Pixel Driving Circuit Using Self-discharging Compensation Method for High-Resolution OLED Micro Displays on a Silicon Backplane

A new 4T2C pixel circuit formed on a silicon substrate is proposed to realize a high-resolution 7.8-μm pixel pitch AMOLED microdisplay. In order to achieve high luminance uniformity, the pixel circuit internally compensates for the Vth variation of the driving MOSFET using a self-discharging method. Also presented are 0.5-in Quad-VGA and 1.25-in wide Quad-XGA microdisplays with the proposed pixel circuit.

Technology Category: Display & Visual
Authors: K. Kimura,
Y. Onoyama,
T. Tanaka(Sony Corporation),
N. Toyomura,
H. Kitagawa(Sony Semiconductor Solutions Corporation)

[Venue] The Society for Information Display (SID)
[Paper Title] High light extraction efficiency laser-phosphor light source

We investigated a laser-phosphor light source using an inorganic phosphor wheel. We experimentally confirmed that the light extraction efficiency of the inorganic phosphor wheel is 8% higher than that of a conventional phosphor wheel. In addition, we explain the cause of this efficiency improvement using a fluorescence emission model.

Technology Category: Display & Visual
Authors: H. Morita,
Y. Maeda,
I. Kobayashi,
Y. Sato,
T. Nomura,
H. Kikuchi(Sony Corporation)

[Venue] Biomedical Engineering Systems and Technologies (BIOSTEC)
[Paper Title] Wearable Motion Tolerant PPG Sensor for Instant Heart Rate in Daily Activity

A wristband-type PPG heart rate sensor capable of overcoming motion artifacts in daily activity and detecting heart rate variability has been developed together with a motion artifact cancellation framework. In this work, a motion artifact model in daily life was derived and motion artifacts caused by activity of arm, finger, and wrist were cancelled significantly. Highly reliable instant heart rate detection with high noise-resistance was achieved from noise-reduced pulse signals based on peak-detection and autocorrelation methods. The wristband-type PPG heart rate sensor with our motion artifact cancellation framework was compared with ECG instant heart rate measurement in both laboratory and office environments. In a laboratory environment, mean reliability (percentage of time within 10% error relative to ECG instant heart rate) was 86.5% and the one-day pulse-accuracy achievement rate based on time use data of body motions in daily life was 88.1% or approximately 21 hours. Our device and motion artifact cancellation framework enable continuous heart rate variability monitoring in daily life and could be applied to heart rate variability analysis and emotion recognition.

Technology Category: Medical & Life Science
Authors: T. Ishikawa,
Y. Hyodo,
K. Miyashita,
K. Yoshifuji,
Y. Imai(Sony Corporation)

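The peak-detection and autocorrelation approach mentioned above can be illustrated with a minimal sketch: the dominant autocorrelation lag of a noise-reduced PPG window, restricted to a physiological range, gives the instant heart rate. This is not the authors' implementation; the function name, window handling, and BPM limits are assumptions.

    import numpy as np

    def instant_heart_rate(ppg_window, fs, min_bpm=40, max_bpm=200):
        # Estimate heart rate from a noise-reduced PPG window via autocorrelation:
        # the strongest lag inside the plausible beat-period range gives the pulse period.
        x = ppg_window - np.mean(ppg_window)
        ac = np.correlate(x, x, mode="full")[len(x) - 1:]
        lo = int(fs * 60.0 / max_bpm)   # shortest plausible beat period, in samples
        hi = int(fs * 60.0 / min_bpm)   # longest plausible beat period, in samples
        lag = lo + int(np.argmax(ac[lo:hi]))
        return 60.0 * fs / lag          # beats per minute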

[Venue] The Society for Information Display (SID)
[Paper Title] A Plastic Holographic Waveguide Combiner for Light-weight and Highly-transparent Augmented Reality Glasses

There is a high demand for light-weight, stylishly designed augmented reality (AR) glasses with natural see-through capabilities for the widespread distribution of this novel wearable device to general consumers. We have successfully developed a unique production process for a holographic waveguide combiner that enables us to laminate holographic optical elements (HOEs) onto a plastic substrate with optical-grade quality. The plastic substrate waveguide combiner has a number of advantages over conventional glass substrate combiners; the plastic substrate makes AR glasses lighter in weight and unbreakable. With the lamination process of HOEs, we can apply them to various designs to satisfy customers' wide range of style preferences. Our novel roll-to-roll hologram recording and laminating process also potentially enables the holographic waveguide combiner to be produced in larger volumes at lower cost. In this paper, we present our approach to the plastic-substrate HOE production process for AR glasses.

Technology Category: Display & Visual
Authors: T. Yoshida,
K. Tokuyama,
Y. Takai,
D. Tsukuda,
T. Kaneko,
N. Suzuki(Sony Corporation),
T. Anzai(Sony Global Manufacturing & Operations Corporation),
A. Yoshikaie,
K. Akutsu,
A. Machida(Sony Corporation)

[Venue] The Society for Information Display (SID)
[Paper Title] A Plastic Electrochromic Dimming Device for Augmented Reality Glasses

We have developed an electrochromic dimming device on a plastic substrate with high transparency modulation from 70% to 10% and a bending radius below 30 mm. It withstands more than 10,000 switching cycles as well as high-temperature, high-humidity conditions. Combining the device with AR glasses enables clear image visibility in various environments.

Technology Category: Display & Visual
Authors: A. Machida,
K. Kadono,
Y. Ishii,
T. Kono,
H. Takanashi,
A. Nishiike(Sony Corporation),
H. Suzuki,
Y. Nakagawa,
K. Ando,
D. Kasahara,
A. Takeda(Sony Global Manufacturing & Operations Corporation),
K. Nomoto(Sony Corporation)

[Venue] International Solid-State Circuits Conference (ISSCC)
[Paper Title] Projection and sensing technology of Xperia Touch

Technology Category: Display & Visual
Authors: K. Kaneda(Sony Corporation)

[Venue] Scientific Reports, 8, 10350
[Paper Title] Lateral optical confinement of GaN-based VCSEL using an atomically smooth monolithic curved mirror

We demonstrate the lateral optical confinement of GaN-based vertical-cavity surface-emitting lasers (GaN-VCSELs) with a cavity containing a curved mirror that is formed monolithically on a GaN wafer. The output wavelength of the devices is 441–455 nm. The threshold current is 40 mA (Jth = 141 kA/cm²) under pulsed current injection (Wp = 100 ns; duty = 0.2%) at room temperature. We confirm the lateral optical confinement by recording near-field images and investigating the dependence of threshold current on aperture size. The beam profile can be fitted with a Gaussian having a theoretical standard deviation of σ = 0.723 µm, which is significantly smaller than previously reported values for GaN-VCSELs with plane mirrors. Lateral optical confinement with this structure theoretically allows aperture miniaturization to the diffraction limit, resulting in far lower, sub-milliampere threshold currents. The proposed structure enabled GaN-based VCSELs to be constructed with cavities as long as 28.3 µm, which greatly simplifies the fabrication process owing to longitudinal mode spacings of less than a few nanometers and should help the practical implementation of these devices.

Technology Category: Display & Visual
Authors: T. Hamaguchi,
M. Tanaka,
J. Mitomo,
H. Nakajima,
M. Ito,
N. Kobayashi,
K. Fujii,
H. Watanabe,
S. Satou,
M. Ohara,
R. Koda,
H. Narui(Sony Corporation)

[Venue] IEEE Engineering in Medicine and Biology Conference (EMBC)
[Paper Title] Feature quantities of EEG to characterize human internal states of concentration and relaxation

Technology Category: Medical & Life Science
Authors: N. Sazuka,
Y. Komoriya,
T. Ezaki(Sony Corporation),
M. Uraguchi,
H. Ohira(Nagoya University)

[Venue] The Electrochemical Society (AiMES)
[Paper Title] Atomic Diffusion Bonding for Optical Devices with High Optical Density

An inorganic bonding method providing 100% light transmittance at the bonded interface was proposed for fabricating devices with high optical density. First, we fabricated 5000 nm-thick SiO2 oxide underlayers on synthetic quartz glass wafers. After the film surfaces were polished to reduce surface roughness, the wafers with oxide underlayers were bonded using thin Ti films in vacuum at room temperature, as in the usual atomic diffusion process. After post-annealing at 300 °C, 100% light transmittance at the bonded interface was achieved, with a surface free energy at the bonded interface greater than 2 J/m². Dissociated oxygen from the oxide layers probably enhanced Ti film oxidation during annealing, resulting in high light transmittance together with high bonding strength. Using this bonding process, we fabricated a polarizing beam splitter and demonstrated that the process is useful for fabricating devices with high optical density.

Technology Category: Device & Material
Authors: G. Yonezawa,
Y. Takahashi,
Y. Sato,
S. Abe,
M. Uomoto(Sony Corporation),
T. Shimatsu(Tohoku University)

[Venue] Applied Physics Letters, 113, 163302
[Paper Title] Impact of molecular orientation on energy level alignment at C60/pentacene interfaces

The molecular orientation and the electronic structure at molecular donor/acceptor interfaces play an important role in the performance of organic optoelectronic devices. Here, we show that graphene substrates can be used as templates for tuning the molecular orientation of pentacene (PEN), selectively driving the formation of either face-on or edge-on arrangements by controlling the temperature of the substrate during deposition. The electronic structure and morphology of the two resulting C60/PEN heterointerfaces were elucidated using ultraviolet photoelectron spectroscopy and atomic force microscopy, respectively. While the C60/PEN (edge-on) interface exhibited a vacuum level alignment, the C60/PEN (face-on) interface exhibited a vacuum level shift of 0.2 eV, which was attributed to the formation of an interface dipole that resulted from polarization at the C60/PEN boundary.

Technology Category: Material Analysis & Simulation
Authors: T. Nishi,
M. Kanno,
M. Kuribayashi,
Y. Nishida,
S. Hattori,
H. Kobayashi(Sony Corporation),
F. von Wrochem,
V. Rodin,
G. Nelles(Sony Europe Limited, Materials Science Laboratory),
S. Tomiya(Sony Corporation)

[Venue] IEEE Vehicular Technology Conference (VTC)
[Paper Title] GFDM with Different Subcarrier Bandwidths

This paper proposes a generalized frequency division multiplexing (GFDM) modulation scheme that transmits a signal with different subcarrier bandwidths. In a receiver, the GFDM signal is demodulated by using a zero-forcing (ZF) algorithm or a minimum mean square error (MMSE) algorithm, and the BER performance of these algorithms is related to the condition number of a modulation matrix. This matrix can be optimized by adjusting the roll-off factor of the subcarrier filters. It is shown that the performance of the proposed GFDM is about 0.02 dB better than that with a roll-off factor of 0 at a BER of 10⁻³ on an AWGN channel. On the other hand, on multipath fading channels, the BER performance improves as the subcarrier bandwidth increases because of frequency diversity.

Technology Category: Communication
Authors: Y. Akai,
Y. Enjoji,
Y. Sanada(Keio University),
R. Kimura,
R. Sawai(Sony Corporation)

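To make the role of the modulation matrix concrete, the sketch below builds a standard single-bandwidth GFDM transmit matrix from a prototype filter; its condition number is what governs the ZF/MMSE noise enhancement discussed above. The toy dimensions and the Hann prototype filter are assumptions, and this is a generic GFDM sketch rather than the proposed different-subcarrier-bandwidth scheme itself.

    import numpy as np

    def gfdm_matrix(g, K, M):
        # Column m*K + k carries subsymbol m on subcarrier k: the prototype filter g
        # (length K*M) is circularly shifted by m*K samples and modulated to subcarrier k.
        N = K * M
        n = np.arange(N)
        A = np.empty((N, N), dtype=complex)
        for m in range(M):
            for k in range(K):
                A[:, m * K + k] = np.roll(g, m * K) * np.exp(2j * np.pi * k * n / K)
        return A

    K, M = 4, 5                               # assumed toy numbers of subcarriers / subsymbols
    g = np.sqrt(np.hanning(K * M))            # assumed prototype filter
    A = gfdm_matrix(g, K, M)
    print(np.linalg.cond(A))                  # conditioning of the modulation matrix
    # zero-forcing receiver: d_hat = A^{-1} r for a received block r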

[Venue] IEEE Vehicular Technology Conference (VTC)
[Paper Title] A Singularity-free GFDM Modulation Scheme with Parametric Shaping Filter Sampling

A GFDM modulation scheme that circumvents the singularity issue of the GFDM transformation matrix is presented. The coefficients used for the pulse shaping filter are derived from the prototype filter depending on the parity of the subsymbols. The proposed pulse shaping filter design makes it possible to have a non-singular transformation matrix for an arbitrary number of subsymbols and/or subcarriers in the sparse frequency-domain GFDM modulation.

Technology Category: Communication
Authors: A. Yoshizawa,
R. Kimura,
R. Sawai(Sony Corporation)

[Venue] European Conference on Computer Vision (ECCV)
[Paper Title] Scene depth profiling using Helmholtz Stereopsis

Helmholtz stereopsis is a 3D reconstruction technique, capturing surface depth independent of the reflection properties of the material by using Helmholtz reciprocity. In this paper we are interested in studying the applicability of Helmholtz stereopsis for surface and depth profiling of objects and general scenes in the context of perspective stereo imaging. Helmholtz stereopsis captures a pair of reciprocal images by exchanging the position of light source and camera. The resulting image pair relates the image intensities and scene depth profile by a partial differential equation. The solution of this differential equation depends on the boundary conditions provided by the scene. We propose to limit the illumination angle of the light source, such that only mutually visible parts are imaged, resulting in stable boundary conditions. By simulation and experiment we show that a unique depth profile can be recovered for a large class of scenes including multiple occluding objects.

Technology Category: Computer Vision & CG
Authors: H. Mori,
R. Koehle,
M. Kamm(Sony Europe Limited)

[Venue] International Speech Communication Association (Interspeech)
[Paper Title] Automatic Pronunciation Generation by Utilizing a Semi-Supervised Deep Neural Network

Phonemic or phonetic sub-word units are the most commonly used atomic elements to represent speech signals in modern ASRs. However, they are not the optimal choice for several reasons, such as the large amount of effort required to handcraft a pronunciation dictionary, pronunciation variations, human mistakes, and under-resourced dialects and languages. Here, we propose a data-driven pronunciation estimation and acoustic modeling method which only takes the orthographic transcription to jointly estimate a set of sub-word units and a reliable dictionary. Experimental results show that the proposed method, which is based on semi-supervised training of a deep neural network, largely outperforms phoneme-based continuous speech recognition on the TIMIT dataset.

Technology Category: AI & Machine Learning
Authors: N. Takahashi(Sony Corporation),
T. Naghibi,
B. Pfister,
L. V. Gool(ETH Zurich)

[Venue] International Speech Communication Association (Interspeech)
[Paper Title] Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection

We propose a novel method for Acoustic Event Detection (AED). In contrast to speech, sounds coming from acoustic events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of a clear sub-word unit. In order to incorporate the long-time frequency structure for AED, we introduce a convolutional neural network (CNN) with a large input field. In contrast to previous works, this enables audio event detection to be trained end-to-end. Our architecture is inspired by the success of VGGNet and uses small, 3×3 convolutions, but more depth than previous methods in AED. In order to prevent over-fitting and to take full advantage of the modeling capabilities of our network, we further propose a novel data augmentation method to introduce data variation. Experimental results show that our CNN significantly outperforms state-of-the-art methods including Bag of Audio Words (BoAW) and classical CNNs, achieving a 16% absolute improvement.

Technology Category: AI & Machine Learning
Authors: N. Takahashi(Sony Corporation),
M. Gygli,
B. Pfister,
L. V. Gool(ETH Zurich)

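For a concrete picture of a "VGG-style, small 3×3 convolution" audio network, here is a minimal PyTorch sketch operating on log-mel spectrogram patches. The layer counts, channel widths, and pooling layout are assumptions for illustration, not the architecture reported in the paper.

    import torch
    import torch.nn as nn

    def vgg_block(cin, cout):
        # Two stacked 3x3 convolutions followed by 2x2 max pooling, VGG style.
        return nn.Sequential(
            nn.Conv2d(cin, cout, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(cout, cout, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2))

    class AEDNet(nn.Module):
        # Deep, small-kernel CNN over a long time-frequency input field.
        def __init__(self, n_classes):
            super().__init__()
            self.features = nn.Sequential(vgg_block(1, 32), vgg_block(32, 64), vgg_block(64, 128))
            self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, n_classes))

        def forward(self, x):                  # x: (batch, 1, mel_bins, frames)
            return self.head(self.features(x))

    logits = AEDNet(n_classes=28)(torch.randn(2, 1, 64, 400))   # toy spectrogram batch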

[Venue] IEEE Advanced Information Networking and Applications (AINA)
[Paper Title] Dynamic Sensitivity Control based on Two-Hop Farthest Terminal in Dense WLAN

The explosive usage of IEEE 802.11 Wireless Local Area Networks (WLANs) has resulted in dense deployments and excessive interference between Basic Service Sets (BSSs) in urban areas such as apartment buildings and airports. Serious hidden/exposed terminal problems in high-density conditions negatively impact system throughput. To improve system efficiency, the IEEE 802.11ax TG has been assembled. The TG aims at realizing High-Efficiency WLAN (HEW) by utilizing spatial reuse technologies including Dynamic Sensitivity Control (DSC), Transmit Power Control (TPC), and BSS Color Filtering (BCF). In this paper, we propose a DSC based on the two-hop farthest terminal for dense WLAN. This scheme, combined with minimum transmission power, resolves the hidden terminal problem. The propagation loss of the received signal from the associated communication pair is used to set proper values of transmission power and carrier sense level. Furthermore, adjusting these parameters destination by destination can reduce exposed terminals effectively. We evaluate the performance of the proposed scheme in a residential building scenario with three criteria: aggregate throughput, fairness, and frame error rate. Simulation results show that the proposed scheme can improve aggregate downlink throughput and fairness compared to a previously proposed method in which the carrier sense level is set based on the expected RSSI of packets received from the communicating pair. Furthermore, the improvement in frame loss rate implies that the hidden terminal problem can be solved by the proposed scheme.

Technology Category: Communication
Authors: T. Ohnuma,
H. Shigeno (Keio University),
T. Yamaura,
Y. Tanaka(Sony Corporation)

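As a loose illustration of the general dynamic sensitivity control idea (not the paper's exact two-hop rule), the sketch below raises the carrier-sense threshold to just below the level expected from a BSS's own farthest associated terminal, estimated from transmit power and propagation loss; weaker inter-BSS signals then no longer defer transmissions. The parameter values and the margin are hypothetical.

    def dsc_cca_threshold(tx_power_dbm, path_loss_to_farthest_db, margin_db=3.0):
        # Expected receive level from the farthest associated terminal.
        expected_rssi_dbm = tx_power_dbm - path_loss_to_farthest_db
        # Set the carrier-sense (CCA) threshold just below that level; frames from
        # other BSSs arriving weaker than this are ignored, increasing spatial reuse.
        return expected_rssi_dbm - margin_db

    print(dsc_cca_threshold(tx_power_dbm=15.0, path_loss_to_farthest_db=75.0))   # -63.0 dBm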

[Venue] IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[Paper Title] Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain

Multichannel non-negative matrix factorization based on a spatial covariance model is one of the most promising techniques for blind source separation. However, this approach is not tractable for a large number of microphones, M, because the computational cost is of order O(M³) per time-frequency bin. To circumvent this drawback, we propose non-negative tensor factorization in the wavenumber domain, which reduces the cost to the order O(M). It transforms microphone signals into the spatial frequency domain, a technique that is commonly used for soundfield reconstruction. The proposed method is compared to several blind source separation (BSS) methods in terms of separation quality and computational cost.

Technology Category: Audio & Acoustics
Authors: Y. Mitsufuji(Sony Corporation),
S. Koyama, H. Saruwatari(The University of Tokyo)

[Venue] International Conference on Pattern Recognition (ICPR)
[Paper Title] Latent Model Ensemble with Auto-Localization

Deep Convolutional Neural Networks (CNN) have exhibited superior performance in many visual recognition tasks including image classification, object detection, and scene labeling, due to their large learning capacity and resistance to overfitting. For the image classification task, most of the current deep CNN-based approaches take the whole size-normalized image as input and have achieved quite promising results. Compared with the previously dominating approaches based on feature extraction, pooling, and classification, the deep CNN-based approaches mainly rely on the learning capability of deep CNN to achieve superior results: the burden of minimizing intra-class variation while maximizing inter-class difference is entirely dependent on the implicit feature learning component of deep CNN; we rely upon the implicitly learned filters and pooling component to select the discriminative regions, which correspond to the activated neurons. However, if the irrelevant regions constitute a large portion of the image of interest, the classification performance of the deep CNN, which takes the whole image as input, can be heavily affected. To solve this issue, we propose a novel latent CNN framework, which treats the most discriminative region as a latent variable. We can jointly learn the global CNN with the latent CNN to avoid the aforementioned big irrelevant region issue, and our experimental results show the evident advantage of the proposed latent CNN over traditional deep CNN: latent CNN outperforms the state-of-the-art performance of deep CNN on standard benchmark datasets including the CIFAR-10, CIFAR-100, MNIST and PASCAL VOC 2007 Classification dataset.

Technology Category: AI & Machine Learning
Authors: M. Sun, T. X. Han(University of Missouri),
X. Xu, M-C Liu, A. K-Rostamabad(Sony Electronics Inc)

[Venue] IEEE Dynamic Spectrum Access Networks (DySPAN)
[Paper Title] Aggregate Interference Prediction Based on Back-Propagation Neural Network

In dynamic spectrum access (DSA) scenarios, dense and complex deployment (e.g., in nonuniform or unknown radio propagation environments) of secondary systems (SSs) will make aggregate interference estimation highly complicated or challenging for reliable primary system (PS) protection. To tackle this problem, a back-propagation (BP) neural network based aggregate interference prediction method is proposed and evaluated via simulations. This paper also gives design guidelines for a BP neural network appropriate for aggregate interference prediction by revealing the impact of several key factors on the prediction accuracy, such as the number of input parameters to the neural network, the coordinate system in use, and the number of hidden neurons.

Technology Category: AI & Machine Learning
Authors: Y. Zhao,
L. Shi (Beijing Jiaotong University),
X. Guo,
C. Sun (SCRL)

[Venue] The 26th Workshop on Multimedia Communication and Distributed Processing (DPSWS)
[Paper Title] A Study on Signal Detection Threshold Control with Inter-Access-Point Coordination for Improving Fairness in Dense Wireless LAN Environments

In recent years, with advances in wireless technology, a large number of wireless LANs (Local Area Networks) have been deployed side by side in urban areas. In dense environments where many wireless terminals are located close to each other, co-channel interference significantly degrades system performance. The IEEE 802.11ax Task Group is studying optimal control of the signal detection threshold and transmission power as one of the most effective ways to address this problem. Many existing studies on such control improve system performance but do not consider stations (STAs) in other Basic Service Sets (BSSs), so throughput fairness between BSSs remains insufficient. The goal of this paper is to control the signal detection threshold so as to suppress starvation of transmission opportunities and improve throughput fairness between BSSs. In the proposed fair Dynamic Sensitivity Control (fairDSC), access points (APs) exchange their throughput and data-frame transmission counts and cooperatively control the signal detection threshold so that the performance of BSSs that are relatively poor in terms of throughput and transmission opportunities is improved. Simulation results show that, compared with existing work, the proposed method improves the fairness of transmission opportunities and downlink (DL) throughput.

Technology Category: Network & Data Analytics
Authors: 岩井 皓暉,
大沼 貴信,
Prof. 重野 寛 (Keio University),
田中 悠介 (Sony Corporation)

[Venue] Empirical Methods in Natural Language Processing (EMNLP)
[Paper Title] Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

Modeling textual or visual information with vector representations trained from large language or visual datasets has been successfully explored in recent years. However, tasks such as visual question answering require combining these vector representations with each other. Approaches to multimodal pooling include element-wise product or sum, as well as concatenation of the visual and textual representations. We hypothesize that these methods are not as expressive as an outer product of the visual and textual vectors. As the outer product is typically infeasible due to its high dimensionality, we instead propose utilizing Multimodal Compact Bilinear pooling (MCB) to efficiently and expressively combine multimodal features. We extensively evaluate MCB on the visual question answering and grounding tasks. We consistently show the benefit of MCB over ablations without MCB. For visual question answering, we present an architecture which uses MCB twice, once for predicting attention over spatial features and again to combine the attended representation with the question representation. This model outperforms the state-of-the-art on the Visual7W dataset and the VQA challenge.

Technology Category: AI & Machine Learning
Authors: A. Fukui(Sony Corporation/University of California, Berkeley),
D. H. Park,
D. Yang(University of California, Berkeley),
A. Rohrbach(University of California, Berkeley/Max Planck Institute of Technology),
T. Darrell,
M. Rohrbach(University of California, Berkeley)

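Multimodal Compact Bilinear pooling approximates the outer product of two feature vectors by Count Sketch projection followed by circular convolution, computed in the FFT domain. The NumPy sketch below illustrates the idea with an assumed output dimension; it is not the authors' released implementation.

    import numpy as np

    def count_sketch(x, h, s, d):
        # Project x to d dimensions: coordinate i is added to bucket h[i] with sign s[i].
        y = np.zeros(d)
        np.add.at(y, h, s * x)
        return y

    def mcb(v, q, d=1024, seed=0):
        rng = np.random.default_rng(seed)
        h_v, s_v = rng.integers(0, d, v.size), rng.choice([-1.0, 1.0], v.size)
        h_q, s_q = rng.integers(0, d, q.size), rng.choice([-1.0, 1.0], q.size)
        # Circular convolution of the two sketches (done via FFT) approximates
        # the count sketch of the outer product of v and q.
        F = np.fft.rfft(count_sketch(v, h_v, s_v, d)) * np.fft.rfft(count_sketch(q, h_q, s_q, d))
        return np.fft.irfft(F, n=d)

    pooled = mcb(np.random.randn(2048), np.random.randn(300))   # e.g. visual and textual features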

[Venue] Association for Computational Linguistics (ACL)
[Paper Title] Domain Adaptation for Neural Networks by Parameter Augmentation

We propose a simple domain adaptation method for neural networks in a supervised setting. Supervised domain adaptation is a way of improving the generalization performance on the target domain by using the source domain dataset, assuming that both of the datasets are labeled. Recently, recurrent neural networks have been shown to be successful on a variety of NLP tasks such as caption generation; however, the existing domain adaptation techniques are limited to (1) tuning the model parameters on the target dataset after training on the source dataset, or (2) designing the network to have dual outputs, one for the source domain and the other for the target domain. Reformulating the idea of the domain adaptation technique proposed by Daume (2007), we propose a simple domain adaptation method which can be applied to neural networks trained with a cross-entropy loss. On captioning datasets, we show performance improvements over other domain adaptation methods.

Technology Category: AI & Machine Learning
Authors: Y. Watanabe(Sony Corporation),
K. Hashimoto, Y. Tsuruoka(The University of Tokyo)

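The Daume (2007) technique the paper reformulates is feature augmentation: every input gets a shared copy plus a domain-specific copy, so shared and domain-specific weights are learned jointly from the combined data. Below is a minimal sketch of that original formulation (the function name and the two-domain setup are assumptions); in the paper's neural-network setting the analogous augmentation is applied to parameters rather than input features.

    import numpy as np

    def augment(x, domain):
        # [shared copy | source-specific copy | target-specific copy]
        z = np.zeros_like(x)
        if domain == "source":
            return np.concatenate([x, x, z])
        return np.concatenate([x, z, x])

    x_src = augment(np.array([0.2, 1.5, -0.7]), "source")   # length 9
    x_tgt = augment(np.array([0.2, 1.5, -0.7]), "target")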

[Venue] International Conference of Intelligent Robotic and Control Engineering (IRCE)
[Paper Title] One-shot Learning Gesture Recognition Based on Evolution of Discrimination with Successive Memory

In this paper, a one-shot learning gesture recognition algorithm based on evolution of discrimination with successive memory is presented, which utilizes the transferability of a large-scale pre-trained DNN (Deep Neural Network) gesture recognition model and distance discrimination to carry out high-performance recognition with evolutionary discrimination. Our scheme is as follows. First, a DNN gesture recognition model is trained on a sample set containing 19 classes of the BSG dataset, serving as a transferable model with a powerful feature extractor. Second, the transferable capacity of the extractor is employed to extract features of labeled root samples and test samples, respectively, for one-shot learning gesture recognition, achieving high-performance feature extraction and structured arraying. Finally, discriminative recognition is carried out using the Euclidean distance between the root features and test features. In addition, a mechanism for updating and evolving the root feature memory is built and utilized to further enhance recognition performance. Software for online one-shot learning gesture recognition aimed at practical applications was designed and developed, achieving fast response and high recognition accuracy. A series of experiments on the additional 10 classes of the BSG dataset verify and validate the performance advantages of the proposed one-shot learning gesture recognition algorithm.

Technology Category: AI & Machine Learning
Authors: X. Li, S. Qin (BUAA School of Automation Science and Electrical Engineering),
Kuanhong Xu, Zhongying Hu (SCRL)

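The distance-discrimination step described above amounts to nearest-prototype classification on DNN embeddings, with the stored "root" features updated as new samples are accepted. Below is a minimal sketch; the exponential-moving-average update and the toy dimensions are assumptions, not the paper's exact evolution mechanism.

    import numpy as np

    def classify(feature, roots, labels):
        # Euclidean distance to each stored root feature; return the nearest label.
        dists = np.linalg.norm(roots - feature, axis=1)
        i = int(np.argmin(dists))
        return labels[i], dists[i]

    def evolve_root(root, feature, alpha=0.1):
        # Blend an accepted sample into the stored root (successive-memory update).
        return (1.0 - alpha) * root + alpha * feature

    roots = np.random.randn(10, 128)                 # one 128-d root feature per gesture class
    labels = [f"gesture_{i}" for i in range(10)]
    label, dist = classify(np.random.randn(128), roots, labels)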

[Venue] IEEE Robotics and Automation Letters (RA-L) with ICRA2018 option
[Paper Title] Comparison between Force-controlled Skin Deformation Feedback and Hand-Grounded Kinesthetic Force Feedback for Sensory Substitution

Teleoperation and virtual reality systems benefit from force sensory substitution when kinesthetic force feedback devices are infeasible due to stability or workspace limitations. We compared the performance of sensory substitution when it is provided through a cutaneous method (skin deformation feedback) and a kinesthetic method (hand-grounded force feedback). For skin deformation feedback, we used a new force-controlled tactile sensory substitution device with the ability to provide tangential and normal force directly to the finger pad. Three-axis force control with 15 Hz bandwidth was achieved using a delta mechanism and three-axis force sensor. For hand-grounded force feedback, forces were grounded against the palm. As a control, world-grounded force feedback was provided using a three-degree-of-freedom kinesthetic force feedback device. Study participants were able to match a reference world-grounded force better with hand-grounded kinesthetic force feedback than with skin deformation feedback. Participants were also able to apply more accurate and precise forces with hand-grounded kinesthetic force feedback than with skin deformation feedback. Conversely, skin deformation feedback resulted in the lowest error during initial force adjustment. These experiments demonstrate relative advantages and disadvantages of skin deformation and hand-grounded kinesthetic force feedback for force sensory substitution.

Technology Category: Robotics
Authors: Y. Kamikawa(Sony Corporation)

[Venue] IEEE International Conference on Robotics and Automation (ICRA)
[Paper Title] Latency and Refresh Rate on Force Perception via Sensory Substitution by Force-Controlled Skin Deformation Feedback

Latency and refresh rate are known to adversely affect human force perception in bilateral teleoperators and virtual environments using kinesthetic force feedback, motivating the use of sensory substitution of force. The purpose of this study is to quantify the effects of latency and refresh rate on force perception using sensory substitution by skin deformation feedback. A force-controlled skin deformation feedback device was attached to a 3-degree-of-freedom kinesthetic force feedback device used for position tracking and gravity support. A human participant study was conducted to determine the effects of latency and refresh rate on perceived stiffness and damping with skin deformation feedback. Participants compared two virtual objects: a comparison object with stiffness or damping that could be tuned by the participant, and a reference object with either added latency or reduced refresh rate. Participants modified the stiffness or damping of the tunable object until it resembled the stiffness or damping of the reference object. We found that added latency and reduced refresh rate both increased perceived stiffness but had no effect on perceived damping. Specifically, participants felt significantly different stiffness when the latency exceeded 300 ms and the refresh rate dropped below 16.6 Hz. The impact of latency and refresh rate on force perception via skin deformation feedback was significantly less than what has been previously shown for kinesthetic force feedback.

Technology Category: Robotics
Authors: Z. A. Zook, A. M. Okamura(Stanford University),
Y. Kamikawa(Sony Corporation)

[Venue] IEEE International Conference on Robotics and Automation (ICRA)
[Paper Title] Magnified Force Sensory Substitution for Telemanipulation via Force-Controlled Skin Deformation

Teleoperation systems could benefit from force sensory substitution when kinesthetic force feedback systems are too bulky or expensive, and when they cause instability by magnifying force feedback. We aim to magnify force feedback using sensory substitution via force-controlled tactile skin deformation, using a device with the ability to provide tangential and normal force directly to the fingerpads. The sensory substitution device is able to provide skin deformation force feedback over ten times the maximum stable kinesthetic force feedback on a da Vinci Research Kit teleoperation system. We evaluated the effect of this force magnification in two experimental tasks where the goal was to minimize interaction force with the environment. In a peg transfer task, magnified force feedback using sensory substitution improved participants’ performance for force magnifications up to ten times, but decreased performance for higher force magnifications. In a tube connection task, sensory substitution that doubled the force feedback maximized performance; there was no improvement at the larger magnifications. These experiments demonstrate that magnified force feedback using sensory substitution via force-controlled skin deformation feedback can decrease applied forces similarly to magnified kinesthetic force feedback during teleoperation.

Technology Category: Robotics
Authors: Y. Kamikawa(Sony Corporation)

[学会名]International Speech Communication Association(Interspeech)
[論文タイトル]Attention-based Convolutional Neural Networks for Sentence Classification

Sentence classification is one of the foundational tasks in spoken language understanding (SLU) and natural language processing (NLP). In this paper we propose a novel convolutional neural network (CNN) with an attention mechanism to improve the performance of sentence classification. In a traditional CNN, it is not easy to effectively encode long-term contextual information and correlations between non-consecutive words. In contrast, our attention-based CNN is able to capture these kinds of information for each word without any external features. We conducted experiments on various public and in-house datasets. The experimental results demonstrate that our proposed model significantly outperforms the traditional CNN model and achieves competitive performance with models that exploit rich syntactic features.
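
A minimal sketch of the idea, assuming PyTorch and toy dimensions (this is not the authors' network; the per-word scaled dot-product attention, kernel size and layer sizes are my own choices, used only to illustrate how attended context can be concatenated with word embeddings before the convolution):

# Sketch only: toy sizes and attention form are assumptions, not the paper's exact model.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=128, n_filters=100,
                 kernel_size=3, n_classes=5):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.query = nn.Linear(emb_dim, emb_dim)          # scores word-to-word relevance
        self.conv = nn.Conv1d(2 * emb_dim, n_filters, kernel_size, padding=1)
        self.out = nn.Linear(n_filters, n_classes)

    def forward(self, tokens):                            # tokens: (batch, seq_len)
        x = self.emb(tokens)                              # (B, T, E)
        scores = self.query(x) @ x.transpose(1, 2)        # (B, T, T) word-to-word scores
        attn = F.softmax(scores / x.size(-1) ** 0.5, dim=-1)
        context = attn @ x                                # (B, T, E) attended context per word
        h = torch.cat([x, context], dim=-1)               # (B, T, 2E)
        h = F.relu(self.conv(h.transpose(1, 2)))          # (B, n_filters, T)
        h = h.max(dim=2).values                           # max-over-time pooling
        return self.out(h)                                # class logits

logits = AttentionCNN()(torch.randint(0, 10000, (2, 12)))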

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 Z. Zhao,
Y. Wu(Sony(China)Limited)

[学会名]IEEE Computer Vision and Pattern Recognition(CVPR)
[論文タイトル]Affinity CNN: Learning Pixel-Centric Pairwise Relations for Figure/Ground Embedding

Spectral embedding provides a framework for solving perceptual organization problems, including image segmentation and figure/ground organization. From an affinity matrix describing pairwise relationships between pixels, it clusters pixels into regions, and, using a complex-valued extension, orders pixels according to layer. We train a convolutional neural network (CNN) to directly predict the pairwise relationships that define this affinity matrix. Spectral embedding then resolves these predictions into a globally-consistent segmentation and figure/ground organization of the scene. Experiments demonstrate significant benefit to this direct coupling compared to prior works which use explicit intermediate stages, such as edge detection, on the pathway from image to affinities. Our results suggest spectral embedding as a powerful alternative to the conditional random field (CRF)-based globalization schemes typically coupled to deep neural networks.
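
A minimal sketch of the spectral-embedding step alone (assuming a dense pixel affinity matrix W has already been predicted, e.g. by the CNN; the complex-valued extension for figure/ground ordering is omitted), using eigenvectors of the normalized graph Laplacian as a per-pixel embedding:

# Sketch: W is assumed given; the toy block-structured example stands in for CNN output.
import numpy as np

def spectral_embedding(W, n_components=4):
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - (W * d_inv_sqrt[:, None]) * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, 1:n_components + 1]      # skip the trivial constant eigenvector

# toy example: two blocks of mutually similar "pixels"
W = np.kron(np.eye(2), np.ones((5, 5))) + 0.01
emb = spectral_embedding(W, n_components=2)
print(emb.shape)                            # (10, 2): one embedding row per pixel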

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 M. Maire(Toyota Technological Institute at Chicago),
T. Narihira(Sony/University of California, Berkeley),
S. X. Yu(University of California, Berkeley)

[学会名]Association for the Advancement of Artificial Intelligence(AAAI)
[論文タイトル]Modeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model

Most human behaviors consist of multiple parts, steps, or subtasks. These structures guide our action planning and execution, but when we observe others, the latent structure of their actions is typically unobservable, and must be inferred in order to learn new skills by demonstration, or to assist others in completing their tasks. For example, an assistant who has learned the subgoal structure of a colleague’s task can more rapidly recognize and support their actions as they unfold. Here we model how humans infer subgoals from observations of complex action sequences using a nonparametric Bayesian model, which assumes that observed actions are generated by approximately rational planning over unknown subgoal sequences. We test this model with a behavioral experiment in which humans observed different series of goal-directed actions, and inferred both the number and composition of the subgoal sequences associated with each goal. The Bayesian model predicts human subgoal inferences with high accuracy, and significantly better than several alternative models and straightforward heuristics. Motivated by this result, we simulate how learning and inference of subgoals can improve performance in an artificial user assistance task. The Bayesian model learns the correct subgoals from fewer observations, and better assists users by more rapidly and accurately inferring the goal of their actions than alternative approaches.
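
The "unknown number of subgoals" is the nonparametric part; a generic Chinese-restaurant-process draw (a stand-in I chose to illustrate this kind of prior; the paper's rational-planning likelihood and posterior inference are omitted) shows how a partition over observed actions can grow new subgoal clusters as needed:

# Generic CRP draw used only to illustrate the nonparametric prior, not the paper's inference.
import random

def sample_crp_partition(n_items, alpha=1.0, seed=0):
    random.seed(seed)
    assignments, counts = [], []
    for _ in range(n_items):
        weights = counts + [alpha]                 # existing clusters + "new cluster" slot
        r = random.uniform(0, sum(weights))
        acc, k = 0.0, 0
        for k, w in enumerate(weights):
            acc += w
            if r <= acc:
                break
        if k == len(counts):
            counts.append(1)                       # open a new subgoal cluster
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(sample_crp_partition(10))                    # e.g. subgoal labels for 10 observed actions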

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 R. Nakahashi(Sony Corporation/Masachusetts Institute of Technology),
C. L. Baker,
J. B. Tenenbaum(Massachusetts Institute of Technology)

[学会名]IEEE Signal Processing Advances in Wireless Communications(SPAWC)
[論文タイトル]Low Complexity Beamforming Training Method for mmWave Communications

This paper introduces a low complexity method for antenna sector selection in mmWave Hybrid MIMO communication systems such as the IEEE 802.11ay amendment for Wireless LANs. The method is backwards compatible with the methods already defined for the released mmWave standard IEEE 802.11ad. We introduce an extension of the 802.11ad channel model to support common Hybrid MIMO configurations. The proposed method is evaluated and compared to the theoretical limit of transmission rates found by exhaustive search. In contrast to state-of-the-art solutions, the presented method requires only sparse channel information. Numerical results show a significant complexity reduction in terms of the number of necessary trainings, while approaching the maximum achievable rate.

詳細を見る
技術カテゴリ Communication
氏名 F. Fellhauer(Sony Europe Limited/University of Stuttgart),
N. Loghin, D. Ciochina, T. Handte(Sony Europe Limited),
S. ten Brink(University of Stuttgart)

[学会名]IEEE Broadband Multimedia Systems and Broadcasting(BMSB)
[論文タイトル]Terrestrial broadcast system using preamble and frequency division multiplexing

Broadcast systems based on FDM (Frequency Division Multiplex) have the advantage of near-continuous demodulation of the broadcast signal, allowing accurate and continuous tracking of channel conditions, which is particularly useful for mobile reception. This has been employed in the ISDB-T standard used in Japan, Brazil and other countries. However, as designed in ISDB-T, the broadcast signal lacks the ability to send system parameters such as FFT size, GI size and so on before the receiver begins demodulation. The receiver must blindly estimate such system parameters before it can read the other detailed parameter information using the TMCC pilot carriers. This takes time, usually one frame or longer. This paper proposes a next-generation FDM system which retains the original advantages of FDM while adding new ones by employing an additional small signal (Preamble 1) that imparts essential information such as FFT size, GI size and pilot pattern to the receiver, enabling immediate demodulation of the broadcast signal based on known parameters rather than blind estimation. Following demodulation of the first preamble, demodulation of the second preamble (Preamble 2) allows immediate knowledge of all subsequent parameters, contributing to faster demodulation of the overall signal.

詳細を見る
技術カテゴリ B2B & Professional
氏名 L. Michael,
K. Takahashi,
Y. Shinohara,
L. Sakai,
M. Kan(Sony Corporation),
S. Atungsiri(Sony Europe Limited)

[学会名]IEEE Transactions on Multimedia
[論文タイトル]AENet: Learning Deep Audio Features for Video Analysis

We propose a new deep network for audio event recognition, called AENet. In contrast to speech, sounds coming from audio events may be produced by a wide variety of sources. Furthermore, distinguishing them often requires analyzing an extended time period due to the lack of clear subword units that are present in speech. In order to incorporate this long-time frequency structure of audio events, we introduce a convolutional neural network (CNN) operating on a large temporal input. In contrast to previous works, this allows us to train an audio event detection system end to end. The combination of our network architecture and a novel data augmentation outperforms previous methods for audio event detection by 16%. Furthermore, we perform transfer learning and show that our model learned generic audio features, similar to the way CNNs learn generic features on vision tasks. In video analysis, combining visual features and traditional audio features, such as mel frequency cepstral coefficients, typically only leads to marginal improvements. Instead, combining visual features with our AENet features, which can be computed efficiently on a GPU, leads to significant performance improvements on action recognition and video highlight detection. In video highlight detection, our audio features improve the performance by more than 8% over visual features alone.

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 N. Takahashi(Sony Corporation),
M. Gygli,
L. Van Gool(ETH Zurich)

[学会名]IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
[論文タイトル]Improving music source separation based on deep neural networks through data augmentation and network blending

This paper deals with the separation of music into individual instrument tracks, which is known to be a challenging problem. We describe two different deep neural network architectures for this task, a feed-forward and a recurrent one, and show that each of them yields state-of-the-art results on the SiSEC DSD100 dataset. For the recurrent network, we use data augmentation during training and show that even simple separation networks are prone to overfitting if no data augmentation is used. Furthermore, we propose a blending of both neural network systems where we linearly combine their raw outputs and then perform a multi-channel Wiener filter post-processing. This blending scheme yields the best results that have been reported to date on the SiSEC DSD100 dataset.
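
A minimal single-channel sketch of the blending step (the paper applies a multi-channel Wiener filter; the single-channel Wiener-style mask, the equal blending weight and the random toy spectra below are my simplifications):

# Single-channel simplification: blend two models' magnitude estimates, then apply a
# Wiener-style soft mask to the mixture STFT.
import numpy as np

def blend_and_mask(mix_stft, mags_a, mags_b, alpha=0.5, eps=1e-12):
    """mix_stft: (F, T) complex; mags_a / mags_b: dict source -> (F, T) magnitude estimates."""
    blended = {s: alpha * mags_a[s] + (1 - alpha) * mags_b[s] for s in mags_a}
    power_sum = sum(m ** 2 for m in blended.values()) + eps
    return {s: (m ** 2 / power_sum) * mix_stft for s, m in blended.items()}

# toy usage with random spectra standing in for the two networks' outputs
F_, T_ = 513, 100
mix = np.random.randn(F_, T_) + 1j * np.random.randn(F_, T_)
a = {s: np.abs(np.random.randn(F_, T_)) for s in ("vocals", "accomp")}
b = {s: np.abs(np.random.randn(F_, T_)) for s in ("vocals", "accomp")}
separated = blend_and_mask(mix, a, b)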

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 S. Uhlich,
M. Porcu,
F. Giron,
M. Enenkl,
T. Kemp(Sony Europe Limited),
N. Takahashi,
Y. Mitsufuji(Sony Corporation)

[学会名]IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
[論文タイトル]Supervised monaural source separation based on autoencoders

In this paper, we propose a new supervised monaural source separation method based on autoencoders. We employ the autoencoder for dictionary training such that the nonlinear network can encode the target source with high expressiveness. The dictionary is trained on each target source without the mixture signal, which makes the system independent of the context where the dictionaries will be used. In the separation process, the decoder portions of the trained autoencoders are used as dictionaries to find the activations in an iterative manner such that a summation of the decoder outputs approximates the original mixture. The results of the instrument source separation experiments revealed that the separation performance of the proposed method was superior to that of NMF.
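
A minimal sketch of the separation stage under my own toy assumptions (linear "decoders" stand in for the trained autoencoder decoders, and a single magnitude frame is separated): with the decoders fixed, the activations are iteratively updated so that the summed decoder outputs approximate the mixture.

# Toy stand-in decoders and dimensions; this is not the authors' training setup.
import torch

def separate(mixture, decoders, n_iters=200, latent_dim=32, lr=0.05):
    """mixture: (F,) magnitude frame; decoders: list of frozen decoder modules."""
    acts = [torch.zeros(latent_dim, requires_grad=True) for _ in decoders]
    opt = torch.optim.Adam(acts, lr=lr)
    for _ in range(n_iters):
        opt.zero_grad()
        recon = sum(dec(h) for dec, h in zip(decoders, acts))
        loss = torch.nn.functional.mse_loss(recon, mixture)  # fit the mixture
        loss.backward()
        opt.step()                                           # update activations only
    return [dec(h).detach() for dec, h in zip(decoders, acts)]

F_ = 257
decs = [torch.nn.Sequential(torch.nn.Linear(32, F_), torch.nn.ReLU()) for _ in range(2)]
sources = separate(torch.rand(F_), decs)                     # per-source spectral estimates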

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 K. Osako,
Y. Mitsufuji(Sony Corporation),
R. Singh,
B. Raj(Sony(China)Limited)

[学会名]IEEE Workshop on Applications of Signal Processing to Audio and Acoustics(WASPAA)
[論文タイトル]Multi-Scale Multi-Band DenseNets for Audio Source Separation

This paper deals with the problem of audio source separation. To handle the complex and ill-posed nature of the problems of audio source separation, the current state-of-the-art approaches employ deep neural networks to obtain instrumental spectra from a mixture. In this study, we propose a novel network architecture that extends the recently developed densely connected convolutional network (DenseNet), which has shown excellent results on image classification tasks. To deal with the specific problem of audio source separation, an up-sampling layer, block skip connection and band-dedicated dense blocks are incorporated on top of DenseNet. The proposed approach takes advantage of long contextual information and outperforms state-of-the-art results on SiSEC 2016 competition by a large margin in terms of signal-to-distortion ratio. Moreover, the proposed architecture requires significantly fewer parameters and considerably less training time compared with other methods.

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 N. Takahashi,
Y. Mitsufuji(Sony Corporation)

[学会名]International Speech Communication Association(Interspeech)
[論文タイトル]Hierarchical Recurrent Neural Network for Story Segmentation

A broadcast news stream consists of a number of stories and each story consists of several sentences. We capture this structure using a hierarchical model based on a word-level Recurrent Neural Network (RNN) sentence modeling layer and a sentence-level bidirectional Long Short-Term Memory (LSTM) topic modeling layer. First, the word-level RNN layer extracts a vector embedding the sentence information from the given transcribed lexical tokens of each sentence. These sentence embedding vectors are fed into a bidirectional LSTM that models the sentence and topic transitions. A topic posterior for each sentence is estimated discriminatively and a Hidden Markov model (HMM) follows to decode the story sequence and identify story boundaries. Experiments on the topic detection and tracking (TDT2) task indicate that the hierarchical RNN topic modeling achieves the best story segmentation performance with a higher F1-measure compared to conventional state-of-the-art methods. We also compare variations of our model to infer the optimal structure for the story segmentation task.
Index Terms: spoken language processing, recurrent neural network, topic modeling, story segmentation

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 E. Tsunoo(The University of Edinburgh/Sony Corporation),  
O. Klejch,
P. Bell,
S. Renals(The University of Edinburgh)

[学会名]IEEE Automatic Speech Recognition and Understanding(ASRU)
[論文タイトル]Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features

A broadcast news stream consists of a number of stories, and it is an important task to find the boundaries of stories automatically in news analysis. We capture the topic structure using a hierarchical model based on a Recurrent Neural Network (RNN) sentence modeling layer and a bidirectional Long Short-Term Memory (LSTM) topic modeling layer, with a fusion of acoustic and lexical features. Both features are accumulated with RNNs and trained jointly within the model to be fused at the sentence level. We conduct experiments on the topic detection and tracking (TDT4) task comparing combinations of the two modalities trained with a limited amount of parallel data. Further, we utilize sufficient additional text data for training to refine our model. Experimental results indicate that the hierarchical RNN topic modeling takes advantage of the fusion scheme, especially with additional text training data, achieving a higher F1-measure compared to conventional state-of-the-art methods.

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 E. Tsunoo(The University of Edinburgh/Sony Corporation),  
O. Klejch,
P. Bell,
S. Renals(The University of Edinburgh)

[学会名]IEEE International Conference on Communications(ICC)
[論文タイトル]MocLis: A Moving Cell Support Protocol Based on Locator/ID Separation for 5G System

In the LTE/LTE-Advanced (LTE-A) system, the user plane for a user equipment (UE) is provided by tunneling, which causes header overhead, processing overhead, and management overhead. In addition, the LTE-A system does not support moving cells, which are composed of a mobile Relay Node (RN) and the UEs attached to the mobile RN. There are several proposals for moving cells in the LTE-A system and the 5G system; however, all of them rely on tunneling for the user plane, which means that none of them avoid the tunneling overheads. This paper proposes MocLis, a moving cell support protocol based on a Locator/ID split approach. MocLis does not use tunneling. Nested moving cells are supported. The signaling cost for handover of a moving cell is independent of the number of UEs and nested RNs in the moving cell. MocLis is implemented in Linux as user-space daemons and a modified kernel. The measurement results show that the attachment time and handover time are short enough for practical use. TCP throughput in MocLis is higher than that in the tunneling-based approaches.

詳細を見る
技術カテゴリ Communication
氏名 T. Ochiai,
K. Matsueda,
F. Teraoka(Keio University, Japan),
H. Takano,
R. Kimura,
R. Sawai(Sony Corporation)

[学会名]IEEE International Workshop on Signal Processing Advances in Wireless Communications(SPAWC)
[論文タイトル]Non-Line-of-Sight Positioning for Mmwave Communications

Using information about the wireless communication channel is a well-known approach to estimating a user's position. So far it has been shown that such methods can provide positioning information in line-of-sight (LOS) situations by estimating channel properties like time of flight, direction of arrival, and direction of departure of a link between a single access point and station. In this paper we focus on mmWave channels and propose a method that allows positioning in indoor scenarios even under non-line-of-sight conditions by exploiting the presence of scatterers. Further, we propose an approach to overcome the need for an angular reference, which is usually required to perform measurements of direction of arrival/departure and therefore limits practical applications. We investigate the influence of noisy temporal and spatial measurements on the achievable performance with and without the presence of an angular reference. Results show that in the presence of an angular reference, positioning with the proposed method is possible with an error below 4 cm in 50% of observations, degrading to 8 cm without an angular reference.

詳細を見る
技術カテゴリ Network & Data Analytics
氏名 F. Fellhauer(University of Stuttgart-Sony EuTEC Contractor),
N. Loghin (EuTEC),
J. Lassen,
A. Jaber (University of Stuttgart, Students)

[学会名]第23回ロボティクスシンポジア
[論文タイトル]一般化逆動力学とロバスト制御による6 自由度精密バイラテラル制御システムの開発

This paper proposes a control method to realize precise bilateral master-slave manipulation in the small-scale world. The concept is based on the integration of the computed torque control method and robust control. As the former framework, Generalized Inverse Dynamics (GID), which can cope with various operational spaces and mechanical systems, is introduced, and the bilateral control is formulated from the viewpoint of operational space control. As the latter, a disturbance observer (DOB) is introduced. The DOB is applied in the operational space in the acceleration dimension, with GID used as its nominal model. A two-armed 6-DOF bilateral master-slave system with a rigid and low-inertia mechanism was prototyped and the proposed controller was applied to it. The proposed method made it possible to stably scale position and force by a factor of 10 while reducing the intervening inertia to less than 10% of the actual inertia.
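
A minimal single-axis sketch of the disturbance-observer part (my own simplification to a scalar mass; the paper applies the observer in the operational space with GID as the nominal model): the disturbance implied by the nominal model is low-pass filtered and subtracted from the next command.

# Scalar-mass simplification of a disturbance observer; gains and the toy plant are assumptions.
def dob_step(acc_meas, u_applied, d_hat, m_nominal=0.5, cutoff=50.0, dt=1e-3):
    alpha = dt * cutoff / (1.0 + dt * cutoff)        # first-order low-pass coefficient
    d_raw = m_nominal * acc_meas - u_applied         # disturbance implied by the nominal model
    return (1 - alpha) * d_hat + alpha * d_raw       # filtered disturbance estimate

# toy loop: plant with matching inertia and a constant 0.2 N disturbance
m, d_true, u_ref = 0.5, 0.2, 1.0
d_hat, u = 0.0, u_ref
for _ in range(1000):
    acc = (u + d_true) / m                           # "measured" acceleration
    d_hat = dob_step(acc, u, d_hat)
    u = u_ref - d_hat                                # compensate the next command
print(round(d_hat, 2))                               # ~0.2: the disturbance is recovered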

詳細を見る
技術カテゴリ Robotics
氏名 K. Nagasaka(Sony Corporation)

[学会名]IEEE International Conference on Acoustics, Speech and Signal Processing(ICASSP)
[論文タイトル]Mode Domain Spatial Active Noise Control Using Sparse Signal Representation

Active noise control (ANC) over a sizeable space requires a large number of reference and error microphones to satisfy the spatial Nyquist sampling criterion, which limits the feasibility of practical realization of such systems. This paper proposes a mode-domain feedforward ANC method to attenuate the noise field over a large space while reducing the number of microphones required. We adopt a sparse reference signal representation to precisely calculate the reference mode coefficients. The proposed system consists of circular reference and error microphone arrays, which capture the reference noise signal and residual error signal, respectively, and a circular loudspeaker array to drive the anti-noise signal. Experimental results indicate that above the spatial Nyquist frequency, our proposed method performs well compared to conventional methods. Moreover, the proposed method can even reduce the number of reference microphones while achieving better noise attenuation.
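
A minimal sketch of how mode (circular-harmonic) coefficients can be obtained from a uniform circular microphone array by a spatial DFT (my simplification; the Bessel-function equalization, the sparse reference representation and the adaptive ANC loop of the paper are omitted):

# Assumes a uniform circular array of omnidirectional mics; equalization and control loop omitted.
import numpy as np

def circular_mode_coefficients(mic_signals, max_order=3):
    """mic_signals: (n_mics, n_samples) captured by a uniform circular array."""
    n_mics = mic_signals.shape[0]
    angles = 2 * np.pi * np.arange(n_mics) / n_mics
    orders = np.arange(-max_order, max_order + 1)
    # spatial Fourier series: C_m(t) = (1/N) * sum_q p(phi_q, t) * exp(-i m phi_q)
    basis = np.exp(-1j * orders[:, None] * angles[None, :]) / n_mics
    return basis @ mic_signals                  # (2*max_order+1, n_samples)

coeffs = circular_mode_coefficients(np.random.randn(16, 480), max_order=3)
print(coeffs.shape)                             # (7, 480): one row per mode order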

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 Y. Maeno,
Y. Mitsufuji,
T. D. Abhayapala(ANU)

[学会名]IEEE International Workshop on Acoustic Signal Enhancement (IWAENC)
[論文タイトル]MMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation

Deep neural networks have become an indispensable technique for audio source separation (ASS). It was recently reported that a variant of the CNN architecture called MMDenseNet was successfully employed to solve the ASS problem of estimating source amplitudes, and state-of-the-art results were obtained on the DSD100 dataset. To further enhance MMDenseNet, here we propose a novel architecture that integrates long short-term memory (LSTM) at multiple scales with skip connections to efficiently model long-term structures within an audio context. The experimental results show that the proposed method outperforms MMDenseNet, LSTM and a blend of the two networks. The number of parameters and processing time of the proposed model are significantly less than those for simple blending. Furthermore, the proposed method yields better results than those obtained using ideal binary masks for a singing voice separation task.

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 N. Takahashi,
N. Goswami,
Y. Mitsufuji(Sony Corporation)

[学会名]The 2018 Joint Workshop on Machine Learning for Music
[論文タイトル]Improving DNN-based Music Source Separation using Phase Features

Music source separation with deep neural networks typically relies only on amplitude features. In this paper we show that additional phase features can improve the separation performance. Using the theoretical relationship between STFT phase and amplitude, we conjecture that derivatives of the phase are a good feature representation, as opposed to the raw phase. We verify this conjecture experimentally and propose a new DNN architecture which combines amplitude and phase. This joint approach achieves a better signal-to-distortion ratio on the DSD100 dataset for all instruments compared to a network that uses only amplitude features. In particular, the bass instrument benefits from the phase information.
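
A minimal sketch of phase-derivative features on a toy signal (the plain-numpy STFT and its parameters are my own choices, not the paper's setup): the unwrapped STFT phase is differentiated along time (an instantaneous-frequency-like feature) and along frequency (a group-delay-like feature) to accompany the magnitudes.

# Toy signal and STFT parameters chosen only for illustration.
import numpy as np

def stft(x, n_fft=512, hop=128):
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    return np.fft.rfft(np.stack(frames, axis=1), axis=0)     # (freq bins, time frames)

x = np.sin(2 * np.pi * 440 / 16000 * np.arange(16000))       # 1 s of 440 Hz at 16 kHz
X = stft(x)
phase = np.angle(X)
inst_freq = np.diff(np.unwrap(phase, axis=1), axis=1)         # phase derivative across frames
group_delay = np.diff(np.unwrap(phase, axis=0), axis=0)       # phase derivative across bins
features = (np.abs(X), inst_freq, group_delay)                # amplitude + phase-derivative inputs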

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 J. Muth(EPFL),
S. Uhlich,
F. Cardinaux,
Y. Mitsufuji(Sony Corporation)

[学会名]AES Conference on Spatial Reproduction -Aesthetic and Science-
[論文タイトル]Creating a Highly-Realistic "Acoustic Vessel Odyssey" Using Sound Field Synthesis with 576 Loudspeakers

“Acoustic Vessel Odyssey” is a sound installation realizing the future of music by using Sony’s spatial audio technology called Sound Field Synthesis (SFS). It enables creators to simulate popping, moving and partitioning of sounds in one space. At the “Lost In Music” event, where we demonstrated “Acoustic Vessel Odyssey”, the immersive experience provided by SFS technology was further enhanced by a new, specially designed loudspeaker array consisting of 576 loudspeakers. The content was choreographed by sound artist Evala and is accompanied by a light installation created by digital media artists Kimchi and Chips. In this paper, we present the details of the system architecture as well as technical requirements of “Acoustic Vessel Odyssey”.

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 Y. Mitsufuji,
A. Tomura,
K. Ohkuri(Sony Corporation)

[学会名]International Speech Communication Association(Interspeech)
[論文タイトル]PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Speech Enhancement and Audio Source Separation

Previous research on audio source separation based on deep neural networks (DNNs) mainly focuses on estimating the magnitude spectrum of target sources; typically, the phase of the mixture signal is combined with the estimated magnitude spectra in an ad-hoc way. Although recovering the target phase is assumed to be important for the improvement of separation quality, it can be difficult to handle the periodic nature of the phase with a regression approach. Unwrapping the phase is one way to eliminate the phase discontinuity; however, it increases the range of values with each unwrapping, making it difficult for DNNs to model. To overcome this difficulty, we propose to treat the phase estimation problem as a classification problem by discretizing phase values and assigning class indices to them. Experimental results show that our classification-based approach 1) successfully recovers the phase of the target source in the discretized domain, 2) improves signal-to-distortion ratio (SDR) over the regression-based approach in both a speech enhancement task and a music source separation (MSS) task, and 3) outperforms state-of-the-art MSS.
Index Terms: phase modeling, quantized phase, deep neural networks

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 N. Takahashi,
P. Agrawal(IISc),
N. Goswami,
Y. Mitsufuji(Sony Corporation)

[学会名]IEEE International Workshop on Acoustic Signal Enhancement(IWAENC)
[論文タイトル]Mode-Domain Spatial Active Noise Control Using Multiple Circular Arrays

Noise control and attenuation over a sizable space requires uniformly distributed microphones and loudspeakers, which limits the system’s viability in practice. In this paper, we propose a mode-domain active noise control (ANC) system using a simple microphone and loudspeaker array structure. We introduce a few circular microphone and loudspeaker arrays to first transform a sound field into circular expansion mode coefficients and then combine them to calculate 3D mode coefficients, which are then processed in an adaptive algorithm to attenuate an undesired noise field in 3D space. Experimental results indicate that our proposed method gives noise attenuation performance comparable to that of a conventional method that uses an impractical array structure. Furthermore, the proposed method shows better noise attenuation performance than a conventional temporal-frequency-domain ANC system.

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 Y. Maeno,
Y. Mitsufuji(Sony Corporation),
P. N. Samarasinghe,
T. D. Abhayapala(ANU)

[学会名]Audio Engineering Society International convention(AES)
[論文タイトル]Microphone Array Geometry for Two Dimensional Broadband Sound Field Recording

Sound field recording with arrays made of omnidirectional microphones suffers from an ill-conditioned problem due to the zero and small values of the spherical Bessel function. This article proposes a geometric design of a microphone array for broadband two dimensional (2D) sound field recording and reproduction. The design is parametric, with a layout having a discrete rotationally symmetric geometry composed of several geometrically similar subarrays. The actual parameters of the proposed layout can be determined for various acoustic situations to give optimized results. This design has the advantage that it simultaneously satisfies many important requirements of microphone arrays such as error robustness, operating bandwidth, and microphone unit efficiency.
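
A small illustration (for intuition only, using the spherical Bessel function mentioned in the abstract) of the ill-conditioning being addressed: at kr values where the Bessel term of the sound-field expansion is close to zero, the corresponding mode cannot be recovered robustly from a single ring of omnidirectional microphones.

# Illustration only; the paper's parametric, rotationally symmetric layout is not reproduced here.
import numpy as np
from scipy.special import spherical_jn

kr = np.linspace(0.1, 10.0, 1000)               # product of wavenumber and array radius
for order in range(3):
    jn = spherical_jn(order, kr)
    bad = kr[np.abs(jn) < 1e-2]                 # kr values where this mode is ill-conditioned
    print(order, bad[:3] if bad.size else "none below kr=10")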

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 W. Liao,
Y. Mitsufuji,
K. Osako,
K. Ohkuri(Sony Corporation)

[学会名]IEEE Spoken Language Technology(SLT)
[論文タイトル]Context-Aware Dialog Re-Ranking for Task-Oriented Dialog Systems

Dialog response ranking is used to rank response candidates by considering their relation to the dialog history. Although researchers have addressed this concept for open-domain dialogs, little attention has been focused on task-oriented dialogs. Furthermore, no previous studies have analyzed whether response ranking can improve the performance of existing dialog systems in real human-computer dialogs with speech recognition errors. In this paper, we propose a context-aware dialog response re-ranking system. Our system reranks responses in two steps: (1) it calculates matching scores for each candidate response and the current dialog context; (2) it combines the matching scores and a probability distribution of the candidates from an existing dialog system for response re-ranking. By using neural word embedding-based models and handcrafted or logistic regression-based ensemble models, we have improved the performance of a recently proposed end-to-end task-oriented dialog system on real dialogs with speech recognition errors.
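
A minimal sketch of the combination step (the context-matching scores, the base system's probabilities, and the linear mixing weight below are placeholders I chose, not the paper's trained models): each candidate's context-match score is combined with the existing system's probability and the candidates are re-ordered.

# Placeholder scores and mixing weight; the matching models themselves are not shown.
import numpy as np

def rerank(candidates, context_scores, system_probs, alpha=0.5):
    """context_scores, system_probs: arrays aligned with `candidates`."""
    combined = alpha * np.asarray(context_scores) + (1 - alpha) * np.asarray(system_probs)
    order = np.argsort(-combined)               # highest combined score first
    return [candidates[i] for i in order]

ranked = rerank(["book a table", "play music", "set an alarm"],
                context_scores=[0.7, 0.1, 0.2],
                system_probs=[0.3, 0.5, 0.2])
print(ranked[0])                                # "book a table": context match outweighs the base system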

詳細を見る
技術カテゴリ AI & Machine Learning
氏名 J. Ohmura(Sony Corporation),
M. Eskenazi(Carnegie Mellon University)

[学会名]The Society for Information Display(SID)
[論文タイトル]High-Brightness Solid-State Light Source for 4K Ultra-Short-Throw Projector

We have developed technologies for a high-output light source consisting of blue laser diodes and a reflective phosphor wheel for a next-generation 4K Ultra-Short-Throw Projector, and have achieved a fluorescence output of 87 W. As far as we know, this is the highest fluorescence output for projectors. We adopted a newly developed phosphor cooling mechanism and an inorganic binder for high reliability of the phosphor wheel. As a result, no deterioration in the phosphor wheel could be observed over a period of 7,500 hours. In this paper, we report on these light-source technologies for achieving high output and high reliability.

詳細を見る
技術カテゴリ Device & Material
氏名 Y. Maeda(Sony Semiconductor Solutions Corporation)

[学会名]The Society for Information Display(SID)
[論文タイトル]High-Luminance Monochromatic See-Through Eyewear Display with Volume Hologram

詳細を見る
技術カテゴリ Display & Visual
氏名 T. Oku(Sony Semiconductor Solutions Corporation)

[学会名]The Society for Information Display(SID)
[論文タイトル]Improvement of Light-Extraction Efficiency of a Laser-Phosphor Light Source

We investigated a laser-phosphor light source using an inorganic phosphor wheel. We experimentally confirmed a light-extraction efficiency of the inorganic phosphor wheel that is 8% higher than that of a conventional phosphor wheel. In addition, we explain the cause of the efficiency improvement using a fluorescence emission model.

詳細を見る
技術カテゴリ Device & Material
氏名 H. Morita(Sony Semiconductor Solutions Corporation)

[学会名]The Society for Information Display(SID)
[論文タイトル]Distinguished Paper: New Pixel-Driving Circuit Using Self-Discharging Compensation Method for High-Resolution OLED Microdisplays on a Silicon Backplane

A new 4T2C pixel circuit formed on a silicon substrate is proposed to realize a high‐resolution 7.8‐μm pixel pitch AMOLED microdisplay. In order to achieve high luminance uniformity, the pixel circuit compensates its Vth variation of the MOSFET for the driving transistor internally by using self‐discharging method. Also presented are 0.5‐in Quad‐VGA and 1.25‐in wide Quad‐XGA microdisplays with the proposed pixel circuit.

詳細を見る
技術カテゴリ Display & Visual
氏名 K. Kimura(Sony Semiconductor Solutions Corporation)

[学会名]The Society for Information Display(SID)
[論文タイトル]Distinguished Paper: 4032-ppi High-Resolution OLED Microdisplay

A 0.5-inch UXGA OLED microdisplay has been developed with a 6.3-μm pixel pitch. Not only 4032-ppi high resolution but also a high frame rate, low power consumption, a wide viewing angle and high luminance have been achieved. This newly developed OLED microdisplay is suitable for Near-to-Eye display applications, especially electronic viewfinders.

詳細を見る
技術カテゴリ Device & Material
氏名 T. Fujii(Sony Semiconductor Solutions Corporation)

[学会名]IEEE International Electron Devices Meeting(IEDM)
[論文タイトル]Four-Directional Pixel-Wise Polarization CMOS Image Sensor Using Air-Gap Wire Grid on 2.5-μm Back-Illuminated Pixels

Polarization information is useful in highly functional imaging. This paper presents a four-directional pixel-wise polarization CMOS image sensor using an air-gap wire grid on 2.5-μm back-illuminated pixels. The fabricated air-gap wire grid polarizer achieved a transmittance of 63.3% and an extinction ratio of 85 at 550 nm, outperforming conventional polarization sensors. The pixel-wise polarizers fabricated with the wafer process on back-illuminated image sensors exhibit good oblique-incidence characteristics, even with small polarization pixels of 2.5 μm. The proposed image sensor enables various megapixel fusion-imaging applications, such as surface reflection reduction, highly accurate depth mapping, and condition-robust surveillance.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 T. Yamazaki(Sony Semiconductor Solutions Corporation)

[学会名]IEEE International Electron Devices Meeting(IEDM)
[論文タイトル]Novel Stacked CMOS Image Sensor with Advanced Cu2Cu Hybrid Bonding

We have successfully mass-produced novel stacked back-illuminated CMOS image sensors (BI-CIS). In the new CIS, we introduced the advanced Cu2Cu hybrid bonding that we had developed. The electrical test results showed that our highly robust Cu2Cu hybrid bonding achieved remarkable connectivity and reliability. The performance of the image sensor was also investigated, and our novel stacked BI-CIS showed favorable results.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 Y. Kagawa(Sony Semiconductor Solutions Corporation)

[学会名]IEEE International Electron Devices Meeting(IEDM)
[論文タイトル]Near-infrared Sensitivity Enhancement of a Back-illuminated Complementary Metal Oxide Semiconductor Image Sensor with a Pyramid Surface for Diffraction Structure

We demonstrated near-infrared (NIR) sensitivity enhancement of back-illuminated complementary metal oxide semiconductor image sensors (BI-CIS) with pyramid surface for diffraction (PSD) structures formed on crystalline silicon and deep trench isolation (DTI). The incident light is strongly diffracted by the PSD within the substrate, resulting in a quantum efficiency of more than 30% at 850 nm. By using a special treatment process and DTI structures, the amount of crosstalk to adjacent pixels was decreased without increasing the dark current, providing resolution equal to that of a flat structure. Testing of the prototype devices revealed that we succeeded in developing a unique BI-CIS with high NIR sensitivity.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 I. Oshiyama(Sony Semiconductor Solutions Corporation)

[学会名]IEEE International Electron Devices Meeting(IEDM)
[論文タイトル]An Experimental CMOS Photon Detector with 0.5e- RMS Temporal Noise and 15μm pitch Active Sensor Pixels

This is the first reported non-electron-multiplying CMOS image sensor (CIS) photon detector intended to replace photomultiplier tubes (PMTs). 15 μm pitch active sensor pixels with complete charge transfer and a readout noise of 0.5 e- rms are arrayed, and their digital outputs are summed to detect micro light pulses. Successful proof of radiation counting is demonstrated.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 T. Nishihara(Sony Semiconductor Solutions Corporation)

[学会名]IEEE International Electron Devices Meeting(IEDM)
[論文タイトル]Pixel/DRAM/logic 3-layer stacked CMOS image sensor technology

We developed a CMOS image sensor (CIS) chip in which pixel, DRAM, and logic layers are stacked. In this CIS chip, three Si substrates are bonded together, and each substrate is electrically connected by two-stacked through-silicon vias (TSVs) through the CIS or dynamic random access memory (DRAM). We obtained low resistance, low leakage current, and high reliability for these TSVs. The metal connected with TSVs through the DRAM can be used as low-resistance wiring for a power supply. The Si substrate of the DRAM can be thinned to 3 μm, and its memory retention and operation characteristics remain within specifications after thinning. With this stacked CIS chip, it is possible to achieve less rolling-shutter distortion and produce super-slow-motion video.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 H. Tsugawa(Sony Semiconductor Solutions Corporation)

[学会名]VLSI Symposia on Technology and Circuits(VLSI)
[論文タイトル]An 8.3M‐pixel 480fps Global‐Shutter CMOS Image Sensor with Gain‐Adaptive Column ADCs and 2‐on‐1 Stacked Device Structure

A 4K2K 480 fps global-shutter CMOS image sensor has been developed in the Super 35 mm format. This sensor employs newly developed gain-adaptive column ADCs to attain a dark random noise of 140 μV rms for a full-scale readout of 923 mV. An on-chip online correction of the error between the two switchable gains maintains the nonlinearity of the output image within 0.18%. The 16-channel output interfaces at 4.752 Gbps/ch are implemented in two diced logic chips stacked on a sensor chip with 38K micro bumps.
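
A rough feel for the readout dynamic range follows from the two quoted numbers; the check below is my own back-of-the-envelope estimate, assuming dynamic range is simply full-scale signal over dark random noise:

```python
# Back-of-the-envelope dynamic range from the quoted numbers.
# Assumption: DR = 20*log10(full-scale signal / dark random noise),
# ignoring photon shot noise and any gain-switching details.
import math

full_scale_v = 923e-3     # 923 mV full-scale readout
dark_noise_v = 140e-6     # 140 uV rms dark random noise

dr_db = 20 * math.log10(full_scale_v / dark_noise_v)
print(f"readout dynamic range ~ {dr_db:.1f} dB")   # ~76 dB
```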

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 Y. Oike(Sony Semiconductor Solutions Corporation)

[学会名]VLSI Symposia on Technology and Circuits(VLSI)
[論文タイトル]Accelerating the Sensing World through Imaging Evolution

The evolution of CMOS image sensors (CIS) and the future prospect of a “Sensing” world utilizing advanced imaging technologies promise to improve our quality of life by sensing everything, everywhere, every time. Charge-coupled device image sensors replaced video camera tubes, allowing the introduction of compact video cameras as consumer products. CIS now dominates the market for digital still cameras created by its predecessor and, with the advent of column-parallel ADCs and back-illuminated technologies, outperforms it: CISs achieve a better signal-to-noise ratio, lower power consumption, and higher frame rates. Stacked CISs continue to enhance functionality and user experience in mobile devices, a market that currently comprises several billion units per year. CIS imaging technologies promise to accelerate the progress of a sensing world by continuously improving sensitivity, extending detectable wavelengths, and further improving depth and temporal resolution.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 T. Nomoto(Sony Semiconductor Solutions Corporation)

[学会名]VLSI Symposia on Technology and Circuits(VLSI)
[論文タイトル]320x240 Back-Illuminated 10μm CAPD Pixels for High Speed Modulation Time-of-Flight CMOS Image Sensor

A 320×240 back-illuminated Time-of-Flight CMOS image sensor with 10μm CAPD pixels has been developed. The back-illuminated (BI) pixel structure maximizes the fill factor, allows for flexible transistor position and makes the light path independent of the metal layer. In addition, the CAPD pixel, which is optimized for high speed modulation, results in 80% modulation contrast at 100MHz modulation frequency.
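
To put the 100 MHz modulation frequency in context, the standard indirect-ToF relations (generic textbook formulas, not taken from this paper) link the modulation frequency to the unambiguous range and convert a measured phase shift to distance:

```python
# Generic indirect (phase-based) ToF relations, not specific to the paper:
#   unambiguous range  R_max = c / (2 * f_mod)
#   distance           d     = R_max * (phi / (2*pi))
import math

C = 299_792_458.0          # speed of light [m/s]
f_mod = 100e6              # 100 MHz modulation frequency

r_max = C / (2 * f_mod)
print(f"unambiguous range at 100 MHz: {r_max:.2f} m")   # ~1.5 m

phi = math.pi / 2          # example: 90-degree measured phase shift
d = r_max * phi / (2 * math.pi)
print(f"distance for 90 deg phase   : {d:.3f} m")       # ~0.37 m
```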

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 Y. Kato(Sony Semiconductor Solutions Corporation)

[学会名]VLSI Symposia on Technology and Circuits(VLSI)
[論文タイトル]224-ke Saturation Signal Global Shutter CMOS Image Sensor with In-Pixel Pinned Storage and Lateral Overflow Integration Capacitor

The required incorporation of an additional in-pixel retention node in global-shutter complementary metal-oxide semiconductor (CMOS) image sensors means that achieving a large saturation signal presents a challenge. This paper reports a 3.875-μm pixel single-exposure global-shutter CMOS image sensor with an in-pixel pinned storage (PST) and a lateral overflow integration capacitor (LOFIC), which extends the saturation signal to 224 ke and brings the saturation signal per unit area to 14.9 ke/μm². This pixel can ensure a large saturation signal by using the LOFIC for accumulation without degrading the image quality under dark and low-illuminance conditions, owing to the PST.
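
The per-unit-area figure follows directly from the saturation signal and the pixel pitch; a quick check of the arithmetic, assuming a square 3.875 μm pixel:

```python
# Check: saturation signal per unit pixel area.
# Assumption: square pixel, area = pitch^2.
sat_e = 224_000            # 224 ke saturation signal
pitch_um = 3.875           # pixel pitch in micrometres

density = sat_e / (pitch_um ** 2)      # electrons per square micrometre
print(f"{density/1000:.1f} ke/um^2")   # ~14.9 ke/um^2
```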

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 Y. Sakano(Sony Semiconductor Solutions Corporation)

[学会名]VLSI Symposia on Technology and Circuits(VLSI)
[論文タイトル]A 4.1Mpix 280fps Stacked CMOS Image Sensor with Array-Parallel ADC Architecture for Region Control

A 4.1Mpix 280fps stacked CMOS image sensor with an array-parallel ADC architecture is developed for region-control applications. The combination of an active-reset scheme and frame correlated double sampling (CDS) cancels the Vth variation of the pixel amplifier transistors and kTC noise. The sensor utilizes a floating-diffusion (FD) based back-illuminated (BI) global-shutter (GS) pixel with 4.2 e- rms readout noise. An intelligent sensor system with face detection and high-resolution region-of-interest (ROI) output is demonstrated with significantly lower data bandwidth and ADC power dissipation by utilizing a flexible area-access function.
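
As an illustration of why ROI readout reduces data bandwidth, here is a simple comparison; only the 4.1 Mpixel and 280 fps figures come from the abstract, and the ROI size and bit depth below are hypothetical illustration values:

```python
# Hypothetical bandwidth comparison: full-frame vs. ROI readout.
# Only the 4.1 Mpixel / 280 fps figures come from the abstract; the ROI size
# and ADC bit depth are made-up values for illustration.
full_pixels = 4.1e6
roi_pixels = 256 * 256          # hypothetical face ROI
bits_per_pixel = 10             # hypothetical ADC bit depth
fps = 280

full_rate = full_pixels * bits_per_pixel * fps / 1e9   # Gb/s
roi_rate = roi_pixels * bits_per_pixel * fps / 1e9

print(f"full frame : {full_rate:.2f} Gb/s")
print(f"256x256 ROI: {roi_rate:.3f} Gb/s  ({roi_rate/full_rate:.1%} of full)")
```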

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 T. Takahashi(Sony Semiconductor Solutions Corporation)

[学会名]VLSI Symposia on Technology and Circuits(VLSI)
[論文タイトル]3D integration technology for CMOS image sensors and future prospects

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 R. Nakamura(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]A 0.7V 1.5-to-2.3mW GNSS Receiver with 2.5-to-3.8dB NF in 28nm FD-SOI

We are approaching the age of IoE, in which wearable devices such as smartwatches will be widespread. Sensing processors play a key role, and the Global Navigation Satellite System (GNSS) is considered fundamental. Power consumption is one of the most important characteristics of such sensing processors. However, current GNSS receivers consume around 10mW [1,2] and are difficult to embed. GNSS receivers require a high supply voltage for low-noise RF, which contributes to large power consumption. We developed 0.7V RF circuits that make effective use of FD-SOI. Among the RF circuits, an LNA and an LPF are the key to 0.7V operation. We implemented an LNA with DC feedback using an OPAMP, and an LPF composed of OTAs with positive feedback as well as a mechanism for adjusting the output common-mode voltage.

詳細を見る
技術カテゴリ System Architecture & Processor
氏名 K. Yamamoto(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]A 12Gb/s 0.9mW/Gb/s Wide-Bandwidth Injection-Type CDR in 28nm CMOS with Reference-Free Frequency Capture

The consumer electronics market demands high-speed, low-power serial data interfaces. The injection-locked oscillator (ILO) based clock and data recovery (CDR) circuit [1-2] is a well-known solution for these demands. The typical solution has at least two oscillators: a master and one or more slaves. The master, a replica of the data-path ILO, is part of a phase-locked loop (PLL) used to correct the oscillator free-running frequency (FRF). The slave ILO phase-locks to the incoming data but uses the frequency control from the master. Any FRF difference between the master and slave, such as that caused by PVT or mismatch, reduces receiver performance. One solution to the reduced performance [3] uses burst data and corrects the FRF between bursts. However, for continuous data, injection forces the recovered clock frequency to match the incoming data rate, masking any FRF error from the frequency detector. Existing solutions [4-5] use a phase detector (PD) to measure the FRF. However, any static phase offset between the PD lock point and the ILO lock point, caused by mismatch, PVT, or layout, makes the frequency control algorithm converge incorrectly. This paper describes an ILO-type CDR, called the frequency-capturing ILO (FCILO), that eliminates the master oscillator and combines the ILO-type and PLL-type [6] CDRs, realizing the benefits of both: the ILO gives wide bandwidth and fast locking, while the PLL gives a wide frequency-capture range. The CDR architecture, shown in Fig. 10.4.2, has a half-rate ILO, data and edge samplers forming a bang-bang phase detector (BBPD), two 2:10 demuxes, and independent digital phase and frequency control. The ILO is made from current-starved inverters, is driven by an edge detector, and has coarse and fine frequency tuning. The strength of the unit inverter of the oscillator is adjusted for coarse tuning, keeping the normalized gain and delay constant over a wide range of frequencies; a current DAC is used for fine tuning. The edge detector shorts the ILO differential nodes together to align clock and data transitions. The BBPD outputs are used by the digital phase and frequency control to determine whether ILO edges are early or late with respect to the incoming data and to correct the ILO FRF. A variable delay circuit controls the timing between the data and clock inputs to the BBPD, correcting the static phase offset between the PD and ILO lock points.
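
The frequency-capture idea can be illustrated with a coarse behavioral model, which is entirely my own toy construction rather than the authors' circuit: an oscillator whose edges are pulled toward data transitions, a bang-bang early/late decision, and a slow digital integrator that trims the free-running frequency until, on average, no systematic correction remains:

```python
# Behavioral sketch of reference-free frequency capture in an injection-type CDR.
# This is a toy model, not the circuit in the paper: edges of a single oscillator
# are pulled toward data transitions (injection), a bang-bang phase detector flags
# early/late, and a digital integrator trims the free-running frequency.
import random

random.seed(0)

ppm_error = 500e-6      # initial free-running frequency error (+500 ppm)
beta = 0.2              # injection strength: fraction of phase error removed per edge
mu = 5e-7               # gain of the digital frequency integrator (per decision)

freq_err = ppm_error    # oscillator frequency error, as drift in UI per bit
phase_err = 0.0         # clock-vs-data phase error, in UI

for bit in range(200_000):
    phase_err += freq_err                    # drift accumulated over one bit period
    if random.random() < 0.5:                # a data transition occurs on this bit
        late = 1 if phase_err > 0 else -1    # bang-bang phase detector decision
        phase_err *= (1.0 - beta)            # injection realigns the clock edge
        freq_err -= mu * late                # digital loop trims the free-running freq.

print(f"residual frequency error: {freq_err*1e6:.1f} ppm")  # settles near zero
```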

詳細を見る
技術カテゴリ System Architecture & Processor
氏名 T. Masuda(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]A 1ms High-Speed Vision Chip with 3D-Stacked 140GOPS Column-Parallel PEs for Spatio-Temporal Image Processing

High-speed vision systems that combine high-frame-rate imaging and highly parallel signal processing enable instantaneous visual feedback for controlling machines at speeds beyond human visual recognition. Such systems also enable a reduction in circuit scale by using fast, simple algorithms optimized for high-frame-rate processing. Previous studies on vision systems and chips [1-4] have yielded low imaging performance due to large matrix-based processing element (PE) parallelization [1-3], and low functionality of the limited-purpose column-parallel PE architecture [4], constraining vision-chip applications.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 T. Yamazaki(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]A 1/2.3in 20Mpixel 3-Layer Stacked CMOS Image Sensor with DRAM

In recent years, the performance of cellphone cameras has improved and is becoming comparable to that of SLR cameras. A major remaining difference is the distortion caused by the rolling exposure of CMOS image sensors (CISs), because cellphone cameras cannot use a mechanical shutter. In addition to this technical problem, demands for high quality in dark conditions and for movies are increasing. Frame-level signal processing can solve these problems, but previous generations of CIS could not achieve both high-speed readout and an accessible I/F speed. This paper presents a 3-layer-stacked back-illuminated CMOS image sensor (3L-BI-CIS) with DRAM mounted as the frame memory.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 T. Haruta(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]A 1/4-inch 3.9Mpixel Low-Power Event-Driven Back-Illuminated Stacked CMOS Image Sensor

Wireless products such as smart home-security cameras, intelligent agents, and virtual personal assistants are evolving rapidly to satisfy our needs. Small size, extended battery life, and transparent machine interfaces are all required of the camera systems in these applications. In battery-limited environments, these applications can profit from an event-driven approach to moving-object detection. This paper presents a 1/4-inch 3.9Mpixel low-power event-driven (ED) back-illuminated stacked CMOS image sensor (CIS) with a pixel readout circuit that detects moving objects for each pixel under lighting conditions ranging from 1 to 64,000 lux. Utilizing pixel summation in a shared floating diffusion (FD) for each pixel block, moving-object detection is realized at 10 frames per second while consuming only 1.1mW, a 99% reduction from the 95mW that the same CIS consumes at full resolution and 60fps.
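
The quoted 99% figure is consistent with the two operating points given in the abstract; a trivial check:

```python
# Power-saving check from the two quoted operating points.
p_full_mw = 95.0    # full-resolution 60 fps mode
p_ed_mw = 1.1       # event-driven moving-object detection at 10 fps

saving = 1.0 - p_ed_mw / p_full_mw
print(f"power reduction: {saving:.1%}")   # ~98.8%, i.e. roughly a 99% reduction
```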

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 O. Kumagai(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]A Back-Illuminated Global-Shutter CMOS Image Sensor with Pixel-Parallel 14b Subthreshold ADC

Rolling-shutter CMOS image sensors (CISs) are widely used [1,2]. However, the distortion of moving subjects remains an unresolved problem, regardless of the speed at which these sensors are operated. It has been reported that, by adopting in-pixel analog memory (MEM), a global shutter (GS) can be achieved by saving all pixels simultaneously as stored charges [3,4]. However, as signals from the storage unit are read out in a column-wise sequence, a light-shielding structure is required for the MEM to suppress the influence of parasitic light during the readout period. Pixel-parallel ADCs have been reported as a way of implementing a GS in circuitry [5,6]. However, these techniques have not scaled to megapixel operation because they do not address issues such as the timing constraints for reading and writing digital signals to and from an in-pixel ADC as the number of pixels increases, and the total power consumption of the massively parallel comparators (CMs).

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 M. Sakakibara(Sony Semiconductor Solutions Corporation)

[学会名]International Solid-State Circuits Conference(ISSCC)
[論文タイトル]Compressive Imaging for CMOS Image Sensors

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 Y. Oike(Sony Semiconductor Solutions Corporation)

[受賞名]FY2016 IEC 1906 Award
[受賞タイトル]IEC 1906 Award
[表彰機関]International Electrotechnical Commission (IEC)

As project leader and expert, contributed to the development of the IEC 61937 and IEC 60958 series of digital audio interface standards.
Served as the representative of the Japanese national committee for more than 10 years.
In recent years, successfully led MT 61937-1, -2, and -7 as project leader and contributed greatly to the establishment of PT 100-12, a new project focused on extending IEC 60958.

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 市村 元

[受賞名]24th Technical Development Award
[受賞タイトル]Development of a 22.2-channel audio encoding and decoding system using MPEG-4 AAC
[表彰機関]Acoustical Society of Japan

This development involved basic acoustic research, including studies of the required number of channels and loudspeaker placement, followed by research toward practical use; in particular, the international standardization in MPEG of the channel combinations used for encoding is an activity that contributes greatly to the spread of this technology.

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 知念 徹
辻 実
畠中 光行
本間 弘幸

[受賞名]FY2016 METI Awards for Industrial Standardization, Director-General's Award of the Industrial Science and Technology Policy and Environment Bureau
[受賞タイトル]Commendation for contributors to international standardization
[表彰機関]Ministry of Economy, Trade and Industry

As an expert on the Japanese national committee for IEC/TC76 (safety of laser products), attended international meetings and proposed and promoted standard revisions leading to deregulation of projectors that use lasers as light sources. As project leader for the standardization of IEC 62471-5 (photobiological safety of image projectors), coordinated opinions with overseas committee members while maintaining close cooperation with the Japanese projector industry associations, thereby ensuring that domestic opinions were fully incorporated. Achieved publication in a very short period and contributed greatly to easing the international market environment.

詳細を見る
技術カテゴリ Device & Material
氏名 三橋 正示

[受賞名]FY2016 METI Awards for Industrial Standardization, Director-General's Award of the Industrial Science and Technology Policy and Environment Bureau
[受賞タイトル]Commendation for contributors to international standardization
[表彰機関]Ministry of Economy, Trade and Industry

In ISO/IEC JTC1/SC29 (coding of audio, picture, multimedia and hypermedia information)/WG11 (moving picture coding), led the establishment of ISO/IEC 14496-3/Amd.4, reflecting Japanese proposals such as the 22.2 multichannel audio format, and also worked to have Japanese-proposed technologies reflected in the establishment of ISO/IEC 23003-3. This work is extremely important for realizing future highly immersive broadcasting in Japan and contributed to Japanese industry.

詳細を見る
技術カテゴリ Audio & Acoustics
氏名 知念 徹

[受賞名]FY2016 METI Awards for Industrial Standardization, Director-General's Award of the Industrial Science and Technology Policy and Environment Bureau
[受賞タイトル]Commendation for contributors to international standardization
[表彰機関]Ministry of Economy, Trade and Industry

Across the broad range of systems technologies handled by ISO/IEC JTC1/SC29 (coding of audio, picture, multimedia and hypermedia information)/WG11 (moving picture coding), led, as Japan's foremost expert in this field, the incorporation of Japanese proposals into the standards through technical discussions with major European and US companies. Also took leadership in drafting Japan's ballot comments on these standards and, as secretary of the SC29/WG11 Systems subcommittee, contributed to revitalizing the management of its meetings.

詳細を見る
技術カテゴリ Display & Visual
氏名 平林 光浩

[受賞名]FY2016 International Standards Development Award
[表彰機関]Information Technology Standards Commission of Japan, Information Processing Society of Japan

詳細を見る
技術カテゴリ Display & Visual
氏名 鈴木 輝彦

[受賞名]FY2016 Standardization Achievement Award
[表彰機関]Information Technology Standards Commission of Japan, Information Processing Society of Japan

詳細を見る
技術カテゴリ Display & Visual
氏名 鈴木 輝彦

[受賞名]FY2016 Spring Conferment of Decorations: Medal of Honor with Purple Ribbon
[表彰機関]Ministry of Internal Affairs and Communications

Awarded for achievements in developing the basic structure and manufacturing process of the Blu-ray Disc.

詳細を見る
技術カテゴリ Industrial & Manufacturing
氏名 柏木 俊行

[受賞名]FY2016 National Commendation for Invention: Prime Minister's Award
[受賞タイトル]Invention of a stacked multifunctional CMOS image sensor structure
[表彰機関]Japan Institute of Invention and Innovation

The invention of the stacked multifunctional CMOS image sensor structure contributed to establishing and advancing technology that enables greater multi-functionality in next-generation image sensors.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 梅林 拓
髙橋 洋
庄子 礼二郎

[受賞名]FY2017 Standardization Contribution Award
[表彰機関]Information Technology Standards Commission of Japan, Information Processing Society of Japan

As secretary of the SC 29/WG 11 Systems subcommittee, contributed to the development of the MPEG (ISO/IEC JTC 1/SC 29/WG 11) adaptive streaming standard and the related file format standards. These standards have been adopted by major video streaming services and broadcast-related standards and have contributed greatly to the development of the industry.

詳細を見る
技術カテゴリ Communication
氏名 平林 光浩

[受賞名]FY2017 Technology & Engineering Emmy Award
[受賞タイトル]Standardization activities for High Efficiency Video Coding (HEVC)
[表彰機関]Academy of Television Arts & Sciences

The development of High Efficiency Video Coding (HEVC) has enabled efficient delivery in ultra-high-definition (UHD) content over multiple distribution channels. This new compression coding has been adopted, or selected for adoption, by all UHD television distribution channels, including terrestrial, satellite, cable, fiber and wireless, as well as all UHD viewing devices, including traditional televisions, tablets and mobile phones.

詳細を見る
技術カテゴリ Display & Visual
氏名 鈴木 輝彦

[受賞名]FY2018 Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology: Prize for Science and Technology
[受賞タイトル]Development of a stacked multifunctional CMOS image sensor structure
[表彰機関]Ministry of Education, Culture, Sports, Science and Technology

Awarded for contributing, through the development of the stacked multifunctional CMOS image sensor structure, to establishing and advancing technology that enables greater multi-functionality in next-generation image sensors.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 梅林 拓
髙橋 洋
庄子 礼二郎

[受賞名]FY2018 National Commendation for Invention: Asahi Shimbun Award
[受賞タイトル]Invention of a method for connecting wireless devices using touch operations
[表彰機関]Japan Institute of Invention and Innovation

This invention of a method for connecting wireless devices using touch operations contributed to the spread and development of products that realize universal design usable by any user.

詳細を見る
技術カテゴリ Human Interface
氏名 暦本 純一
大場 晴夫
綾塚 祐二
松下 伸行
エドワルド エー シャマレラ

[受賞名]SSII2018 Interactive Session Audience Award
[受賞タイトル]Real-time reflection component separation and applied signal processing using a back-illuminated four-directional polarization CMOS image sensor
[表彰機関]Image Sensing Technology Study Group (SSII)

Awarded for a presentation on a set of signal-processing algorithms for highly accurate reflection component separation using a polarization image sensor, together with methods for exploiting the separated reflection components.

詳細を見る
技術カテゴリ Imaging & Sensing
氏名 栗田 哲平
海津 俊
平澤 康孝
近藤 雄飛
村山 淳
秋山 健太郎
上坂 祐介
丸山 康
山崎 知洋
