Contextual Bandit Learning for Machine Type Communications in the Null Space of Multi-Antenna Systems

Ensuring an effective coexistence of conventional broadband cellular users with machine type communications (MTCs) is challenging due to the interference from MTCs to cellular users. This interference challenge stems from the fact that acquiring channel state information (CSI) from machine type devices (MTDs) at cellular base stations (BSs) is infeasible due to the small-packet nature of MTC traffic. In this paper, a novel approach based on the concept of opportunistic spatial orthogonalization (OSO) is proposed for interference management between MTC and conventional cellular communications. In particular, a cellular system is considered with a multi-antenna BS, in which a receive beamformer is designed to maximize the rate of a cellular user, and a machine type aggregator (MTA) that receives data from a large set of MTDs. The BS and MTA share the same uplink resources and, therefore, MTD transmissions create interference at the BS. However, if there is a large number of MTDs to choose from for transmission at each given time for each beamformer, one MTD can be selected such that it causes almost no interference at the BS. A comprehensive analytical study of the characteristics of the interference from several MTDs on the same beamformer is carried out. It is proven that, for each beamformer, an MTD exists such that the interference at the BS is negligible. To further investigate this interference, the distribution of the signal-to-interference-plus-noise ratio (SINR) of the cellular user is derived and, subsequently, the distribution of the outage probability is presented. However, the optimal implementation of OSO requires the CSI of all the links at the BS, which is not practical for MTC. To solve this problem, an online learning method based on the concept of contextual multi-armed bandit (MAB) learning is proposed.
The receive beamformer is used as the context of the contextual MAB setting, and Thompson sampling, a well-known method for solving contextual MAB problems, is proposed. Since the number of contexts in this setting can be unlimited, approximating the posterior distributions of Thompson sampling is required. Two function approximation methods, a) linear full posterior sampling and b) neural networks, are proposed for selecting the optimal MTD for transmission for the given beamformer. Simulation results show that it is possible to implement OSO with no CSI from MTDs to the BS. Linear full posterior sampling achieves almost 90% of the optimal allocation obtained when the CSI from all the MTDs to the BS is known.
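The selection loop described in the abstract can be illustrated with a minimal sketch of linear Thompson sampling. This is not the authors' implementation. It assumes a simplified Bayesian linear model with known noise variance per MTD, a matched-filter beamformer, Rayleigh-fading channels that stay fixed over the run, and a noisy interference measurement as feedback. The names `features`, `LinearThompsonSampler`, and `simulate`, and all parameter values, are hypothetical. The context is built from the outer product of the beamformer with itself, because the interference power |h^H w|^2 is linear in those entries.

```python
import numpy as np


def features(w):
    """Real-valued context from a unit-norm complex beamformer w (assumed design)."""
    outer = np.outer(w, w.conj())
    return np.concatenate([outer.real.ravel(), outer.imag.ravel()])


class LinearThompsonSampler:
    """One Bayesian linear regression per arm (MTD), with known noise variance."""

    def __init__(self, n_arms, dim, noise_var=0.01, prior_precision=1.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_var = noise_var
        # A[k] accumulates prior_precision * I + sum of x x^T, b[k] accumulates reward * x.
        self.A = np.stack([prior_precision * np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))

    def select(self, x):
        cov = np.linalg.inv(self.A)                       # shape (arms, dim, dim)
        mean = np.einsum("kij,kj->ki", cov, self.b)       # posterior means
        chol = np.linalg.cholesky(self.noise_var * cov)   # posterior covariance factors
        z = self.rng.standard_normal(mean.shape)
        theta = mean + np.einsum("kij,kj->ki", chol, z)   # one posterior draw per arm
        return int(np.argmax(theta @ x))                  # highest sampled reward wins

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


def simulate(n_antennas=2, n_mtds=20, n_rounds=2000, noise_std=0.1, seed=1):
    rng = np.random.default_rng(seed)

    def cgauss(*shape):
        # Circularly symmetric complex Gaussian samples with unit variance.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    h = cgauss(n_mtds, n_antennas)  # MTD-to-BS channels, never shown to the learner
    agent = LinearThompsonSampler(n_mtds, 2 * n_antennas**2,
                                  noise_var=noise_std**2, seed=seed)
    learned, oracle = [], []
    for _ in range(n_rounds):
        g = cgauss(n_antennas)                    # cellular user's channel this round
        w = g / np.linalg.norm(g)                 # matched-filter beamformer (assumption)
        x = features(w)
        interference = np.abs(h.conj() @ w) ** 2  # |h_k^H w|^2 for every MTD
        arm = agent.select(x)
        reward = -interference[arm] + noise_std * rng.standard_normal()
        agent.update(arm, x, reward)
        learned.append(interference[arm])
        oracle.append(interference.min())         # full-CSI benchmark
    return np.array(learned), np.array(oracle)


if __name__ == "__main__":
    learned, oracle = simulate()
    print("mean interference, first 50 rounds:", learned[:50].mean())
    print("mean interference, last 500 rounds:", learned[-500:].mean())
    print("mean interference, full-CSI oracle:", oracle[-500:].mean())
```

The `oracle` series plays the role of the full-CSI selection, so comparing it with `learned` gives a rough sense of the gap that learning has to close. The numbers from this toy setup are not comparable to the results reported in the paper.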

Ali Samad, Asgharimoghaddam Hossein, Rajatheva Nandana, Saad Walid, Haapola Jussi

Publication type:
A1 Journal article – refereed

Place of publication:

Keywords:
deep contextual bandits, fast uplink grant, Internet of Things, Machine-type communications, Multi-Antenna Communications, multi-armed bandits, scheduling, Thompson sampling

25 November 2019

Full citation:
S. Ali, H. Asgharimoghaddam, N. Rajatheva, W. Saad and J. Haapola, “Contextual Bandit Learning for Machine Type Communications in the Null Space of Multi-Antenna Systems,” in IEEE Transactions on Communications, vol. 68, no. 2, pp. 1284-1296, Feb. 2020,

