Venkata Sai Praneeth Karempudi, Janibul Bashir, Ishan G. Thakkar, 2023. An Analysis of Various Design Pathways Towards Multi-Terabit Photonic On-Interposer Interconnects. In ACM Journal on Emerging Technologies in Computing Systems (JETC 2023).
Janibul Bashir, Uzmat un Nisa, Kalimullah Lone, 2023. Enhancing Microarchitecture Performance through Synergistic Dynamic Branch Prediction and Cache Prefetchings. In IEEE International Conference on Modelling, Simulation, and Intelligent Computing (MoSICom 2023).
Tajamul Ashraf, Janibul Bashir, 2023. Climate Change Parameter Dataset (CCPD): A Benchmark Dataset for Climate Change parameters in Jammu and Kashmir. In Proceedings of the 4th International Conference on Data Science and Applications (ICDSA 2023).
Tajamul Ashraf, Naiyer Abbas, Mohammad Haseeb, Nadeem Yousuf, Janibul Bashir, 2022. An Integral Computer Vision System for Apple Detection, Classification, and Semantic Segmentation. In Proceedings of the 15th International Conference on Machine Vision (ICMV 2022).
Shafi, O. and Janibul Bashir, 2021. FreqCounter: Efficient cacheability of encryption and integrity tree counters in secure processors. Journal of Systems Architecture, p.102252.
Abstract: The data in the off-chip main memory can potentially be extracted or tampered with by an adversary who has physical access to the device, so it becomes imperative to secure the data in off-chip memory. Modern designs store the counters on-chip to prevent replay attacks. However, these designs have significant overheads in terms of the on-chip storage used for the counters and the additional execution time. In this paper, we propose a new mechanism, FreqCounter, that trims the on-chip storage by storing the counters blockwise in a tag array on-chip cache rather than in a page-wise manner, as is conventionally done. Counters are maintained only for those blocks that are expected to miss frequently in the last-level cache (LLC). We show that FreqCounter reduces the space overheads compared to prior schemes by 53.02% while keeping performance similar, if not better. We further show that our design reduces the energy-delay^2 product (ED^2) by 14.5% on average compared to the recent competing scheme Morphable Counters, without any compromise in security.
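For readers unfamiliar with the ED^2 metric quoted in this and several later abstracts, the following is a minimal sketch of how such a figure is typically computed, assuming the usual definition ED^2 = energy × delay². The function names and the numbers are illustrative assumptions and are not taken from any of the papers listed here.

```python
def ed2(energy_joules: float, delay_seconds: float) -> float:
    """Energy-delay-squared product, E * D^2 (lower is better)."""
    return energy_joules * delay_seconds ** 2


def relative_reduction(baseline: float, proposed: float) -> float:
    """Fractional reduction of `proposed` relative to `baseline`."""
    return (baseline - proposed) / baseline


# Made-up numbers for illustration: same energy, 10% shorter delay.
base = ed2(energy_joules=2.0, delay_seconds=1.0)
new = ed2(energy_joules=2.0, delay_seconds=0.9)
print(f"ED^2 reduction: {relative_reduction(base, new):.1%}")
```

Because delay enters quadratically, the metric rewards latency improvements more strongly than equal-sized energy savings.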
Hussaina, I. and Janibul Bashir, 2021. Dynamic MTU: A smaller path MTU size technique to reduce packet drops in IPv6. Journal of King Saud University-Computer and Information Sciences.
Abstract: With an increase in the number of internet users and the need to secure internet traffic, the IPv4 protocol is being replaced by a more secure protocol, namely IPv6. The IPv6 protocol does not allow intermediate routers to fragment packets in transit. Moreover, due to IP tunneling, some extra headers are added to the IPv6 packet, which can push its size beyond the path MTU and result in increased packet drops. One probable solution is to use Path MTU Discovery (PMTUD) to learn the path MTU using ICMP packets. Due to its dependency on ICMP error messages, this method faces security and failure issues. In this paper, we propose a Dynamic MTU (DMTU) scheme, which tries to handle packet drops in IPv6 networks by dynamically adjusting the MTU of each link depending upon the incoming packet size, thereby reducing the number of packet drops by a significant amount. Unlike PMTUD, the algorithm works at the intermediate-node level and is further optimised by assigning specific phases for validation and then for processing. The method can work standalone or in parallel with PMTUD. Using mathematical and graphical analysis, we show that our scheme is much more efficient than the state-of-the-art PMTUD scheme.
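The packet-drop problem described in this abstract can be illustrated with simple arithmetic. The sketch below is not the DMTU algorithm; it only shows, under the assumption of plain IPv6-in-IPv6 tunneling (one extra 40-byte IPv6 header per tunnel layer), why an encapsulated packet can exceed the path MTU. The helper names are hypothetical.

```python
IPV6_HEADER_BYTES = 40    # fixed size of the base IPv6 header
IPV6_MIN_LINK_MTU = 1280  # minimum MTU every IPv6 link must support


def encapsulated_size(packet_bytes: int, tunnel_layers: int = 1) -> int:
    """Size of a packet after adding one outer IPv6 header per tunnel layer."""
    return packet_bytes + tunnel_layers * IPV6_HEADER_BYTES


def fits(packet_bytes: int, path_mtu: int) -> bool:
    """True if the packet can traverse the path without being dropped.

    IPv6 routers never fragment, so an oversized packet is simply dropped
    (and an ICMPv6 Packet Too Big message is sent back to the source).
    """
    return packet_bytes <= path_mtu


original = 1500                        # packet sized for a 1500-byte link
tunneled = encapsulated_size(original)
print(tunneled, fits(tunneled, path_mtu=1500))
```

A source that sized its packets for a 1500-byte MTU would therefore have to shrink them to at most 1460 bytes for a single tunnel layer on the same path.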
Janibul Bashir, and Tahir Ahmad Wani. "Inventive Investment Using Bigdata: Tools, Applicability and Challenges Associated." In Computational Management, pp. 599-627. Springer, Cham, 2021.
Abstract: Traditional investment models have reached a saturation level at which their inadequacy against current data-driven investment models is easy to recognize and demonstrate. Big Data is taking all management spheres by storm, be it marketing, financing, or investment. Today, colossal and growing volumes of data are being generated and analyzed to arrive at meaningful solutions. Big Data analytics combined with business models can work wonders for company gains. Big Data analytics will permit organizations to test hypotheses continuously and see the probable outcomes of each hypothesis before bringing it to the market. This reduces the risk of loss and accelerates the benefits to organizations tremendously. Insights from large-scale data analysis have the potential to improve business processes significantly. One important advantage of Big Data analytics is extracting patterns and client preferences from the data, thereby helping organizations make effective decisions about business, services, and other relevant items. Using Big Data and machine learning algorithms to analyze the data of any organization can solve problems in different verticals and forecast the business future with greater speed and reliability. Data analysis has been in the Business Intelligence space for a long time, giving ‘point answers’ for specific problems in any business. The focus of this chapter is to understand these algorithms and their efficacy in different business scenarios.
Shafi, Omais, and Janibul Bashir. "SecSched: Flexible Scheduling in Secure Processors." In Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, pp. 229-240. 2020.
Abstract: Trusted execution environments (TEEs) are an integral part of modern processors because security has become a very important concern. However, many such environments are bedeviled by the high cost of context switches, particularly when there is a switch from secure mode to non-secure mode, owing primarily to cache pollution and TLB-flushing overheads. State-of-the-art implementations create a secure shared memory channel between a thread running in secure mode and a thread running in non-secure mode, which invokes system calls on its behalf. We argue that this is inefficient, and that it is possible to reduce the overheads significantly by efficiently storing the context of secure threads and scheduling intelligently. In this paper, we propose a new scheduling algorithm, SecSched, that uses Cuckoo filters to capture the context of a thread. We schedule threads with similar contexts on the same core to leverage the effects of locality. Our algorithm requires minimal hardware enhancements that are limited to maintaining a Cuckoo filter per core and per thread, with the addition of a few performance counters per thread to keep track of the miss counts. We show that with these minimal changes we can increase the performance of a suite of OS-intensive workloads by 27.6% with a minimal area overhead (around 0.04%).
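For context, a Cuckoo filter is a compact approximate-membership structure that stores short fingerprints and, unlike a Bloom filter, supports deletion. The following is a minimal software sketch of a generic Cuckoo filter using partial-key cuckoo hashing. It is not the hardware structure from the paper; the class name, parameters, and hash choice are illustrative assumptions, and the abstract does not specify how SecSched encodes a thread's context into the filter.

```python
import hashlib
import random


class CuckooFilter:
    """Generic Cuckoo filter sketch (illustrative, not SecSched's design)."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=200, seed=0):
        # A power-of-two bucket count makes the XOR-based alternate index
        # its own inverse: alt(alt(i, fp), fp) == i.
        assert num_buckets & (num_buckets - 1) == 0, "power of two required"
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item: bytes) -> int:
        return self._hash(b"fp" + item) % 255 + 1  # 8-bit, in 1..255

    def _alt_index(self, index: int, fp: int) -> int:
        return (index ^ self._hash(bytes([fp]))) % self.num_buckets

    def _candidates(self, item: bytes):
        fp = self._fingerprint(item)
        i1 = self._hash(item) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is too full; the last evicted fingerprint is lost

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


# Hypothetical usage: record a block address, query it, then remove it.
f = CuckooFilter()
f.insert(b"0x7ffe1000")
print(f.contains(b"0x7ffe1000"))
f.delete(b"0x7ffe1000")
print(f.contains(b"0x7ffe1000"))
```

Lookups may return false positives when two items share a fingerprint and a bucket, but an item that was inserted successfully and not deleted is always reported as present.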
Janibul Bashir, Chandran Goodchild, and Smruti Ranjan Sarangi. "SecONet: A Security Framework for a Photonic Network-on-Chip." In 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS), pp. 1-8. IEEE, 2020.
Abstract: Photonic networks are already commercially available at the board-level, and many fabrication facilities can fabricate optical networks and integrate them with traditional silicon-based SoCs. Almost all the research in on-chip photonics has been in the areas of performance enhancement and static power reduction. However, before the large-scale adoption of such technologies, it is necessary to solve security problems. As opposed to electrical NoCs, optical NoCs are shared to a much larger extent, and are significantly more sensitive to the latencies of cryptographic operations. Hence, it is necessary to design a novel protocol for securing such networks. We propose a novel, secure, and efficient optical network in this paper (SecONet) that is immune to eavesdropping, spoofing, replay, and message-removal attacks. Using a combination of speculative execution and pre-computation, we reduce the performance overhead from 39.53% (with a conventional implementation) to 14.2% for a suite of Splash2 and Parsec benchmarks. The additional area overhead of our proposed hardware is modest: 1.6%.
Janibul Bashir, and Smruti R. Sarangi. "GPUOPT: Power-efficient Photonic Network-on-Chip for a Scalable GPU." ACM Journal on Emerging Technologies in Computing Systems (JETC) 17, no. 1 (2020): 1-26.
Abstract: On-chip photonics is a disruptive technology, and such NoCs are superior to traditional electrical NoCs in terms of latency, power, and bandwidth. Hence, researchers have proposed a wide variety of optical networks for multicore processors. The high bandwidth and low latency of photonic NoCs have led to an overall improvement in system performance. However, there are very few proposals that discuss the usage of optical interconnects in graphics processing units (GPUs). GPUs can also substantially gain from such novel technologies, because they need to provide significant computational throughput without further stressing their power budgets.
The main shortcoming of optical networks is their high static power usage: the lasers are turned on all the time by default, even when there is no traffic inside the chip. Sophisticated laser modulation schemes are therefore required. Such modulation schemes base their decisions on an accurate prediction of future network traffic. In this article, we propose an energy-efficient and scalable optical interconnect for modern GPUs called GPUOPT that smartly creates an overlay network by dividing the streaming multiprocessors (SMs) into clusters. It furthermore has separate sub-networks for coherence and non-coherence traffic. To further increase the throughput, we connect the off-chip memory with optical links as well.
Subsequently, we show that traditional laser modulation schemes (for reducing static power consumption) that were designed for multicore processors are not that effective for GPUs. Hence, there was a need to create a bespoke scheme for predicting the laser power usage in GPUs.
Using this set of techniques, we were able to improve the performance of a modern GPU by 45% as compared to a state-of-the-art electrical NoC. Moreover, as compared to competing optical NoCs for GPUs, our scheme reduces the laser power consumption by 67%, resulting in a net 65% reduction in ED^2 for a suite of Rodinia benchmarks.
Janibul Bashir, Khushal Sethi, and Smruti R. Sarangi. "Power efficient photonic network-on-chip for a scalable GPU." In Proceedings of the 13th IEEE/ACM International Symposium on Networks-on-Chip, pp. 1-2. 2019.
Abstract: In this paper, we propose an energy-efficient and scalable optical interconnect for GPUs. We intelligently divide the components in a GPU into different types of clusters and enable these clusters to communicate optically with each other. In order to reduce the network delay, we use separate networks for coherence and non-coherence traffic. Moreover, to reduce the static power consumption in optical interconnects, we modulate the off-chip light source by proposing a novel GPU-specific prediction scheme for on-chip network traffic. Using our design, we were able to increase the performance by 17% and achieve a 65% reduction in ED^2 as compared to a state-of-the-art optical topology.
Janibul Bashir and Smruti Ranjan Sarangi. "Predict, Share, and Recycle Your Way to Low-power Nanophotonic Networks." ACM Journal on Emerging Technologies in Computing Systems (JETC) 16, no. 1 (2019): 1-26.
Abstract: High static power consumption is widely regarded as one of the largest bottlenecks in creating scalable optical NoCs. The standard techniques to reduce static power are based on sharing optical channels and modulating the laser. We show in this article that state-of-the-art techniques in these areas are suboptimal, and there is significant room for further improvement. We propose two novel techniques that are significantly closer to a theoretically ideal scheme: a neural network-based method for laser modulation by predicting optical traffic, and a distributed and altruistic algorithm for channel sharing. In spite of this, a lot of laser power still gets wasted. We propose to reuse this energy to heat micro-ring resonators (achieve thermal tuning) by efficiently recirculating it. These three methods help us significantly reduce the energy requirements. Our design consumes 4.7× lower laser power as compared to other state-of-the-art proposals. In addition, it results in a 31% improvement in performance and a 39% reduction in ED^2 for a suite of Splash2 and Parsec benchmarks.
Ghosh, Rajib R., Janib Bashir, Smruti R. Sarangi, and Anuj Dhawan. "SpliESR: Tunable power splitter based on an electro-optic slotted ring resonator." Optics Communications 442 (2019): 117-122.
Abstract: In this paper, we present a novel optical power splitter having an arbitrary split-ratio that can be tuned over a wide range by employing relatively low voltage levels. It is based on a slotted ring resonator. A 120 nm electro-optic polymer-filled slot is created throughout the circumference of the ring. The hybrid ring resonator is made to work between the full and off resonance states, allowing it to work as a power splitter. This is done by changing the refractive index of the electro-optic polymer inside the slot by the application of an external electric field. The splitter combines the electro-optic functionality of the polymer with the high index contrast of the silicon, resulting in a low tuning voltage power splitter. Over a small voltage range of 0–1 V, it is possible to change the split-ratio of this splitter from 0.031 to 16.738, making it 10 times better than other competing designs. In addition, it takes less than 500 ps to reconfigure the splitter.
Ghosh, Rajib R., Janib Bashir, Smruti R. Sarangi, Abhijit Das, and Anuj Dhawan. "Slotted electro-optic ring resonator as a tunable optical power splitter." In Silicon Photonics XIV, vol. 10923, p. 109231U. International Society for Optics and Photonics, 2019.
Abstract: In this letter, we present a novel optical power splitter having an arbitrary split-ratio that can be tuned over a wide range by employing relatively low voltage levels. It is based on a slotted ring resonator. A 120 nm electro-optic polymer-filled slot is created throughout the circumference of the ring. The hybrid ring resonator is made to work between the full and off resonance states, allowing it to work as a power splitter. This is done by changing the refractive index of the electro-optic polymer inside the slot by the application of an external voltage.
Janibul Bashir, Eldhose Peter, and Smruti R. Sarangi. "A survey of on-chip optical interconnects." ACM Computing Surveys (CSUR) 51, no. 6 (2019): 1-34.
Abstract: Numerous challenges present themselves when scaling traditional on-chip electrical networks to large manycore processors. Some of these challenges include high latency, limitations on bandwidth, and power consumption. Researchers have therefore been looking for alternatives. As a result, on-chip nanophotonics has emerged as a strong substitute for traditional electrical NoCs.
As of 2017, on-chip optical networks have moved out of textbooks and found commercial applicability in short-haul networks such as links between servers on the same rack or between two components on the motherboard. It is widely acknowledged that in the near future, optical technologies will move beyond research prototypes and find their way into the chip. Optical networks already feature in the roadmaps of major processor manufacturers and most on-chip optical devices are beginning to show signs of maturity.
This article is designed to provide a survey of on-chip optical technologies covering the basic physics underlying the operation of optical technologies, optical devices, popular architectures, power reduction techniques, and applications. The aim of this survey article is to start from the fundamental concepts and move on to the latest in the field of on-chip optical interconnects.
Janibul Bashir, Eldhose Peter, and Smruti R. Sarangi. "BigBus: A scalable optical interconnect." ACM Journal on Emerging Technologies in Computing Systems (JETC) 15, no. 1 (2019): 1-24.
Abstract: This article presents BigBus, a novel design of an on-chip photonic network for a 1,024-node system. For such a large on-chip network, performance and power reduction are two mutually conflicting goals. This article uses a combination of strategies to reduce static power consumption while simultaneously improving performance and the energy-delay^2 (ED^2) product. The crux of the article is to segment the entire system into smaller clusters of nodes and adopt a hybrid strategy for each segment that includes conventional laser modulation, as well as a novel technique for sharing power across nodes dynamically. We represent energy internally as tokens, where one token will allow a node to send a message to any other node in its cluster. We allow optical stations to arbitrate for tokens at a global level, and then we predict the number of token equivalents of power that the off-chip laser needs to generate. Using these techniques, BigBus outperforms other competing proposals. We demonstrate a speedup of 14-34% over state-of-the-art proposals and a 20-61% reduction in ED^2.
Janibul Bashir and Smruti R. Sarangi. "NUPLet: A Photonic Based Multi-Chip NUCA Architecture." In 2017 IEEE International Conference on Computer Design (ICCD), pp. 617-624. IEEE, 2017.
Abstract: Area, manufacturing yield, and lack of scalable interconnects restrict single-chip designs to a small number of cores (16-32). However, multi-chip designs with the help of silicon photonics can overcome area and yield constraints and make it possible to design a virtual chip, which can scale to a large number of cores. Sadly, the scalability of such designs is limited by the high percentage of inter-chip messages and the relatively lower hit rate in remote cache banks. In this paper, we propose NUPLet, a multi-chip architecture that tries to remove these limitations by separating the intra-chip and inter-chip networks. It proposes to use a non-uniform cache architecture (NUCA) scheme on top of a virtual chip in order to decrease inter-chip communication and increase the hit rate in the last-level cache. In addition, we propose a prediction mechanism for predicting the number of inter-chip messages in the network. This is used to modulate the laser accordingly and reduce static power consumption. We simulated a four-chip NUPLet design with each chip containing 32 cores. For a suite of Splash2 and Parsec benchmarks, NUPLet increased the last-level cache hit rate by 70% as compared to other state-of-the-art proposals. Furthermore, NUPLet improved performance by 28%, reduced power consumption by 39%, and reduced ED^2 by 41%.
Peter, Eldhose, Janibul Bashir, and Smruti R. Sarangi. "POSTER: BigBus: A scalable optical interconnect." In 2017 26th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 162-163. IEEE, 2017.
Abstract: This paper presents BigBus, a novel on-chip photonic network for a 1,024-node system. The crux of the idea is to segment the entire system into smaller clusters of nodes, and adopt a hybrid strategy for each segment that includes conventional laser modulation, as well as a novel technique for sharing power across nodes dynamically. We represent energy internally as tokens, where one token will allow a node to send a message to any other node in its cluster. We allow optical stations to arbitrate for tokens, and at a global level we predict the number of token equivalents of power that the off-chip laser needs to generate.
Peter, Eldhose, Anuj Arora, Janibul Bashir, Akriti Bagaria, and Smruti R. Sarangi. "Optical overlay NUCA: A high-speed substrate for shared L2 caches." ACM Journal on Emerging Technologies in Computing Systems (JETC) 13, no. 4 (2017): 1-25.
Abstract: In this article, we propose using optical networks-on-chip (NoCs) to design cache access protocols for large shared L2 caches. We observe that the problem is unique because optical networks have very low latency, and in principle all of the cache banks are very close to each other. A naive approach is to broadcast a request to a set of banks that might possibly contain the copy of a block. However, this approach is wasteful in terms of energy and bandwidth. Hence, we propose a set of novel schemes that create a set of virtual networks (overlays) of cache banks over a physical optical NoC. We search for a block inside each overlay using a combination of multicast and unicast messages. We first propose two simple protocols: TSI and Broadcast. The former uses unicast messages, and the latter uses multicast messages. We subsequently propose an improved scheme, OP_BCAST, that combines the best of TSI and Broadcast, and mainly uses restricted multicast messages. Then we propose a set of novel hardware structures for creating and managing overlays, for efficiently locating blocks in the overlay, and for implementing dynamically changing overlays with OP_BCAST. The performance of the TSI scheme is within 2% to 3% of a broadcast scheme, and it is faster than traditional schemes with electrical networks by 26%. Compared to the broadcast scheme, it reduces the number of accesses, and consequently the dynamic energy of the caches by 6% to 8%. OP_BCAST is 34% faster than the best solutions with copper-based NoCs; moreover, it reduces the dynamic energy for cache access by 33% compared to the TSI scheme.