About Me

I am an Assistant Professor at Virginia Tech, where I work as an AI Research Scientist, Digital Libraries, in the University Libraries. I am also a faculty member of the Center for Digital Research and Scholarship at Virginia Tech. I completed my Ph.D. in Computer Science at Virginia Tech in August 2024, advised by Dr. Edward A. Fox. My dissertation, titled "Improving Access to ETD Elements Through Chapter Categorization and Summarization," improves computational access to long documents such as Electronic Theses and Dissertations (ETDs) by providing readers with more granular metadata, such as chapter summaries and classification labels.

Research Interests

My primary research interests lie in Natural Language Processing, AI, Machine Learning, Large Language Models, and Digital Libraries.

Some of my current projects include:

  1. Improving natural language understanding of digital library collections:
    Methods to extract information from, classify, and summarize multimodal content in digital library collections.
  2. Gaining insights from historical collections through Optical Character Recognition (OCR):
    Improved OCR methods to extract text from a historical document collection and to perform textual analysis and processing on it.
  3. Automatic identification of relevant scholarly work for systematic reviews and meta-analyses:
    AI and ML methods to automate the classification of relevant work in systematic reviews, following a more explainable and reproducible workflow.
  4. Understanding the contributions of scholarly work to the Sustainable Development Goals (SDGs):
    AI and ML methods for filtering and classifying scholarly work based on the SDGs it contributes to.

Resume

Work Experience

  1. Virginia Tech

    August, 2024 - Present

    AI Research Scientist, Assistant Professor, University Libraries.

  2. Virginia Tech

    August, 2019 - August, 2024

    Graduate Research Assistant

  3. ADP

    May, 2019 - August, 2019

    Global Product and Technology Intern

  4. Virginia Tech

    January, 2019 - May, 2019

    Graduate Teaching Assistant

  5. Tata Consultancy Services

    April, 2016 - November, 2017

    Assistant Systems Engineer

Education

  1. Virginia Tech

    2019 - 2024

    Ph.D. in Computer Science

  2. Virginia Tech

    2022

    M.S. in Computer Science

  3. West Bengal University of Technology

    2015

    Bachelor of Technology in Computer Science and Applications

Research

Publications

  1. Optical Character Recognition for Pre-Digital Historical Documents using Large Language Models. 2025 Accepted

    Miller, C., & Banerjee, B. (Accepted). Optical Character Recognition for Pre-Digital Historical Documents using Large Language Models. In 2025 International Conference on Machine Learning and Applications (ICMLA 2025). https://www.icmla-conference.org/icmla25/acceptedpapers.html/

  2. When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search. 2025

    Ingram, W. A., Banerjee, B., & Fox, E. A. (2025). When LLMs Disagree: Diagnosing Relevance Filtering Bias and Retrieval Divergence in SDG Search. In LLM4Eval at SIGIR 2025. DOI: https://doi.org/10.48550/arXiv.2507.02139

  3. Evaluating Human-LLM Alignment in ETD Subject Classification. 2025 Accepted

    Klair, H., German, F., Banerjee, B., & Ingram, W. A. (Accepted). Evaluating Human-LLM Alignment in ETD Subject Classification. In The 29th International Conference on Theory and Practice of Digital Libraries (TPDL 2025). https://tpdl2025.github.io/

  4. Making History Readable. 2024

    Banerjee, B., Goyne, J., & Ingram, W. A. (2024, December). Making History Readable. In 2024 IEEE International Conference on Big Data (BigData) (pp. 8620-8622). IEEE. DOI: 10.1109/BigData62323.2024.10826028

  5. Automating Chapter-Level Classification for Electronic Theses and Dissertations. 2024

    Banerjee, B., Ingram, W. A., & Fox, E. A. (2024, December). Automating Chapter-Level Classification for Electronic Theses and Dissertations. In 2024 IEEE International Conference on Big Data (BigData) (pp. 2400-2409). IEEE. DOI: 10.1109/BigData62323.2024.10825418

  6. Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals. 2024

    Ingram, W. A., Banerjee, B., & Fox, E. A. (2024, December). Agentic AI for Improving Precision in Identifying Contributions to Sustainable Development Goals. In 2024 IEEE International Conference on Big Data (BigData) (pp. 8677-8679). IEEE. DOI: 10.1109/BigData62323.2024.10825072

  7. An Integrated Digital Library System for Long Documents and their Elements. 2023

    S. Chekuri, P. Chandrasekar, B. Banerjee, S. Park, N. Masrourisaadat, A. Ahuja, W. A. Ingram, J. Wu and E. A. Fox, "An Integrated Digital Library System for Long Documents and their Elements," 2023 ACM/IEEE Joint Conference on Digital Libraries (JCDL).

  8. Applications of data analysis on scholarly long documents. 2022

    B. Banerjee, W. A. Ingram, J. Wu and E. A. Fox, "Applications of data analysis on scholarly long documents," 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 2022, pp. 2473-2481, doi: 10.1109/BigData55660.2022.10020935.

  9. Opening scholarly documents through text analytics. 2022

    Bipasha Banerjee. 2022. Opening Scholarly Documents through Text Analytics. In Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries (Cologne, Germany) (JCDL '22). Association for Computing Machinery, New York, NY, USA, Article 47, 2 pages. https://doi.org/10.1145/3529372.3530948

  10. Building A Large Collection of Multi-domain Electronic Theses and Dissertations. 2021

    S. Uddin, B. Banerjee, J. Wu, W. A. Ingram and E. A. Fox, "Building A Large Collection of Multi-domain Electronic Theses and Dissertations," 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 2021, pp. 6043-6045, doi: 10.1109/BigData52589.2021.9672058.

Presentations

  1. Preserving and Understanding the Past through Large Language Models. Generative AI in Libraries Conference. 2025

    Talk presented at the Generative AI in Libraries Conference. URL: https://shsulibraryguides.org/genailibraries/schedule.

  2. AI-driven Solutions for Digital Library Collections. 2025

    Talk presented at the Generative AI in Libraries Conference. URL: https://shsulibraryguides.org/genailibraries/schedule.

  3. Help Me Help You - A Mixed-Initiative Approach To Explore Book-length Documents. 2022

    Talk presented at the CIKM 2022 Workshop on Human-in-the-loop Data Curation.

  4. Applications of mining ETDs. 2021

    Presented at the ETD 2021 Conference.

  5. Extracting Information from Electronic Theses and Dissertations. 2021

    Talk presented at the ACM Capital Region Celebration of Women in Computing (CAPWIC 2021).
