Language Models
[Rutgers University, Dong Deng] Unstructured Data Management at Scale for Large Language Models
In this talk, we discuss how to quantitatively evaluate the memorization behavior of LLMs. For this purpose, we develop an efficient and scalable near-duplicate sequence search algorithm. Given a query sequence, it finds (almost) all the near-duplicate sequence
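
To make the search problem concrete, here is a minimal brute-force baseline in Python. It is not the algorithm presented in the talk. It assumes that "near-duplicate" means a Jaccard similarity at or above a threshold between the query's token set and a same-length window of the corpus. The function names `jaccard` and `near_duplicate_windows`, the fixed window length, and the default threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_windows(
    query: Sequence[str],
    corpus: Sequence[str],
    threshold: float = 0.8,
) -> List[Tuple[int, float]]:
    """Scan every corpus window of len(query) tokens and return
    (start_index, similarity) for windows meeting the threshold.

    Assumption: similarity is Jaccard over token sets. This is an
    illustrative definition, not one taken from the talk.
    """
    query_set = set(query)
    width = len(query)
    hits: List[Tuple[int, float]] = []
    # Brute force: one comparison per window position in the corpus.
    for start in range(len(corpus) - width + 1):
        window_set = set(corpus[start:start + width])
        similarity = jaccard(query_set, window_set)
        if similarity >= threshold:
            hits.append((start, similarity))
    return hits


corpus_tokens = "the quick brown fox jumps over the lazy dog".split()
query_tokens = "quick brown fox leaps".split()
matches = near_duplicate_windows(query_tokens, corpus_tokens, threshold=0.5)
```

This baseline compares the query against every window, so its cost grows with the corpus size for each query. That cost is what motivates an efficient and scalable algorithm of the kind the abstract describes.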