[Rutgers University, Dong Deng] Unstructured Data Management at Scale for Large Language Models
In this talk, we discuss how to evaluate LLM memorization behavior quantitatively. For this purpose, we develop an efficient and scalable near-duplicate sequence search algorithm. Given a query sequence, it finds (almost) all the near-duplicate sequence
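
To make the notion of near-duplicate sequence search concrete, here is a minimal brute-force sketch. It is not the algorithm presented in the talk, which the abstract does not describe. It assumes, purely for illustration, that "near-duplicate" means the Jaccard similarity between the token set of the query and the token set of an equal-length corpus window meets a threshold. The function names, the window definition, and the default threshold are all hypothetical.

```python
from typing import List, Sequence, Tuple


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_windows(
    query: Sequence[str],
    corpus: Sequence[Sequence[str]],
    threshold: float = 0.8,
) -> List[Tuple[int, int, float]]:
    """Return (doc_index, start_offset, similarity) for every corpus window
    of len(query) tokens whose token-set Jaccard similarity to the query
    is at least `threshold`.

    Assumption: this exhaustive scan is only a baseline. It costs time
    proportional to the corpus size for each query, which is exactly what
    a scalable index-based method would need to avoid.
    """
    n = len(query)
    if n == 0:
        return []
    query_set = set(query)
    hits: List[Tuple[int, int, float]] = []
    for doc_index, tokens in enumerate(corpus):
        # Slide a window of the query's length over the document.
        for start in range(max(len(tokens) - n + 1, 0)):
            sim = jaccard(query_set, set(tokens[start:start + n]))
            if sim >= threshold:
                hits.append((doc_index, start, sim))
    return hits
```

For example, calling `near_duplicate_windows("the quick brown fox jumps".split(), corpus, threshold=0.6)` scans every five-token window in each tokenized document of `corpus`. A scan like this is useful mainly as a correctness reference against which a faster approximate method could be checked.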