Data Management

[Dong Deng, Rutgers University] Unstructured Data Management at Scale for Large Language Models

In this talk, we discuss how to evaluate LLM memorization behavior quantitatively. For this purpose, we develop an efficient and scalable near-duplicate sequence search algorithm. Given a query sequence, it finds (almost) all the near-duplicate sequence
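
To make the search problem concrete, here is a minimal brute-force sketch in Python. It is not the algorithm from the talk: it scans every window of the corpus and is therefore neither efficient nor scalable. The abstract does not name a similarity measure, so Jaccard similarity over token sets is an assumption here, and the names `jaccard`, `find_near_duplicates`, and `threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Jaccard similarity between the token sets of two sequences."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def find_near_duplicates(
    query: Sequence[str],
    corpus: Sequence[Sequence[str]],
    threshold: float = 0.8,
) -> List[Tuple[int, int, float]]:
    """Return (doc_index, start_offset, similarity) for every corpus window
    of len(query) tokens whose similarity to the query is >= threshold.

    Assumption: near-duplicates are windows of the same length as the query,
    compared by Jaccard similarity. This is a naive baseline for illustration.
    """
    width = len(query)
    hits: List[Tuple[int, int, float]] = []
    if width == 0:
        return hits
    for doc_index, doc in enumerate(corpus):
        # Slide a fixed-width window over each tokenized document.
        for start in range(len(doc) - width + 1):
            sim = jaccard(query, doc[start:start + width])
            if sim >= threshold:
                hits.append((doc_index, start, sim))
    return hits


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog".split(),
        "a quick brown fox leaps over a dog".split(),
    ]
    query = "quick brown fox jumps".split()
    for doc_index, start, sim in find_near_duplicates(query, corpus, 0.5):
        print(doc_index, start, round(sim, 2))
```

The cost of this baseline grows with the total number of corpus windows for every query, which is the kind of bottleneck an efficient and scalable search algorithm has to avoid.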
