When Regex Fails: Using LLMs to Extract Structured Data from Messy Pages
When regex can't handle messy data, let an LLM do the scraping: two or three examples are enough to extract JSON from HTML!
I’ve been doing web scraping for years. For most projects, I lean on BeautifulSoup, cssselect, and a handful of regex patterns. You know the drill: in…