Whether you are between jobs or out of work, in this internet-saturated era, picking up Python web scraping adds one more skill to your toolkit, so why worry about not finding a job, or about the job hunt being hard? Here is where the advantage of this skill really shows: how to use Python to crawl Taobao product price information and save it as a txt file.
1. How to crawl Taobao product price information with Python and save it as a txt file
Answer: the complete code is as follows.

# coding: utf-8
import re
import requests


def get_html(url):
    """Fetch the page source HTML."""
    try:
        r = requests.get(url=url, timeout=10)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
        return r.text
    except Exception:
        print("获取失败")  # fetch failed
        return ""


def get_data(html, goodlist):
    """Parse product titles and prices with the re module.
    tlist: list of product titles, plist: list of product prices."""
    tlist = re.findall(r'\"raw_title\"\:\".*?\"', html)
    plist = re.findall(r'\"view_price\"\:\"[\d\.]*\"', html)
    for i in range(len(tlist)):
        title = eval(tlist[i].split(':')[1])  # eval() simply strips the quotes around the string
        price = eval(plist[i].split(':')[1])
        goodlist.append([title, price])


def write_data(lst, num):
    for i in range(num):  # num controls how many of the scraped items are written to the text file
        u = lst[i]
        with open('E:/Crawler/case/taob.txt', 'a') as data:
            print(u, file=data)


def main():
    goods = '水杯'  # search keyword ("water cup")
    depth = 3  # crawl depth, i.e. how many result pages to turn
    start_url = 'https://s.taobao.com/search?q=' + goods  # Taobao search endpoint (assumed; the URL was elided in the source)
    infoList = []
    for i in range(depth):
        try:
            url = start_url + '&s=' + str(44 * i)  # Taobao shows 44 items per page, so s = 0, 44, 88, ...
            html = get_html(url)
            get_data(html, infoList)
        except Exception:
            continue
    write_data(infoList, len(infoList))


if __name__ == '__main__':
    main()
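The parsing step works because Taobao's search page embeds its product data as JSON-like "key":"value" pairs directly in the page source, so two regular expressions are enough to pull out every title and price. Below is a minimal sketch of that extraction logic run against a hard-coded sample string (the sample_html fragment is made up for illustration, not real Taobao output), using str.strip('"') instead of eval() to remove the surrounding quotes; it lets you check the regex handling without touching the network.

# coding: utf-8
import re

# Hand-written fragment mimicking the "key":"value" pairs in Taobao's search
# page source (illustrative sample only, not real captured output).
sample_html = '"raw_title":"简约玻璃水杯","view_price":"19.90",' \
              '"raw_title":"不锈钢保温杯","view_price":"59.00"'

goodlist = []
tlist = re.findall(r'\"raw_title\"\:\".*?\"', sample_html)
plist = re.findall(r'\"view_price\"\:\"[\d\.]*\"', sample_html)
for i in range(len(tlist)):
    # split(':')[1] keeps the quoted value; strip('"') drops the quotes,
    # doing the same job as eval() without executing page content
    title = tlist[i].split(':')[1].strip('"')
    price = plist[i].split(':')[1].strip('"')
    goodlist.append([title, price])

print(goodlist)  # [['简约玻璃水杯', '19.90'], ['不锈钢保温杯', '59.00']]

If Taobao ever renames these fields in its page source, only the two patterns need to change; the rest of the pipeline stays the same.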
Let the big data speak for itself: the advantage is plain to see. If you want to enter the IT industry and open that new door, find a training institution that suits you and study in a professional, systematic way.