Crawling Naver Map with Selenium

2024. 1. 12. 04:55 · Python

https://www.youtube.com/watch?v=ElOAGrXZicQ

I've always enjoyed crawling with Selenium. Two years ago I even used it on Instagram

to build a follow bot, and pushed the follow count up to 3,000.

After that I figured I'd never need Selenium again,

but I've just started a new project with teammates from my university,

and we need restaurant data from Naver Map, so here I am working with it again.

 

If you're reading this as a Selenium beginner, perfect.

I'll walk through it step by step from the very start, as simply as I can, so you can get from the basics up to an intermediate level.

 

First, open Naver Map and decide up front which data you want to crawl.

I usually take a screenshot and mark it up so everything is visible at a glance, like the one below.

 

1. Store name
2. Category
3. Rating
4. Visitor reviews / blog reviews
5. Business hours

These are the fields I decided to collect.

The project is an app that randomly recommends restaurants near the university and shows a brief summary of each, so I kept only the data it needs.
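
The chosen fields could be gathered into one record type before any crawling code is written. The following is only a sketch: the `Store` class, its field names, and the `to_row` helper are assumptions made for illustration (they mirror the variable names used later in the crawler) and are not part of the original project.

```python
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Store:
    """One crawled restaurant (hypothetical record type, for illustration)."""
    store_name: str = ""
    category: str = ""
    rating: float = 0.0
    visited_review: int = 0
    blog_review: int = 0
    business_hours: List[str] = field(default_factory=list)


def to_row(store: Store) -> dict:
    """Flatten a Store into a plain dict, e.g. for CSV or JSON export."""
    row = asdict(store)
    # Join the list of opening-hour lines into a single cell.
    row["business_hours"] = " | ".join(store.business_hours)
    return row
```

Deciding the record shape first makes it obvious which values the detail-page code still has to fill in.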

 

Right! Now that we know which data to collect, let's start crawling!

 


 

One interesting thing about Naver Map, though: the list panel that shows the stores is built as an iframe.

So if you just fire an XPath at the page, you get a NoSuchElementException straight away.

Checked in developer tools: the list is wrapped in an iframe

Huh? So what now? If that's what you're thinking, the fix is simple:

just switch the frame that the driver is pointing at.

def switch_left():
    ############## Focus on the left iframe (store list) ##############
    driver.switch_to.parent_frame()
    iframe = driver.find_element(By.XPATH,'//*[@id="searchIframe"]')
    driver.switch_to.frame(iframe)

def switch_right():
    ############## Focus on the right iframe (store details) ##############
    driver.switch_to.parent_frame()
    iframe = driver.find_element(By.XPATH,'//*[@id="entryIframe"]')
    driver.switch_to.frame(iframe)

I expected to switch frames a lot, so I wrapped this in two functions that I can call whenever needed.

Why two? When you click a store, its details open in a panel on the right, and that panel is wrapped in an iframe too,

so the crawler has to switch back and forth between the two while it works.
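
A possible variation on the two helpers, offered as a sketch rather than as the post's actual code: `switch_to.frame()` also accepts a frame's `id` or `name` as a string, and `switch_to.default_content()` returns to the top-level document regardless of which frame currently has focus. The function name `switch_to_iframe` and passing `driver` in as a parameter are assumptions made for illustration.

```python
def switch_to_iframe(driver, frame_id: str) -> None:
    """Focus the driver on a top-level iframe identified by its id."""
    # Return to the top-level document first, so the lookup does not
    # depend on which iframe currently has focus.
    driver.switch_to.default_content()
    # Selenium resolves a string argument as the frame's id or name.
    driver.switch_to.frame(frame_id)


def switch_left(driver) -> None:
    switch_to_iframe(driver, "searchIframe")   # store list panel


def switch_right(driver) -> None:
    switch_to_iframe(driver, "entryIframe")    # store detail panel
```

Taking `driver` as an argument instead of reading a global also makes the helpers easy to exercise with a stand-in object.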

 

Okay! With that caveat out of the way, let's dig into the code I actually wrote!

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
from time import sleep
import random
import re
import sys

# ANSI color codes used by the print calls below.
# (The Colors class is not shown in the post; this definition is an assumption.)
class Colors:
    RED = '\033[91m'
    GREEN = '\033[92m'
    BLUE = '\033[94m'
    MAGENTA = '\033[95m'
    CYAN = '\033[96m'
    RESET = '\033[0m'

# switch_left() and switch_right() are the two helper functions defined earlier.

options = webdriver.ChromeOptions()
options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3')
options.add_argument('window-size=1380,900')
driver = webdriver.Chrome(options=options)
# Implicit wait time
driver.implicitly_wait(time_to_wait=3)
# Loop termination flag
loop = True
URL = 'https://map.naver.com/p/search/%EC%9D%B8%ED%95%98%EB%8C%80%20%EC%9D%8C%EC%8B%9D%EC%A0%90?c=15.23,0,0,0,dh'
driver.get(url=URL)
while loop:
    switch_left()
    # Check the state of the next-page button up front [ 'true' / 'false' ]
    # This has to be re-checked on every page (the button state changes each time a page loads)
    next_page = driver.find_element(By.XPATH,'//*[@id="app-root"]/div/div[3]/div[2]/a[7]').get_attribute('aria-disabled')
    # Note: this check runs before the page is crawled, so a page whose
    # next-page button is disabled (the last page) is skipped.
    if(next_page == 'true'):
        break
    ############## Scroll to the very bottom ##############
    scrollable_element = driver.find_element(By.CLASS_NAME, "Ryr1F")
    last_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_element)
    while True:
        # Scroll down 600px inside the element
        driver.execute_script("arguments[0].scrollTop += 600;", scrollable_element)
        # Wait for the page to load
        sleep(1)  # adjust to how long the dynamic content takes to load
        # Compute the new height
        new_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_element)
        # Stop once the scroll height no longer grows
        if new_height == last_height:
            break
        last_height = new_height
    ############## Get the current page number - page 1 ##############
    page_no = driver.find_element(By.XPATH,'//a[contains(@class, "mBN2s qxokY")]').text
    # Look up every store listed on the current page
    # The first page starts with 2 ads, so skip its first 2 entries
    if(page_no == '1'):
        elemets = driver.find_elements(By.XPATH,'//*[@id="_pcmap_list_scroll_container"]//li')[2:]
    else:
        elemets = driver.find_elements(By.XPATH,'//*[@id="_pcmap_list_scroll_container"]//li')
    print('Page ' + '\033[95m' + str(page_no) + '\033[0m' + ' / ' + 'found ' + '\033[95m' + str(len(elemets)) + '\033[0m' + ' stores in total.\n')
    for index, e in enumerate(elemets, start=1):
        final_element = e.find_element(By.CLASS_NAME,'CHC5F').find_element(By.XPATH, ".//a/div/div/span")
        print(str(index) + ". " + final_element.text)
    print(Colors.RED + "-"*50 + Colors.RESET)
    switch_left()
    sleep(2)
    for index, e in enumerate(elemets, start=1):
        store_name = ''  # store name
        category = ''  # category
        new_open = ''  # newly opened
        rating = 0.0  # rating
        visited_review = 0  # visitor reviews
        blog_review = 0  # blog reviews
        store_id = ''  # unique store ID
        address = ''  # store address
        business_hours = []  # business hours
        phone_num = ''  # phone number
        switch_left()
        # Click the entries one by one, in order
        e.find_element(By.CLASS_NAME,'CHC5F').find_element(By.XPATH, ".//a/div/div/span").click()
        sleep(2)
        switch_right()
        ################### Crawling starts here ##################
        title = driver.find_element(By.XPATH,'//div[@class="zD5Nm undefined"]')
        store_info = title.find_elements(By.XPATH,'//div[@class="YouOG DZucB"]/div/span')
        # Store name
        store_name = title.find_element(By.XPATH,'.//div[1]/div[1]/span[1]').text
        # Category
        category = title.find_element(By.XPATH,'.//div[1]/div[1]/span[2]').text
        if(len(store_info) > 2):
            # Newly opened
            new_open = title.find_element(By.XPATH,'.//div[1]/div[1]/span[3]').text
        ###############################
        review = title.find_elements(By.XPATH,'.//div[2]/span')
        # Index of the current span in the review row
        _index = 1
        # The review row has 3 or more items [rating, visitor reviews, blog reviews]
        if len(review) > 2:
            rating_xpath = f'.//div[2]/span[{_index}]'
            rating_element = title.find_element(By.XPATH, rating_xpath)
            rating = rating_element.text.replace("\n", " ")
            _index += 1
        try:
            # Visitor reviews
            visited_review = title.find_element(By.XPATH,f'.//div[2]/span[{_index}]/a').text
            # Advance the index by 1 again
            _index += 1
            # Blog reviews
            blog_review = title.find_element(By.XPATH,f'.//div[2]/span[{_index}]/a').text
        except Exception:
            print(Colors.RED + '------------ Error in review section ------------' + Colors.RESET)
        # Store ID
        store_id = driver.find_element(By.XPATH,'//div[@class="flicking-camera"]/a').get_attribute('href').split('/')[4]
        # Store address
        address = driver.find_element(By.XPATH,'//span[@class="LDgIH"]').text
        try:
            driver.find_element(By.XPATH,'//div[@class="y6tNq"]//span').click()
            # Click the "more" button for business hours and wait 2 seconds for it to take effect
            sleep(2)
            parent_element = driver.find_element(By.XPATH,'//a[@class="gKP9i RMgN0"]')
            child_elements = parent_element.find_elements(By.XPATH, './*[@class="w9QyJ" or @class="w9QyJ undefined"]')
            for child in child_elements:
                # Find the span elements with class 'A_cdD' inside each child
                span_elements = child.find_elements(By.XPATH, './/span[@class="A_cdD"]')
                # Collect the spans that were found (their text is printed below)
                for span in span_elements:
                    business_hours.append(span)
            # Store phone number
            phone_num = driver.find_element(By.XPATH,'//span[@class="xlx7Q"]').text
        except Exception:
            print(Colors.RED + '------------ Error in business hours / phone number section ------------' + Colors.RESET)
        print(Colors.BLUE + f'{index}. ' + str(store_name) + Colors.RESET + ' · ' + str(category) + Colors.RED + str(new_open) + Colors.RESET)
        # str() is needed because the review counts keep their default of 0 when lookup fails
        print('Rating ' + Colors.RED + str(rating) + Colors.RESET + ' / ' + str(visited_review) + ' · ' + str(blog_review))
        print(f'Store ID -> {store_id}')
        print('Address ' + Colors.GREEN + str(address) + Colors.RESET)
        print(Colors.CYAN + 'Business hours' + Colors.RESET)
        for i in business_hours:
            print(i.text)
        print('')
        print('Phone ' + Colors.GREEN + phone_num + Colors.RESET)
        print(Colors.MAGENTA + "-"*50 + Colors.RESET)
    switch_left()
    # If the next-page button is enabled, keep going
    if(next_page == 'false'):
        driver.find_element(By.XPATH,'//*[@id="app-root"]/div/div[3]/div[2]/a[7]').click()
    # Otherwise stop the loop
    else:
        loop = False

 

Looking at the full code alone, it probably makes no sense at all. Honestly, it barely makes sense to me either.

So let me go through how it works, one step at a time.

1. Open the Naver Map URL.
2. Collect every store entry listed on the current page.
3. Click the stores one by one and crawl the needed data from each detail page.
4. Once every entry on the current page has been crawled, move to the next page and repeat.
5. When all of that is done, exit!

At a high level, that's the whole thing in brief.
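
The five steps can be sketched as a small control-flow skeleton. This is an illustration only: `crawl_all_pages` and its three callback parameters are hypothetical names, and the real Selenium calls would live inside those callbacks.

```python
from typing import Callable, Iterable, List


def crawl_all_pages(
    list_stores: Callable[[], Iterable[str]],
    crawl_detail: Callable[[str], dict],
    go_to_next_page: Callable[[], bool],
) -> List[dict]:
    """Visit every store on every result page and collect its details.

    list_stores     -- returns the stores listed on the current page (step 2)
    crawl_detail    -- opens one store and extracts its data (step 3)
    go_to_next_page -- moves to the next page; returns False on the last page (step 4)
    """
    results = []
    while True:
        for store in list_stores():
            results.append(crawl_detail(store))
        if not go_to_next_page():
            break  # step 5: nothing left to crawl
    return results
```

Note that this skeleton crawls the current page before asking for the next one, so the last page is included in the results.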

 

Hmm...! Right, let's go straight through the code piece by piece!

URL = 'https://map.naver.com/p/search/%EC%9D%B8%ED%95%98%EB%8C%80%20%EC%9D%8C%EC%8B%9D%EC%A0%90?c=15.23,0,0,0,dh'
driver.get(url=URL)

This opens the Naver Map URL in a web browser that can be controlled through ChromeDriver.

# (inside the outer while loop)
switch_left()
# Check the state of the next-page button up front [ 'true' / 'false' ]
# This has to be re-checked on every page (the button state changes each time a page loads)
next_page = driver.find_element(By.XPATH,'//*[@id="app-root"]/div/div[3]/div[2]/a[7]').get_attribute('aria-disabled')
if(next_page == 'true'):
    break

Next, as explained at the start, we put the focus on the iframe we want and then

read the state of the current page. What is that, you ask...

It's the page number control of the result list in Naver Map.

We want all the data, so the crawler has to keep advancing from the current page to the next as it goes.

############## Scroll to the very bottom ##############
scrollable_element = driver.find_element(By.CLASS_NAME, "Ryr1F")
last_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_element)
while True:
    # Scroll down 600px inside the element
    driver.execute_script("arguments[0].scrollTop += 600;", scrollable_element)
    # Wait for the page to load
    sleep(1)  # adjust to how long the dynamic content takes to load
    # Compute the new height
    new_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_element)
    # Stop once the scroll height no longer grows
    if new_height == last_height:
        break
    last_height = new_height

Next up is the scrolling part.

Scrolling...? You might wonder what exactly gets scrolled.

This part right here is what needs to be scrolled.

The tricky bit, which you may know if you've ever crawled Instagram,

is lazy loading: extra data is fetched only as the user actually scrolls,

and only then shown on screen, which makes things a little awkward.

So the code scrolls down 600px at a time, comparing the previous height with the new height after each step,

and stops once the height no longer changes, meaning the bottom has been reached. (See the code for details.)
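
The scroll-until-stable idea can be separated from Selenium so the logic is easier to follow. This is a sketch under assumptions: `scroll_until_stable`, `get_height`, `scroll_by` and `max_rounds` are names introduced here, and in the crawler the two callbacks would wrap the `execute_script` calls from the listing.

```python
import time
from typing import Callable


def scroll_until_stable(
    get_height: Callable[[], int],
    scroll_by: Callable[[int], None],
    step: int = 600,
    pause: float = 1.0,
    max_rounds: int = 100,
) -> int:
    """Scroll in fixed steps until the scrollable height stops growing.

    Returns the number of scroll steps performed.
    """
    last_height = get_height()
    for rounds in range(1, max_rounds + 1):
        scroll_by(step)
        time.sleep(pause)          # give lazily loaded items time to appear
        new_height = get_height()
        if new_height == last_height:
            return rounds          # nothing new was loaded: bottom reached
        last_height = new_height
    return max_rounds              # safety cap, avoids an endless loop
```

The `max_rounds` cap is an addition for safety; the loop in the post relies only on the height comparison.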

 

Hmm... so once the scroll has run all the way to the end,

every store on the current page has been loaded into the list. (Hooray~)

Now, next up:

############## Get the current page number - page 1 ##############
page_no = driver.find_element(By.XPATH,'//a[contains(@class, "mBN2s qxokY")]').text
# Look up every store listed on the current page
# The first page starts with 2 ads, so skip its first 2 entries
if(page_no == '1'):
    elemets = driver.find_elements(By.XPATH,'//*[@id="_pcmap_list_scroll_container"]//li')[2:]
else:
    elemets = driver.find_elements(By.XPATH,'//*[@id="_pcmap_list_scroll_container"]//li')
print('Page ' + '\033[95m' + str(page_no) + '\033[0m' + ' / ' + 'found ' + '\033[95m' + str(len(elemets)) + '\033[0m' + ' stores in total.\n')
for index, e in enumerate(elemets, start=1):
    final_element = e.find_element(By.CLASS_NAME,'CHC5F').find_element(By.XPATH, ".//a/div/div/span")
    print(str(index) + ". " + final_element.text)
print(Colors.RED + "-"*50 + Colors.RESET)
switch_left()
sleep(2)

This simply pulls the name of every store in the list and prints them so they're easy to scan.

The output looks roughly like this

 

After that, all that's left is to click each loaded entry in turn and pull just the values I want from its detail page.

for index, e in enumerate(elemets, start=1):
    ...
    ...
    # Click the entries one by one, in order
    e.find_element(By.CLASS_NAME,'CHC5F').find_element(By.XPATH, ".//a/div/div/span").click()
    # Switch frames (a detail panel has just opened on the right)
    switch_right()
    ...
    ...
    # Print the result
    print(Colors.BLUE + f'{index}. ' + str(store_name) + Colors.RESET + ' · ' + str(category) + Colors.RED + str(new_open) + Colors.RESET)
    print('Rating ' + Colors.RED + str(rating) + Colors.RESET + ' / ' + str(visited_review) + ' · ' + str(blog_review))
    print(f'Store ID -> {store_id}')
    print('Address ' + Colors.GREEN + str(address) + Colors.RESET)
    print(Colors.CYAN + 'Business hours' + Colors.RESET)
    for i in business_hours:
        print(i.text)
    print('')
    print('Phone ' + Colors.GREEN + phone_num + Colors.RESET)
    print(Colors.MAGENTA + "-"*50 + Colors.RESET)
    ...
    ...
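
The values read from the detail page are plain strings, so a little post-processing helps before they are stored. The sketch below is hedged: the sample label text and the URL shape (`/<type>/<id>/...`) are assumptions inferred from the `split('/')[4]` call in the full listing, and `parse_count` and `parse_store_id` are hypothetical helper names.

```python
import re
from typing import Optional
from urllib.parse import urlparse


def parse_count(text: str) -> int:
    """Extract the first number from a label such as 'Visitor reviews 1,234'."""
    match = re.search(r"\d[\d,]*", text)
    return int(match.group().replace(",", "")) if match else 0


def parse_store_id(href: str) -> Optional[str]:
    """Return the first all-digit path segment of a URL, or None if there is none."""
    for segment in urlparse(href).path.split("/"):
        if segment.isdigit():
            return segment
    return None
```

Compared with indexing a fixed position, scanning for the numeric segment keeps working if the number of path segments changes.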

 

I think the rest of the code is clear enough from the comments alone,

so I've only written up the important parts.

Nothing here is really hard, but 80% of Selenium errors come from:

1. Not giving sleep an appropriate value -> you ask for an element before the page has finished loading, so it throws an error...
2. Using find_element incorrectly (asking it to find an element that doesn't exist)
How is it supposed to find something it doesn't know about.....
3. And finally, not handling exceptions properly.
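
For the first cause, a fixed `sleep` is either too short or wastefully long. Selenium ships `WebDriverWait` for this (it is imported in the full listing but never used); the sketch below re-implements the same polling idea in plain Python so it can be read and run without a browser. `wait_until` and its parameters are names introduced here for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 10.0, poll: float = 0.5) -> T:
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if the condition is still falsy after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)
```

With Selenium, the condition could be something like `lambda: driver.find_elements(By.CLASS_NAME, "Ryr1F")`, which becomes truthy as soon as at least one matching element exists.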

 

We aren't handed an API here; we're digging the data out ourselves, piece by piece,

so it's impossible to anticipate every error while coding.

That's why, with Selenium, simply using try/except well takes care of a fair share of the errors.

(Not a great approach, though....)
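
One way to lean less on broad `try`/`except` blocks, sketched under assumptions: Selenium's `find_elements` (plural) returns an empty list instead of raising when nothing matches, so an optional field can fall back to a default explicitly. `text_or_default` is a hypothetical helper, and `root` stands for a driver or an element.

```python
def text_or_default(root, by: str, value: str, default: str = "") -> str:
    """Return the text of the first matching element, or `default` if none match.

    `root` is anything with a Selenium-style find_elements(by, value) method.
    """
    matches = root.find_elements(by, value)
    return matches[0].text if matches else default
```

In the crawler this might read `phone_num = text_or_default(driver, By.XPATH, '//span[@class="xlx7Q"]')`, which leaves `try`/`except` for failures that are genuinely unexpected.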

 

Anyway, that's how I pulled Naver Map restaurant data with Selenium,

and I confirmed that it works as expected.

License: Attribution, NonCommercial, NoDerivatives
Hac. Dog 🌭