Responding to Queries

The data structures here are easier to understand if you think of them the way you would when attaching many methods to a class through its prototype.

Quiz:Data structure

# To build the search engine, store all the URLs that match one keyword in a list.

index = [
    [keyword, [url1, url2, ..., urlN]],
    [keyword2, [url1, url2, ..., urlN]]
]
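
As a concrete illustration of this layout, here is a small sketch using the sample keywords and URLs that appear later in these notes. The variable names first_entry, keyword and urls are only for illustration.

```python
# Each entry is [keyword, [url, url, ...]].
index = [
    ['udacity', ['http://udacity.com', 'http://npr.org']],
    ['computing', ['http://acm.org']],
]

first_entry = index[0]
keyword = first_entry[0]   # the keyword slot
urls = first_entry[1]      # the list of URLs for that keyword
```

Position 0 of an entry is always the keyword and position 1 is always the URL list, which is what the functions in the following quizzes rely on.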

Quiz:Add to Index

index = []
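def add_to_index(index, keyword, url):
    for arr in index:
        if arr[0] == keyword:
            if url not in arr[1]:
                arr[1].append(url)
            # The keyword exists, so stop here even when the url was already listed;
            # otherwise a duplicate keyword entry would be appended below.
            return
    index.append([keyword, [url]])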

add_to_index(index,'udacity','http://udacity.com')
add_to_index(index,'computing','http://acm.org')
add_to_index(index,'udacity','http://npr.org')
print index
#>>> [['udacity', ['http://udacity.com', 'http://npr.org']], 
#>>> ['computing', ['http://acm.org']]]

Quiz:Lookup

The lookup() function finds the URLs for a keyword in the index list.

index = [['udacity', ['http://udacity.com', 'http://npr.org']],
         ['computing', ['http://acm.org']]]

# Minwoo's code (returns None when the keyword is not found)
def lookup(index, keyword):
    for arr in index:
        if keyword in arr:
            return arr[1]

# Udacity's code
def lookup(index, keyword):
    for entry in index: # I should pay more attention to variable names.
        if keyword in entry:
            return entry[1]
    return [] # If the for loop finds nothing, return an empty list.

print lookup(index,'udacity')
#>>> ['http://udacity.com','http://npr.org']
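
A short sketch of why the final `return []` matters. The lookup below follows the Udacity version above but compares the keyword slot directly (`entry[0] == keyword`), which is my own small variation. The keyword 'python' is a made-up example of a miss.

```python
def lookup(index, keyword):
    # Compare against the keyword slot only, then fall back to an empty list.
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return []

index = [['udacity', ['http://udacity.com', 'http://npr.org']],
         ['computing', ['http://acm.org']]]

found = lookup(index, 'udacity')
missing = lookup(index, 'python')   # keyword that is not in the index

# Because a miss is [] rather than None, the result can be looped over directly.
count = 0
for url in missing:
    count += 1
```

With the first version (no final return), `missing` would be None and the for loop would raise a TypeError.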

Build the Web Index

<string>.split() -> [<word>, <word>, ...]

quote = "My name is minwoo"
print quote.split() # -> ['My','name','is','minwoo']

quote2 = """
    My name is minwoo. I love Javascript.
    But i'm studying python.
    """
print quote2.split()
# -> ['My', 'name', 'is', 'minwoo.', 'I', 'love',
#     'Javascript.', 'But', "i'm", 'studying', 'python.']

Quiz:Add Page to Index

# Related search terms?
# Since add_to_index is already written, this part is not hard.
index = []

def add_to_index(index,keyword,url):
    for entry in index:
        if entry[0] == keyword:
            entry[1].append(url)
            return
    index.append([keyword,[url]])

def add_page_to_index(index,url,content):
    words = content.split() # split content on whitespace and keep the pieces in the words list
    for word in words:
        add_to_index(index,word,url) # call add_to_index with each word as the keyword
    return

add_page_to_index(index,'fake.text',"This is a test")
print index
#>>> [['This', ['fake.text']], ['is', ['fake.text']], ['a', ['fake.text']],
#>>> ['test',['fake.text']]]

Quiz: Finishing the Web Crawler

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page) # fetch the source of the page
            add_page_to_index(index,page,content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

# seed has to be passed in as a parameter, e.g. crawl_web(seed);
# without it, tocrawl = [seed] would refer to an undefined name.

Given a suitable get_page() function (plus the union() and get_all_links() helpers it calls), this is a web crawler that does everything we want.
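
To try the crawler without a network connection, here is a minimal sketch that swaps get_page() for a lookup in a made-up in-memory "web". FAKE_WEB and the .test URLs are invented for illustration. get_next_target(), get_all_links() and union() are my reconstruction of the helpers the crawler expects, so treat them as assumptions rather than the course's exact code. crawl_web takes seed as a parameter here.

```python
# Hypothetical in-memory "web": url -> page source. All pages are made up.
FAKE_WEB = {
    'http://a.test': 'start <a href="http://b.test">b</a> <a href="http://c.test">c</a>',
    'http://b.test': 'beta <a href="http://a.test">back</a>',
    'http://c.test': 'gamma',
}

def get_page(url):
    # Stand-in for a real fetch; unknown URLs yield an empty page.
    return FAKE_WEB.get(url, "")

def get_next_target(page):
    # Return (url, end_position) of the next link, or (None, 0) if there is none.
    start_link = page.find('<a href=')
    if start_link == -1:
        return None, 0
    start_quote = page.find('"', start_link)
    end_quote = page.find('"', start_quote + 1)
    return page[start_quote + 1:end_quote], end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url is None:
            break
        links.append(url)
        page = page[endpos:]
    return links

def union(p, q):
    # Append to p every element of q that p does not already contain.
    for e in q:
        if e not in p:
            p.append(e)

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    index.append([keyword, [url]])

def add_page_to_index(index, url, content):
    for word in content.split():
        add_to_index(index, word, url)

def crawl_web(seed):
    tocrawl = [seed]
    crawled = []
    index = []
    while tocrawl:
        page = tocrawl.pop()
        if page not in crawled:
            content = get_page(page)
            add_page_to_index(index, page, content)
            union(tocrawl, get_all_links(content))
            crawled.append(page)
    return index

index = crawl_web('http://a.test')
indexed_urls = sorted(set(url for entry in index for url in entry[1]))
```

The link from b.test back to a.test does not cause a loop, because a.test is already in crawled by then. Since content.split() works on the raw page source, pieces of HTML markup also end up in the index as "words".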

Startup

Unit 5 will make the search engine we built faster.
Unit 6 will make it pick out the best pages. For now it shows every matching page.
Before that, we will learn a little more about how the Internet and the web work.

Internet

url -> get_page -> content of that page

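def get_page(url):
    try:
        import urllib # Python 2 standard library (in Python 3 this call is urllib.request.urlopen)
        return urllib.urlopen(url).read()
    except Exception: # on any failure, treat the page as empty
        return ""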

Network

A network is a group of entities¹ that can communicate, even though they are not all directly connected.

Latency : time it takes messages to get from source to destination (measured in units of time such as s, ms, ns)
Bandwidth : amount of information that can be transmitted per unit time (bits per second; 1 Mbps == 1 megabit per second)
bit : smallest unit of information (0/1, two choices, yes or no, a light bulb). One bit conveys a single two-way fact (the enemy is coming or not, the light is on or off).
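
To see how latency and bandwidth combine, here is a rough sketch assuming the simplest possible model: total time = latency + size / bandwidth. This is my own simplification, not a formula from the lecture, and it ignores protocol overhead and congestion. The function name and the sample numbers are made up.

```python
def transfer_time_seconds(size_bits, latency_s, bandwidth_bps):
    # Simplified model: wait for the first bit (latency), then stream the rest.
    return latency_s + size_bits / float(bandwidth_bps)

# Example: 1 megabit of data over a 1 Mbps link with 50 ms of latency.
t = transfer_time_seconds(1000000, 0.05, 1000000)
```

For small messages the latency term dominates, while for large transfers the bandwidth term does.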

Quiz: Bits

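# A ball is in exactly one of four boxes: red, orange, yellow, green.
# What is the minimum number of bits needed to find out which box holds it?

1. Is it in red or orange? => yes or no
2-0. If the answer to 1 is yes: is it red? => yes or no
2-1. If the answer to 1 is no: is it yellow? => yes or no

Answer : 2 bits (2**2=4)

With 4 options, 2 bits are needed because 2 to the power 2 is 4. With 16 options, 4 bits are needed because 2 to the power 4 is 16. (이두희)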
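
The same reasoning generalizes: the number of yes/no questions needed for n equally possible options is the smallest b with 2**b >= n. A small sketch follows; the function name bits_needed is my own, and it uses integer arithmetic to avoid floating-point log rounding.

```python
def bits_needed(n):
    # Smallest b such that 2**b >= n, for n >= 1.
    # (n - 1).bit_length() counts the binary digits needed to number options 0 .. n-1.
    if n < 1:
        raise ValueError("n must be at least 1")
    return (n - 1).bit_length()

four_boxes = bits_needed(4)
sixteen_boxes = bits_needed(16)
five_boxes = bits_needed(5)   # not a power of two, so round up
```

When the number of options is not a power of two, the answer rounds up: 5 boxes need 3 bits, since 2 bits can only tell 4 cases apart.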

Protocols

# HTTP(Hypertext Transfer Protocol)

client (webbrowser) -- GET <object> --> server(udacity.com)
       <---response-contents-of <object>---
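
To make the "GET <object>" arrow concrete, this sketch builds the text a client would send for a minimal HTTP/1.1 request. The function name build_get_request, the path "/" and the Connection: close header are my own choices for illustration. Nothing is actually sent over the network here.

```python
def build_get_request(host, path):
    # Request line, then headers, then an empty line that ends the header section.
    lines = [
        "GET %s HTTP/1.1" % path,
        "Host: %s" % host,
        "Connection: close",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"

request = build_get_request("udacity.com", "/")
```

The server answers with a status line, its own headers, an empty line, and then the contents of the requested object.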

Problem Set

Quiz:Better Splitting

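# Minwoo's code
def split_string(source,splitlist):
    text = source # avoid naming a variable str, which would shadow the built-in
    for char in splitlist:
        pieces = text.split(char)
        text = " ".join(pieces) # replace every separator with a space

    return text.split() # then split on whitespace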

#udacity
def split_string(source, splitlist):
    output = []
    atsplit = True
    for char in source:
        if char in splitlist:
            atsplit = True
        else:
            if atsplit:
                output.append(char)
                atsplit = False
            else:
                output[-1] = output[-1] + char
    return output

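# Udacity approached it from the completely opposite direction.
# My version cannot handle the case where splitlist does not contain " ",
# because it always splits on whitespace at the end.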
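
Another way to sidestep the whitespace problem is a regular expression, sketched below. This is an alternative technique, not the course's solution. The name split_string_re is my own, and the sketch assumes splitlist is not empty.

```python
import re

def split_string_re(source, splitlist):
    # Build a character class from splitlist; re.escape guards characters such as '-' or ']'.
    pattern = '[' + re.escape(splitlist) + ']+'
    # re.split can yield empty strings at the ends; drop them.
    return [piece for piece in re.split(pattern, source) if piece]

words = split_string_re("This is a test-of the,string separation-code!", " ,!-")
keeps_spaces = split_string_re("a b,c d", ",")   # " " is not a separator here
```

With splitlist "," the spaces stay inside the pieces, which is the case the space-joining version gets wrong.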

Quiz: improving index

When appending a URL to an existing keyword entry, skip it if that URL is already in the keyword's URL list.

def add_to_index(index, keyword, url):
    for entry in index:
        if entry[0] == keyword:
            if url not in entry[1]:
                entry[1].append(url)
            return
    # not found, add new keyword to index
    index.append([keyword, [url]])

Quiz:Counting Clicks

index: [[keyword, [[url, count], [url, count],..]],...]

# I did not understand the problem well at first.
def record_user_click(index, keyword, url):
    urls = lookup(index, keyword) # returns the URL list, i.e. entry[1] of the matching index entry
    if urls:
        for entry in urls: # loop over the [url, count] pairs in the list
            if entry[0] == url: # this is the URL the user clicked
                entry[1] = entry[1]+1 # increase its count by 1

def add_to_index(index, keyword, url):
    # format of index: [[keyword, [[url, count], [url, count],..]],...]
    for entry in index:
        if entry[0] == keyword:
            for pair in entry[1]: # entry[1] holds [url, count] lists
                if pair[0] == url: # already listed, so there is nothing to add
                    return
            entry[1].append([url,0]) # changed data structure: store [url, count]
            return
    # not found, add new keyword to index
    index.append([keyword, [[url,0]]])
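
A small run-through of the click counter, as a sketch. It repeats the functions above so that it runs on its own; the lookup here compares entry[0] directly, which is my own minor variation. The sample keyword, URLs and click sequence are made up.

```python
def lookup(index, keyword):
    for entry in index:
        if entry[0] == keyword:
            return entry[1]
    return []

def add_to_index(index, keyword, url):
    # index format: [[keyword, [[url, count], [url, count], ...]], ...]
    for entry in index:
        if entry[0] == keyword:
            for pair in entry[1]:
                if pair[0] == url:
                    return
            entry[1].append([url, 0])
            return
    index.append([keyword, [[url, 0]]])

def record_user_click(index, keyword, url):
    for pair in lookup(index, keyword):
        if pair[0] == url:
            pair[1] = pair[1] + 1

index = []
add_to_index(index, 'udacity', 'http://udacity.com')
add_to_index(index, 'udacity', 'http://npr.org')
record_user_click(index, 'udacity', 'http://udacity.com')
record_user_click(index, 'udacity', 'http://udacity.com')
record_user_click(index, 'python', 'http://udacity.com')   # unknown keyword: no effect
```

Because lookup returns the very list stored inside the index (not a copy), incrementing pair[1] updates the index itself.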

Quiz: Time Spent at Routers

According to traceroute, the round-trip time from Birmingham, England to Sundsvall, Sweden is 75 ms. The one-way distance from Birmingham to Sundsvall is 2500 km. The speed of light in optical fibre is 200000 km/s (the speed of data between the routers). What is the total time spent at the routers?

round_trip_time = 75 #ms
one_way_dist = 2500 #km
speed_of_light = 200 #km/ms (200000 km/s)
round_time_in_fibre = 2*one_way_dist/speed_of_light #25ms
time_in_routers = round_trip_time - round_time_in_fibre #50ms

Word Count

def count_words(p):
    result = p.split()
    return len(result)

passage =("The number of orderings of the 52 cards in a deck of cards "
"is so great that if every one of the almost 7 billion people alive "
"today dealt one ordering of the cards per second, it would take "
"2.5 * 10**40 times the age of the universe to order the cards in every "
"possible way.")
print count_words(passage)
#>>>56

Converting Seconds

def result(h, m, s):
    # Build each part, adding a plural "s" unless the value is exactly 1
    # (so fractional values such as 0.5 also read "seconds").
    hstr = str(h) + " hour"
    mstr = ", " + str(m) + " minute"
    sstr = ", " + str(s) + " second"
    if h != 1:
        hstr = hstr + "s"
    if m != 1:
        mstr = mstr + "s"
    if s != 1:
        sstr = sstr + "s"

    return hstr + mstr + sstr

def convert_seconds(time):
    # Floor-divide to get whole minutes and hours; seconds keep any fraction.
    minutes = time // 60
    seconds = time % 60
    hours = minutes // 60
    minutes = minutes % 60

    return result(int(hours), int(minutes), seconds)

print convert_seconds(3661)
#>>> 1 hour, 1 minute, 1 second

print convert_seconds(7325)
#>>> 2 hours, 2 minutes, 5 seconds

print convert_seconds(7261.7)
#>>> 2 hours, 1 minute, 1.7 seconds

¹. entities : beings or objects (people, institutions, computers, organizations, etc.)

lesson 15~16