How Programs Run

Making Thins Fast

이번에는 프로그램의 코스트에 대해 알아볼 것이다. 이전에는 프로그램을 만들고 그프로그램이 올바른 값을 내놓을때 그것으로 만족했지만, 프로그램이 커지고 많은일을 하게될수록 코스트에 대해서 신경을 쓸수밖에 없고 이는 컴퓨터터사이언스에서 매우 중요한 부분이다. 이것을 알고리즘 분석 이라고한다.

cost

프로그램사용에 소요되는 시간(프로그램의 작동시간), 전력, 메모리 등. 프로그램 효율의 지표.

Make Big Index

def make_string

def make_big_index(size):
    index = []
    letters = ['a','a','a','a','a','a','a','a']
    while len(index) < size:
        word = make_string(letters)
        add_to_index(index,word, 'fake')
        for i in range(len(letters)-1, 0, -1):
            if letters[i] < 'z':
                letters[i] = chr(ord(letters[i]) + 1)
                break
            else:
                letters[i] = 'a'
    return index

add to index함수가 기억이 나지 않는다 ㅠㅠㅠ

문제점 발견 : 이부분의 내용은 loop가 key를 찾을때 모든 루프를 돌기때문에 aaaaaaaa와 같은 앞쪽의 인덱스는 빠르게 찾지만, aaaazzzz같은 경우는 훨씬 많은 시간이 든다는 것이다. 자체정렬데이터알고리즘을 사용하면 좋지 않을까?

Making Lookup Faster

어떻게 빠르게 만들수 있을까? 를 생각하기전에 왜 느린지를 생각 해보자.

for entry in index:
    if .... 
#모든 인덱스를 뒤져서 검색하고,
#만약 그 검색하고자 하는 내용이 없다면, 모든 내용을 끝까지 뒤져야 끝난다.

사람은 이런식으로 찾지 않는다. 그러므로 우리는 사전처럼 정렬할 필요가 있다. 모든 인덱스를 뒤지지 않고 udacity를 검색할 경우 'u' 인덱스안에서 검색한다. 전체를 다 뒤지는것 보다 훨~씬 빠르다.

Hash Function

<String> -> hash_string -> Number(0 ~ b-1)

b = 테이블의 길이

ord(one-letter string) -> number ---> 아스키

ord('a') # 97

chr(number) -> one-letter string

chr(ord("a")) -> "a"

<해싱을 공부하고 다시 돌아온다.>

Modulus Operators(모듈로 연산)

링크

ord('a') % rod('a') # 0
(ord('a') + 2) % rod('a') # 2
13 % 10 # 3

Quiz: Equivalent Expressions

# Which of these expressions are always equivalent to x, where x is any interger between 0 and 10:
# 어떤 표현식의 결과값이 항상 x의 값과 같은가? x는 0 과 10 사이의 수이다.

x % 7 # x
x % 23 # o
ord(chr(x)) # o
chr(ord(x)) # TypeError: ord() expected string of length 1, but int found -> string만 가능하다.

좋은 해시함수의 결과는 값들이 균등하게(?) 분포해야한다.

Quiz: Empty Hash Table

# ver1
def make_hashtable(nbuckets):
    i = 0
    table = []
    while i < nbuckets:
        table.append([])
        i = i + 1
    return table

# ver2
def make_hashtable(nbuckets):
    table = []
    for unused in range(0, ubuckets):
        table.append([])
    return table

# do not use!
# Because each element in the output refers to the same empty list
def make_hashtable_NOT(nbuckets):
    return [[]]*nbuckets

table = make_hashtable_NOT(3) #[[], [], []]
table[1].append(['a',[0]])
# [[['a', [0]]], [['a', [0]]], [['a', [0]]]]

range(<start>, <stop>) #[<start>, <start>+1,...,<stop>-1]
range(0,10) #[0,1,2,...9]

Quiz: Finding Buckets

# 기본코드
# 집어 넣을 포지션을 뜻하는것 같다.
def hash_string(keyword,buckets):
    out = 0
    for s in keyword:
        out = (out + ord(s)) % buckets
    return out

# 미리 길이를 정해놓은 해시테이블을 만든다.
def make_hashtable(nbuckets):
    table = []
    for unused in range(0,nbuckets):
        table.append([])
    return table

# Define a procedure, hashtable_get_bucket,
# that takes two inputs - a hashtable, and
# a keyword, and returns the bucket where the
# keyword could occur.

def hashtable_get_bucket(htable,keyword):
    return htable[hash_string(keyword,len(htable))]

table = [[['Francis', 13], ['Ellis', 11]], [], [['Bill', 17], ['Zoe', 14]], [['Coach', 4]], [['Louis', 29], ['Rochelle', 4], ['Nick', 2]]]

print hashtable_get_bucket(table, "Zoe")
#>>> [['Bill', 17], ['Zoe', 14]]

#print hashtable_get_bucket(table, "Brick")
#>>> []

#print hashtable_get_bucket(table, "Lilith")
#>>> [['Louis', 29], ['Rochelle', 4], ['Nick', 2]]

도대체 이 코드가 뭘 위한 코드 인지 모르겠다. 도대체 왜 저렇게 찾는거지 그리고 이 table은 무슨 원리, 규칙으로 만들어진거지?

Quiz: Adding Keywords

def hashtable_add(htable,key,value):
    # your code here
    #hashtable_get_bucket(htable,key).append([key,value])
    htable[hash_string(key,len(htable))].append([key,value])
    return htable  

table = make_hashtable(5)
hashtable_add(table,'Bill', 17)
hashtable_add(table,'Bill', 18)
print table # [[], [], [['Bill', 17],['Bill', 18]], [], []]

위 예제에서 조금 힌트를 얻었다. 하지만 여전히 무엇을 위해 ~~이지랄~~을 하는지 모르겠다.ㅠ

이 함수의 문제점은 똑같은 키를 집어 넣을 경우 중복되는 키가 있어도 계속 append된다는 것이다. update 되는게 맞는것 같다.(value만 바껴야 한다.)

Quiz: Lookup

def hashtable_lookup(htable,key):
    for item in hashtable_get_bucket(htable,key):
        if item[0] == key:
            return item[1]
    return None #udacity에서는 이렇게 넣어줬다.

table = [[['Ellis', 11], ['Francis', 13]], [], [['Bill', 17], ['Zoe', 14]],[['Coach', 4]], [['Louis', 29], ['Nick', 2], ['Rochelle', 4]]]
print hashtable_lookup(table, 'Francis')
#>>> 13

찾고자 하는 key가 해쉬테이블 내에 있다면 value를 리턴한다. (없으면? 에 대한 얘기는 없어서 안만들었다.) 여기까지 본 바로는 해쉬테이블이란건 한정된 길이의 테이블안에서 최대한 효율적으로 값을 나눠 넣는것이다. 그 기준이 되는게 hash_string함수인거고 그 규칙에 맞게 넣었기 때문에 같은규칙으로 찾을 수도 있는것. 이제 update도 만들 수 있을것 같다.

Quiz: Update

# 민우코드
# hashtable_get_bucket(htable,key)랑 htable[hash_string(key,len(htable))]이게 같은건데...
def hashtable_update(htable,key,value):
    if hashtable_lookup(htable,key):
        for item in hashtable_get_bucket(htable,key):
            if item[0] == key:
                item[1] = value
    else:
        htable[hash_string(key,len(htable))].append([key,value])
    return htable  

# udacity
def hashtable_update(htable,key,value):
    bucket = hashtable_get_bucket(htable,key)
    for item in bucket:
        if item[0] == key:
            item[1] = value
            return htable
    bucket.append([key,value])
    return htable
#그냥 이렇게 끝낸다고 ?.. 

#수정한 민우코드
def hashtable_update(htable,key,value):
    bucket = hashtable_get_bucket(htable,key)
    if hashtable_lookup(htable,key): 
        for item in bucket:
            if item[0] == key:
                item[1] = value
    else:
        bucket.append([key,value])
    return htable 


table = make_hashtable(3)
hashtable_update(table,"udacity",1)
hashtable_update(table,"udacity",2)
hashtable_update(table,"udacity",3)
print table #[[], [], [['udacity', 3]]]

#수정한 내코드가 논리적으로는 더 갠찮지만 루프가 두번 돌기 때문에 자원소모가 두배 크다.
#여기서 루프가 두번 도는걸 해결 하려면lookup을 바꺼야 한다.

Dictionary

mutable, d[key] -> value

elements = {'time': 4, "hello": 5, "minwoo":5}
print "minwoo" in elements #True
print "anotherKey" in elements #False

elements['anotherKey'] = len('anotherKey')
print "anotherKey" in elements #True

gas = {}
gas['He'] = {'name' : 'Helium', 'number':2, 'weight':4.002602, 'noble gas':True}
print gas
#{ 'He': {'name' : 'Helium', 'number':2, 'weight':4.002602, 'noble gas':True} }

Quiz: 구닥다리 코드 바꾸기

#add_to_index를 고쳐보자
#이전코드
def add_to_index(index, keyword, url): 
    for entry in index: 
        if entry[0] == keyword: entry[1].append(url) 
            return 
    # not found, add new keyword to index 
    index.append([keyword, [url]]) 

#이젠 루프를 돌릴 필요가 업사다.
#민우가 고친코드 == udacity
def add_to_index(index, keyword, url): 
    if keyword in index: 
        index[keyword].append(url) 
        return 
    else:
        index[keyword] = [url]

#다음은 lookup
def lookup(index, keyword): 
    for entry in index: 
        if entry[0] == keyword: 
            return entry[1] 
    return None

#간단하게 요래
def lookup(index, keyword): 
    if keyword in index:
        return index[keyword]
    return None

와드디어끝!

¹. approximately : 대략 ↩

lesson18