# codejam2015-round2-pbC 的三种解法

Elliot's parents speak French and English to him at home. He has heard a lot of words, but it isn't always clear to him which word comes from which language! Elliot knows one sentence that he's sure is English and one sentence that he's sure is French, and some other sentences that could be either English or French. If a word appears in an English sentence, it must be a word in English. If a word appears in a French sentence, it must be a word in French.
Considering all the sentences that Elliot has heard, what is the minimum possible number of words that he's heard that must be words in both English and French?

• Input
The first line of the input gives the number of test cases, T. T test cases follow. Each starts with a single line containing an integer N. N lines follow, each of which contains a series of space-separated "words". Each "word" is made up only of lowercase characters a-z. The first of those N lines is a "sentence" in English, and the second is a "sentence" in French. The rest could be "sentences" in either English or French. (Note that the "words" and "sentences" are not guaranteed to be valid in any real language.)

• Output
For each test case, output one line containing "Case #x: y", where x is the test case number (starting from 1) and y is the minimum number of words that Elliot has heard that must be words in both English and French.

• Limits
1 ≤ T ≤ 25.
Each word will contain no more than 10 characters.
The two "known" sentences will contain no more than 1000 words each.
The "unknown" sentences will contain no more than 10 words each.
Small dataset
2 ≤ N ≤ 20.
Large dataset
2 ≤ N ≤ 200.

```def readval(typ=int):
return typ( raw_input() )

return map( typ, raw_input().split() )

def testcase(cas):
print 'Case #%d: %d' % ( cas, res )

if __name__=='__main__':
T = int(raw_input())
for i in xrange(T):
testcase(i+1)
```

# 法1: 穷举 bruteforce (for small testcase)

## naive bruteforce

```>>> from itertools import combinations
>>> for c in combinations('ABCD', 2): print c
('A', 'B')
('A', 'C')
('A', 'D')
('B', 'C')
('B', 'D')
('C', 'D')
```

`combination`函数即可实现枚举每句话是英语还是法语的功能.

```def testcase(cas): # not fast enough...
if N==2:
print 'Case #%d: %d' % ( cas, len(En.intersection(Fr)) )
return
scent = []
for i in xrange(N-2):
def partition(engsubset):
allEn, allFr = En.copy(), Fr.copy()
for i in xrange(N-2):
if i in engsubset: allEn.update(scent[i])
else: allFr.update(scent[i])
return len( allEn.intersection(allFr) )
possibleres = []
for l in xrange(N-2):
for engsubset in combinations(xrange(N-2), l):
possibleres.append(partition(engsubset))
res = min(possibleres)
print 'Case #%d: %d' % ( cas, res )
```

## bruteforce using bitmap

• 枚举各个句子的语言种类: 如果每个句子用一位来表示的话(1代表英语, 0代表法语), 那么用N位的bitmap即可表示一种情形. 这个bitmap只需从0增加到2^N-1, 就把2^N个可能性都遍历了. (实际上是2^N-2个可能性, 因为前两个句子已经确定语言了).
• 两种语言的词汇表进行union/intersection: 假设共有K个不同的单词, 那么用一个K位的bitmap即可表示一种语言包含了哪些单词. 然后集合的union和intersection即可表示为OR和AND的逻辑运算.

bit manipulation蛮subtle的, 不过习惯了就好... 另外python的integer可以任意长度, 不用像C/java那样考虑bitmap位数大于64的情况, 还是非常方便的. 代码如下:

```def testcase(cas): # use bitmap instead of set to speedup for small case !!
scent = []; allwords = set()
for i in xrange(N):
allwords.update( scent[i].split() )
words = sorted(allwords)
# if K distinct words in total, each sentence can be reprensented as a K-bit bitmap
bitmaps = []
for i in  xrange(N): #construct bitmaps
bm = 0
for wd in scent[i].split():
bm |= ( 1<< words.index(wd) )
bitmaps.append(bm)
res = 1e10
# look for all combinations
for i in xrange(1<<(N-2)):
en = bitmaps[0]; fr = bitmaps[1]
for k in xrange(N-2):
if (1<<k) & i > 0: en |= bitmaps[k+2]
else: fr |= bitmaps[k+2]
res = min( res, bin(en&fr).count('1') )
print 'Case #%d: %d' % ( cas, res )
```

# 法2: 最大流 maxflow

• `w1-->w2`, capacity=1
• `S-->w1`, capacity=INF
• `w2-->S`, capacity=INF

```def testcase_maxflow(ind): # using maxflow !
INF = 1e10
import networkx as nx
G = nx.DiGraph()
for i in xrange(N):
si = 'sent-%d'%i
for wd in words:
flow_value, flow_dict = nx.maximum_flow(G, 'sent-0', 'sent-1')
print 'Case #%d: %d' % ( ind, int(flow_value) )
```

# 法3: 线性规划 (integer) linear programming

```>>> from pulp import *
>>> x = LpVariable("x", 0, 3)
>>> y = LpVariable("y", 0, 1, 'Integer') # var category can be integer
>>> prob = LpProblem("myProblem", LpMinimize)
>>> prob += x + y <= 2 # add constraint
>>> prob += -4*x + y # add objective
>>> status = prob.solve() # solve using default solver
>>> status = prob.solve(GLPK(msg = 0)) # or use glpk solver
>>> LpStatus[status]
'Optimal'
>>> value(prob.objective) # see objective value
-8.0
>>> value(x) # see variable value
2.0
```

OK, 本题的建模过程如下:

(这篇文章总结了各种逻辑关系用线性规划的描述方式, 写的得非常详细. )

```Minimize sum(wef)
st:
we >= se
wf >= (1-se)
wef >= we+wf-1
Se[0]==1, Se[1]==0
```

```def testcase(ind):# formulate it as linear programming
sentc = []
allwords = set()
for i in xrange(N):
allwords.update( sentc[-1] )
words = sorted( allwords )
M = len(words)
wordsindex = {words[i]:i for i in xrange(M)} # mapping a word to its index
pb = LpProblem('Bilingual', LpMinimize)
# LP variables
Se = [ LpVariable('Se_'+str(i), cat='Binary') for i in xrange(N) ] # Se[i] = indicator(scentence i is english)
we = [ LpVariable('we_'+str(j), cat='Binary') for j in xrange(M) ] # we[j] = indicator(word j is english)
wf = [ LpVariable('wf_'+str(j), cat='Binary') for j in xrange(M) ] # wf[j] = indicator(word j is french)
wef = [ LpVariable('wef_'+str(j), cat='Binary') for j in xrange(M) ] # wef[j] = indicator(word j is BOTH en and fr)
pb += sum( wef )
pb += Se[0]==1
pb += Se[1]==0
for i in xrange(N):
si = sentc[i]
for wd in si:
j = wordsindex[wd]
pb += we[j] >= Se[i]
pb += wf[j] >= (1-Se[i])
for j in xrange(M):
pb += wef[j] >= we[j]+wf[j]-1 # # wef[i] = we[i] && wf[i]
#~ pb.solve( GLPK(msg=0) )
pb.solve(  )
res = int( value(pb.objective) )
print 'Case #%d: %d' % ( ind, res )
```

```Se = [ LpVariable('Se_'+str(i), 0, 1) for i in xrange(N) ] # Se[i] = indicator(scentence i is english)
we = [ LpVariable('we_'+str(j), 0) for j in xrange(M) ] # we[j] = indicator(word j is english)
wf = [ LpVariable('wf_'+str(j), 0) for j in xrange(M) ] # wf[j] = indicator(word j is french)
wef = [ LpVariable('wef_'+str(j), 0) for j in xrange(M) ] # wef[j] = indicator(word j is BOTH en and fr)
```