How is the most efficient way to subtract two lists?
I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_ais ordinate but thelist_bnot. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
|
show 2 more comments
I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_ais ordinate but thelist_bnot. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
4
Does the order matter? If not, try using sets. With sets, subtraction should be linear:set_c = set_a - set_b.
– L3viathan
58 mins ago
But is possible make this in python?
– Vinicius Morais
56 mins ago
1
Yes, I mean the Python datatypeset.
– L3viathan
55 mins ago
3
Possible duplicate of Subtracting 2 lists in Python
– tripleee
53 mins ago
1
Then possible duplicate of stackoverflow.com/questions/14807612/…
– tripleee
43 mins ago
|
show 2 more comments
I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_ais ordinate but thelist_bnot. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
I have two lists in python list_a and list_b. The list_a have some images links, and the list_b too. 99% of the items are the same, but i have to know this 1%. The all surplus items are in list_a, that means all items in list_b are in list_a. My initial idea is subtract all items:
list_a - list_b = list_c, where the list_c are my surplus items. My code is:
list_a =
list_b =
list_c =
arq_b = open('list_b.txt','r')
for b in arq_b:
list_b.append(b)
arq_a = open('list_a.txt','r')
for a in arq_a:
if a not in arq_b:
list_c.append(a)
arq_c = open('list_c.txt','w')
for c in list_c:
arq_c.write(c)
I think the logic is right, if i have some items, the code is run fast. But i dont have 10 items, or 1.000, or even 100.000. I have 78.514.022 items in my list_b.txt and 78.616.777 in my list list_a.txt. I dont't know the cost of this expression: if a not in arq_b. But if i execute this code, i think wont finish in this year.
My pc have 8GB, and i allocate 15gb for swap to not explode my RAM.
My question is, there's another way to make this operation more efficiently(Faster)?
- The
list_ais ordinate but thelist_bnot. - Each item have this size:
images/00000cd9fc6ae2fe9ec4bbdb2bf27318f2babc00.png
- The order doesnt matter, i want know the surplus.
python python-3.x list performance
python python-3.x list performance
edited 54 mins ago
Vadim Kotov
4,34853247
4,34853247
asked 59 mins ago
Vinicius MoraisVinicius Morais
1407
1407
4
Does the order matter? If not, try using sets. With sets, subtraction should be linear:set_c = set_a - set_b.
– L3viathan
58 mins ago
But is possible make this in python?
– Vinicius Morais
56 mins ago
1
Yes, I mean the Python datatypeset.
– L3viathan
55 mins ago
3
Possible duplicate of Subtracting 2 lists in Python
– tripleee
53 mins ago
1
Then possible duplicate of stackoverflow.com/questions/14807612/…
– tripleee
43 mins ago
|
show 2 more comments
4
Does the order matter? If not, try using sets. With sets, subtraction should be linear:set_c = set_a - set_b.
– L3viathan
58 mins ago
But is possible make this in python?
– Vinicius Morais
56 mins ago
1
Yes, I mean the Python datatypeset.
– L3viathan
55 mins ago
3
Possible duplicate of Subtracting 2 lists in Python
– tripleee
53 mins ago
1
Then possible duplicate of stackoverflow.com/questions/14807612/…
– tripleee
43 mins ago
4
4
Does the order matter? If not, try using sets. With sets, subtraction should be linear:
set_c = set_a - set_b.– L3viathan
58 mins ago
Does the order matter? If not, try using sets. With sets, subtraction should be linear:
set_c = set_a - set_b.– L3viathan
58 mins ago
But is possible make this in python?
– Vinicius Morais
56 mins ago
But is possible make this in python?
– Vinicius Morais
56 mins ago
1
1
Yes, I mean the Python datatype
set.– L3viathan
55 mins ago
Yes, I mean the Python datatype
set.– L3viathan
55 mins ago
3
3
Possible duplicate of Subtracting 2 lists in Python
– tripleee
53 mins ago
Possible duplicate of Subtracting 2 lists in Python
– tripleee
53 mins ago
1
1
Then possible duplicate of stackoverflow.com/questions/14807612/…
– tripleee
43 mins ago
Then possible duplicate of stackoverflow.com/questions/14807612/…
– tripleee
43 mins ago
|
show 2 more comments
4 Answers
4
active
oldest
votes
you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.
difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Well, not really, because internally asetis created, then thrown away. but it's thrown away faster
– Jean-François Fabre
41 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
1
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
48 mins ago
4
yes but doingset(f)in with block ensures that it closes the file
– Jean-François Fabre
45 mins ago
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54128876%2fhow-is-the-most-efficient-way-to-subtract-two-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.
difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Well, not really, because internally asetis created, then thrown away. but it's thrown away faster
– Jean-François Fabre
41 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
add a comment |
you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.
difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Well, not really, because internally asetis created, then thrown away. but it's thrown away faster
– Jean-François Fabre
41 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
add a comment |
you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.
difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
you can create one set of the first file contents, then just use difference or symmetric_difference depending on what you call a difference
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
diffs = set_a.difference(f)
if list_b.txt contains more items than list_a.txt you want to swap them or use set_a.symmetric_difference(f) instead, depending on what you need.
difference(f) works but still has to construct a new set internally. Not a great performance gain (see set issubset performance difference depending on the argument type), but it's shorter.
edited 42 mins ago
answered 43 mins ago
Jean-François FabreJean-François Fabre
101k954109
101k954109
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Well, not really, because internally asetis created, then thrown away. but it's thrown away faster
– Jean-François Fabre
41 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
add a comment |
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Well, not really, because internally asetis created, then thrown away. but it's thrown away faster
– Jean-François Fabre
41 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Nice, this avoids having to allocate space for the second set.
– L3viathan
42 mins ago
Well, not really, because internally a
set is created, then thrown away. but it's thrown away faster– Jean-François Fabre
41 mins ago
Well, not really, because internally a
set is created, then thrown away. but it's thrown away faster– Jean-François Fabre
41 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
But the complexity is the same of subtract sets?
– Vinicius Morais
36 mins ago
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
1
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
48 mins ago
4
yes but doingset(f)in with block ensures that it closes the file
– Jean-François Fabre
45 mins ago
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
1
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
48 mins ago
4
yes but doingset(f)in with block ensures that it closes the file
– Jean-François Fabre
45 mins ago
add a comment |
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
Try using sets:
with open("list_a.txt") as f:
set_a = set(f)
with open("list_b.txt") as f:
set_b = set(f)
set_c = set_a - set_b
with open("list_c.txt","w") as f:
for c in set_c:
f.write(c)
The complexity of subtracting two sets is O(n) in the size of the set a.
edited 44 mins ago
answered 53 mins ago
L3viathanL3viathan
15.6k12847
15.6k12847
1
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
48 mins ago
4
yes but doingset(f)in with block ensures that it closes the file
– Jean-François Fabre
45 mins ago
add a comment |
1
You know - an open file is an iterator - therefore you can simply doset_a = set(open("list_a.txt"))
– jsbueno
48 mins ago
4
yes but doingset(f)in with block ensures that it closes the file
– Jean-François Fabre
45 mins ago
1
1
You know - an open file is an iterator - therefore you can simply do
set_a = set(open("list_a.txt"))– jsbueno
48 mins ago
You know - an open file is an iterator - therefore you can simply do
set_a = set(open("list_a.txt"))– jsbueno
48 mins ago
4
4
yes but doing
set(f) in with block ensures that it closes the file– Jean-François Fabre
45 mins ago
yes but doing
set(f) in with block ensures that it closes the file– Jean-François Fabre
45 mins ago
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
add a comment |
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
To extend the comment of @L3viathan
If order of element is not important set is the rigth way.
here a dummy example you can adapt:
l1 = [0,1,2,3,4,5]
l2 = [3,4,5]
setL1 = set(l1) # transform the list into a set
setL2 = set(l2)
setDiff = setl1 - setl2 # make the difference
listeDiff = list(setDiff) # if you want to have your element back in a list
as you see is pretty straightforward in python.
answered 51 mins ago
RomainL.RomainL.
308313
308313
add a comment |
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
add a comment |
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
In case order matters you can presort the lists together with item indices and then iterate over them together:
list_2 = sorted(list_2)
diff_idx =
j = 0
for i, x in sorted(enumerate(list_1), key=lambda x: x[1]):
if x != list_2[j]:
diff_idx.append(i)
else:
j += 1
diff = [list_1[i] for i in sorted(diff_idx)]
This has time complexity of the sorting algorithm, i.e. O(n*log n).
edited 31 mins ago
answered 38 mins ago
a_guesta_guest
5,47321241
5,47321241
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f54128876%2fhow-is-the-most-efficient-way-to-subtract-two-lists%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
4
Does the order matter? If not, try using sets. With sets, subtraction should be linear:
set_c = set_a - set_b.– L3viathan
58 mins ago
But is possible make this in python?
– Vinicius Morais
56 mins ago
1
Yes, I mean the Python datatype
set.– L3viathan
55 mins ago
3
Possible duplicate of Subtracting 2 lists in Python
– tripleee
53 mins ago
1
Then possible duplicate of stackoverflow.com/questions/14807612/…
– tripleee
43 mins ago